dc.contributor.author Karwath, Andreas
dc.contributor.author King, Ross Donald
dc.date.accessioned 2006-08-24T13:57:26Z
dc.date.available 2006-08-24T13:57:26Z
dc.date.issued 2002-04-23
dc.identifier.citation Karwath, A & King, R D 2002, 'Homology Induction: the use of machine learning to improve sequence similarity searches', BMC Bioinformatics, vol. 3, no. 11. 10.1186/1471-2105-3-11 en
dc.identifier.issn 1471-2105
dc.identifier.other PURE: 70889
dc.identifier.other dspace: 2160/258
dc.identifier.uri http://hdl.handle.net/2160/258
dc.identifier.uri http://www.biomedcentral.com/1471-2105/3/11 en
dc.description Karwath, A., King, R. Homology induction: the use of machine learning to improve sequence similarity searches. BMC Bioinformatics. 23rd April 2002. 3:11. Additional file: Describes the title organism species declaration in one string [http://www.biomedcentral.com/content/supplementary/1471-2105-3-11-S1.doc]. Sponsorship: Andreas Karwath and Ross D. King were supported by the EPSRC grant GR/L62849. en
dc.description.abstract Background: The inference of homology between proteins is a key problem in molecular biology. The current best approaches only identify ~50% of homologies (with a false positive rate set at 1/1000). Results: We present Homology Induction (HI), a new approach to inferring homology. HI uses machine learning to bootstrap from standard sequence similarity search methods. First a standard method is run; then HI learns rules which are true for sequences of high similarity to the target (assumed homologues) and not true for general sequences. These rules are then used to discriminate sequences in the twilight zone. To learn the rules, HI describes the sequences in a novel way based on a bioinformatic knowledge base, and the machine learning method of inductive logic programming. To evaluate HI we used the PDB40D benchmark, which lists sequences of known homology but low sequence similarity. We compared the HI methodology with PSI-BLAST alone and found HI performed significantly better. In addition, Receiver Operating Characteristic (ROC) curve analysis showed that these improvements were robust for all reasonable error costs. The predictive homology rules learnt by HI can be interpreted biologically to provide insight into conserved features of homologous protein families. Conclusions: HI is a new technique for the detection of remote protein homology, a central bioinformatic problem. HI with PSI-BLAST is shown to outperform PSI-BLAST for all error costs. It is expected that similar improvements would be obtained using HI with any sequence similarity method. en
dc.language.iso eng
dc.relation.ispartof BMC Bioinformatics en
dc.title Homology Induction: the use of machine learning to improve sequence similarity searches en
dc.type Text en
dc.type.publicationtype Article (Journal) en
dc.identifier.doi http://dx.doi.org/10.1186/1471-2105-3-11
dc.contributor.institution Department of Computer Science en
dc.contributor.institution Bioinformatics and Computational Biology Group en
dc.description.status Peer reviewed en


Aside from theses and in the absence of a specific licence document on an item page, all works in Cadair are accessible under the CC BY-NC-ND Licence. AU theses and dissertations held on Cadair are made available for the purposes of private study and non-commercial research and brief extracts may be reproduced under fair dealing for the purpose of criticism or review. If you have any queries in relation to the re-use of material on Cadair, contact is@aber.ac.uk.
