Show simple item record

dc.contributor.author Enot, David P.
dc.contributor.author Beckmann, Manfred
dc.contributor.author Overy, David P.
dc.contributor.author Draper, John
dc.date.accessioned 2008-12-11T12:34:00Z
dc.date.available 2008-12-11T12:34:00Z
dc.date.issued 2006-10-03
dc.identifier.citation Enot, D P, Beckmann, M, Overy, D P & Draper, J 2006, 'Predicting interpretability of metabolome models based on behavior, putative identity, and biological relevance of explanatory signals', Proceedings of the National Academy of Sciences of the United States of America, vol. 103, no. 40, pp. 14865-14870, 10.1073/pnas.0605152103 en
dc.identifier.issn 0027-8424
dc.identifier.other PURE: 92054
dc.identifier.other dspace: 2160/1539
dc.identifier.uri http://hdl.handle.net/2160/1539
dc.identifier.uri http://intl.pnas.org/cgi/content/abstract/103/40/14865 en
dc.description Enot, D. P., Beckmann, M., Overy, D. P., & Draper, J. (2006). Predicting interpretability of metabolome models based on behavior, putative identity, and biological relevance of explanatory signals. Proceedings of the National Academy of Sciences of the USA, 103(40), 14865-14870. Sponsorship: BBSRC RAE2008 en
dc.description.abstract Powerful algorithms are required to deal with the dimensionality of metabolomics data. Although many achieve high classification accuracy, the models they generate have limited value unless it can be demonstrated that they are reproducible and statistically relevant to the biological problem under investigation. Random forest (RF) generates models, without any requirement for dimensionality reduction or feature selection, in which individual variables are ranked for significance and displayed in an explicit manner. In metabolome fingerprinting by mass spectrometry, each metabolite can be represented by signals at several m/z. Exploiting a prior understanding of expected biochemical differences between sample classes, we aimed to develop meaningful metrics relevant to the significance both of the overall RF model and of individual, potentially explanatory, signals. Pair-wise comparison of related plant genotypes with strong phenotypic differences demonstrated that robust models are not only reproducible but also logically structured, highlighting correlated m/z derived from just a small number of explanatory metabolites reflecting the biological differences between sample classes. RF models were also generated by using groupings of samples known to be increasingly phenotypically similar. Although classification accuracy was often reasonable, we demonstrated reproducibly in both Arabidopsis and potato a performance threshold based on margin statistics beyond which such models showed little structure indicative of either generalizability or further biological interpretability. In a multiclass problem using 25 Arabidopsis genotypes, despite the complicating effects of ecotype background and secondary metabolome perturbations common to several mutations, the ranking of metabolome signals by RF provided scope for deeper interpretability. en
dc.format.extent 6 en
dc.language.iso eng
dc.relation.ispartof Proceedings of the National Academy of Sciences of the United States of America en
dc.subject mass spectral fingerprinting en
dc.subject phenotyping en
dc.subject random forest data analysis en
dc.title Predicting interpretability of metabolome models based on behavior, putative identity, and biological relevance of explanatory signals en
dc.type Text en
dc.type.publicationtype Article (Journal) en
dc.identifier.doi http://dx.doi.org/10.1073/pnas.0605152103
dc.contributor.institution Institute of Biological, Environmental and Rural Sciences en
dc.description.status Peer reviewed en


Files in this item


There are no files associated with this item.


