dc.contributor.advisor King, Ross Donald
dc.contributor.advisor Clare, Amanda
dc.contributor.author Riley, Michael
dc.date.accessioned 2009-04-15T09:29:58Z
dc.date.available 2009-04-15T09:29:58Z
dc.date.issued 2009-03
dc.description.abstract This thesis documents the investigation into the acquisition of knowledge from biological data using computational methods for the discovery of significantly frequent patterns in gene location and phylogeny. Beginning with an initial statistical analysis of the distribution of gene locations in the flowering plant Arabidopsis thaliana, we discover unexplained elements of order. The second area of this research looks into frequent patterns in the one-dimensional linear structure of the physical locations of genes on the genome of Saccharomyces cerevisiae. This is an area of epigenetics which has, hitherto, attracted little attention. The frequent patterns are patterns of structure represented in Datalog, suitable for analysis using the logic programming language Prolog. This is used to find patterns in gene location with respect to various gene attributes such as molecular function and the distance between genes. Here we find significant frequent patterns in neighbouring pairs of genes. We also discover very significant patterns in the molecular function of genes separated by distances of between 5,000 and 20,000 base pairs. However, in complete contrast to the latter result, we find that the distribution of genes of molecular function within a local region of ±20,000 base pairs is locationally independent. In the second part of this research we look for significantly frequent patterns of phylogenetic subtrees in a broad database of phylogenetic trees. Here we investigate the use of two types of frequent phylogenetic structures. Firstly, phylogenetic pairs are used to determine relationships between organisms. Secondly, phylogenetic triple structures are used to represent subtrees. Frequent subtree mining is then used to establish phylogenetic relationships with high confidence between a small set of organisms. This exercise was invaluable in enabling these procedures to be extended in future to encompass much larger sets of organisms.
This research has revealed effective methods for the analysis of, and has discovered patterns of order in, the locations of genes within genomes. Research into phylogenetic tree generation based on protein structure has discovered the requirements for an effective method to extract elements of phylogenetic information from a phylogenetic database and reconstruct a single consensus tree from that information. In this way it should be possible to produce a species tree of life with a high degree of confidence and resolution. en
dc.language.iso en en
dc.publisher Aberystwyth University en
dc.subject biological data en
dc.subject statistical analysis en
dc.title Significant Pattern Discovery in Gene Location and Phylogeny en
dc.type Text en
dc.publisher.department Computer Science en
dc.type.qualificationlevel doctoral en
dc.type.qualificationname PhD en
dc.type.publicationtype thesis or dissertation en

Aside from theses and in the absence of a specific licence document on an item page, all works in Cadair are accessible under the CC BY-NC-ND Licence. AU theses and dissertations held on Cadair are made available for the purposes of private study and non-commercial research and brief extracts may be reproduced under fair dealing for the purpose of criticism or review. If you have any queries in relation to the re-use of material on Cadair, contact
