Aberystwyth Cadair Online Research Repository

 
Please use this identifier to cite or link to this item: http://hdl.handle.net/2160/3097

Title: Mining parasite data using genetic programming
Authors: Barrett, John
Kostadinova, A.
Raga, J. A.
Issue Date: 2005
Publisher: Elsevier
Citation: Barrett, J., Kostadinova, A., Raga, J.A. (2005). Mining parasite data using genetic programming. Trends in Parasitology, 21(5), 207-209.
Abstract: 
Genetic programming is a technique that can be used to tackle the hugely demanding data-processing problems encountered in the natural sciences. Application of genetic programming to a problem using parasites as biological tags demonstrates its potential for developing explanatory models using data that are both complex and noisy.

In many areas of biology, the ability to collect data outstrips the ability to analyse them. Techniques are needed to mine large datasets and extract biologically meaningful relationships. Genetic programming (GP) is a stochastic optimization approach that helps to discover comprehensible rules for data mining. It is one of a group of supervised, evolutionary programming techniques that use Darwinian concepts to generate and optimize predictive mathematical models. This is done by mimicking ‘natural selection’ using ‘populations’ of mathematical models.

Initially, a population of n models (short computer programmes) is generated, each model representing a different, random combination of variables, constants and mathematical functions. The fitness of each model is determined (in terms of how well it solves the problem). The ‘best’ models are then selected for ‘breeding’ to produce the next generation of ‘fitter’ models, and so on until a model is evolved that solves the problem with the required degree of accuracy or until a specified stopping criterion is reached. During breeding, different parts of the models are recombined, and the mathematical functions and variables can be changed: the equivalent of crossover and mutation.

Because GP is a randomized algorithm, it is not deterministic, and each new run with a dataset evolves an independent model. Therefore, several alternative solutions to a problem can be evolved. For complex problems for which there is no single answer, each run can result in a different best model, and a validation process must then be devised to select the most appropriate one.
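
The loop described in the abstract (random initial population, fitness evaluation, selection, crossover, mutation, stopping criterion) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function set (`+`, `-`, `*`), the tuple-based tree encoding, the mean-squared-error fitness, the elitism, the depth limit, all parameter values and the toy dataset `DATA` are assumptions chosen only for illustration; no parasite data are used.

```python
import operator
import random

# Assumed function set; the abstract does not say which functions were used.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def random_tree(rng, n_vars, depth=3):
    """Random model: a leaf ("var", i) / ("const", c) or ("op", name, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.7:
            return ("var", rng.randrange(n_vars))
        return ("const", rng.uniform(-1.0, 1.0))
    name = rng.choice(list(OPS))
    return ("op", name, random_tree(rng, n_vars, depth - 1),
            random_tree(rng, n_vars, depth - 1))

def evaluate(tree, x):
    """Compute the model's prediction for one input vector x."""
    if tree[0] == "var":
        return x[tree[1]]
    if tree[0] == "const":
        return tree[1]
    return OPS[tree[1]](evaluate(tree[2], x), evaluate(tree[3], x))

def mse(tree, data):
    """Fitness: mean squared error over the dataset (lower is fitter)."""
    return sum((evaluate(tree, x) - y) ** 2 for x, y in data) / len(data)

def depth(tree):
    return 0 if tree[0] != "op" else 1 + max(depth(tree[2]), depth(tree[3]))

def subtrees(tree, path=()):
    """Yield (path, subtree) for every node; a path is a tuple of child indices."""
    yield path, tree
    if tree[0] == "op":
        yield from subtrees(tree[2], path + (2,))
        yield from subtrees(tree[3], path + (3,))

def replace(tree, path, new):
    """Return a copy of tree with the node at path swapped for new."""
    if not path:
        return new
    node = list(tree)
    node[path[0]] = replace(tree[path[0]], path[1:], new)
    return tuple(node)

def crossover(rng, receiver, donor):
    """Recombination: graft a random part of donor into a random spot of receiver."""
    path, _ = rng.choice(list(subtrees(receiver)))
    _, part = rng.choice(list(subtrees(donor)))
    return replace(receiver, path, part)

def mutate(rng, tree, n_vars):
    """Mutation: overwrite a random node with a fresh random subtree."""
    path, _ = rng.choice(list(subtrees(tree)))
    return replace(tree, path, random_tree(rng, n_vars, depth=2))

def evolve(data, n_vars, seed, pop_size=60, generations=30, tol=1e-9, max_depth=6):
    rng = random.Random(seed)
    population = [random_tree(rng, n_vars) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda t: mse(t, data))
        if mse(population[0], data) <= tol:    # required accuracy reached
            break
        parents = population[: pop_size // 4]  # select the 'best' models
        children = parents[:2]                 # elitism: keep the two fittest unchanged
        while len(children) < pop_size:
            child = crossover(rng, rng.choice(parents), rng.choice(parents))
            if rng.random() < 0.2:
                child = mutate(rng, child, n_vars)
            if depth(child) <= max_depth:      # discard overgrown models
                children.append(child)
        population = children
    return min(population, key=lambda t: mse(t, data))

def show(tree):
    """Render a model as a readable formula."""
    if tree[0] == "var":
        return f"x{tree[1]}"
    if tree[0] == "const":
        return f"{tree[1]:.2f}"
    return f"({show(tree[2])} {tree[1]} {show(tree[3])})"

# Toy, noise-free dataset (made up for this sketch): y = x0 * x1 + x0.
_data_rng = random.Random(0)
DATA = []
for _ in range(20):
    _x = (_data_rng.uniform(-1, 1), _data_rng.uniform(-1, 1))
    DATA.append((_x, _x[0] * _x[1] + _x[0]))

if __name__ == "__main__":
    for seed in (1, 2, 3):  # independent runs may evolve different models
        best = evolve(DATA, n_vars=2, seed=seed)
        print(seed, show(best), mse(best, DATA))
```

Running the sketch with several seeds mirrors the point made in the abstract: each run is an independent stochastic search, so the resulting best models can differ and would need to be compared on held-out data before one is chosen.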
Appears in Collections: IBERS Research papers

Files in This Item:

There are no files associated with this item.

All items in CADAIR are protected by copyright, with all rights reserved.
No item in CADAIR may be reproduced for commercial purposes.
For other possible restrictions on use please refer to the publisher's URL
where this is made available, or to notes contained in the item itself.
If you believe that any material held on CADAIR infringes copyright, please contact abuse@aber.ac.uk
providing details and we will remove the work from the repository and investigate your claim.

 
Copyright © 2009 Aberystwyth University