A technique I have seen used for statistical confidence testing of non-Gaussian distributions is to generate 1000 random examples of the distribution. If you want to be 95% confident that the answer to be checked comes from the same distribution, then it should be "like" 950 of the 1000 examples.
E.g. if the distribution is reasonably well behaved, sort the 1000 examples. If the answer to be checked lies outside the range from the 25th to the 975th example, we can confidently reject the null hypothesis and say our answer is not from the distribution used to generate the 1000 examples. We do not need Z-scores, t-tests etc.
This non-parametric test should be ok with any distribution. We are effectively burning CPU cycles rather than spending brain cycles on devising and validating a statistical technique specifically for our new distribution.
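A minimal Python sketch of the idea, under stated assumptions: the function names `empirical_range_test` and `exponential_sampler` are hypothetical, and the exponential distribution is only a stand-in for whatever non-Gaussian distribution you can sample from. It sorts the generated examples and checks whether the answer falls outside the 25th to 975th example.

```python
import random


def exponential_sampler(rng):
    """Hypothetical stand-in distribution: one draw from Exp(1), clearly non-Gaussian."""
    return rng.expovariate(1.0)


def empirical_range_test(observed, sampler, n_examples=1000, confidence=0.95, seed=None):
    """Two-sided empirical test against randomly generated examples.

    Returns (reject, lower, upper), where lower and upper are the sorted
    examples that bound the central region (25th and 975th of 1000 at 95%).
    """
    rng = random.Random(seed)
    examples = sorted(sampler(rng) for _ in range(n_examples))

    # Number of examples cut from each tail: 25 for 1000 examples at 95%.
    tail = round(n_examples * (1.0 - confidence) / 2.0)
    if tail < 1:
        raise ValueError("too few examples for this confidence level")

    lower = examples[tail - 1]               # e.g. the 25th smallest example
    upper = examples[n_examples - tail - 1]  # e.g. the 975th smallest example

    # Reject the null hypothesis if the answer lies outside the central range.
    reject = observed < lower or observed > upper
    return reject, lower, upper


# Usage: is the value 50.0 plausibly a draw from Exp(1)?
reject, lower, upper = empirical_range_test(50.0, exponential_sampler, seed=1)
print("reject null hypothesis:", reject, "range:", (lower, upper))
```

The only requirement is that you can draw examples from the distribution; more examples give steadier range bounds at the cost of more CPU time.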
About the GPEMjournal blog
This is the editor's blog for the journal Genetic Programming and Evolvable Machines. The official web site for the journal, maintained by the publisher (Springer) is here. The GPEMjournal blog is authored and maintained by Lee Spector.
Wednesday, August 31, 2011
Wednesday, August 17, 2011
Computational Intelligence Using Genetic Programming - GPTP'11
This is a repost of my piece on GPTP'11 from Epistasis Blog.
I just returned from the IXth Genetic Programming Theory and Practice Workshop held by the Center for the Study of Complex Systems at the University of Michigan. This is an invitation only workshop that brings together theorists and practitioners interested in the development and application of computer systems that can solve complex problems by developing their own programs (i.e. automatic programming). This group focuses on the use of genetic programming or GP to discover useful computer programs using the principles of evolution by natural selection. The proceedings from this workshop are published each year in a book that can be found on Amazon. The proceedings from this year will be published in late 2011 or early 2012.
The real value of this workshop is the large amount of time dedicated to open-ended discussion about how to solve complex problems in medicine, industry, finance, etc. My own motivation for working with GP is to teach the computer how to solve a complex human genetics problem as I would. I do not believe that naive computer programs or analysis strategies such as those used in the agnostic genome-wide association study (GWAS) paradigm will be successful in addressing the complexity of the genotype-phenotype relationship. We, as human analysis engines, don't ignore the pathobiology of disease when we look at data. Why should we instruct the computer to ignore it? Given infinite time, each of us would tinker and try new and different things with the data until we found a good answer that made biological sense. We would use our knowledge of biochemistry, genomics, molecular biology, pathology and physiology to both frame the analysis and interpret the results. Our series of papers published as part of GPTP since 2006 has focused on adaptive computer programs that harness this kind of biological and biomedical knowledge to explore the space of computer programs that can build models of genetic architecture.
One of the more interesting and extended discussions at GPTP this year was about novelty-seeking. Ken Stanley gave a great talk about rewarding computer programs that explore new and different solutions to a problem (read more). His Picbreeder program is a nice example of novelty search in the sense that you can discover and develop interesting pictures without a clear initial objective in mind (e.g. evolve a picture of a car). An analogy in human genetics would be to reward computer programs that generate genetic models of disease by exploring new biochemical pathways. I am working on approaches to try this within our own genetic analysis system. I like Ken's quote: "To achieve your highest goals, you must be willing to abandon them."
It is very clear that GP has been used to solve problems that humans or other computer programs haven't been able to. For example, Moshe Sipper has developed computer game players that rival human players (read more). Some of the participants (e.g. Michael Korns) even invest and make money using GP. This is a powerful way to do automatic programming and should be part of the broader toolbox of any complex problem-solver. I would be happy to send you a pre-print of our current GPTP paper.
Tuesday, August 16, 2011
EC presence at biology meetings
I've just come back from a round of biology meetings. Few biologists know that evolution in silico exists. They may know that GA is "something that GARLI does", but that's about it (GARLI is a GA-based phylogeny estimation package). Every time I speak of EC, it is very well received, with surprise and wonder.
It would be good if we could publicize at evolutionary biology meetings more. At a minimum, Springer reps should bring GPEM when they attend evolution meetings.
Monday, August 8, 2011
Non standard terminology
I came across two non-standard uses of evolutionary computing jargon at GECCO. EC, like the rest of technical literature, is full of jargon. Jargon can be helpful where people agree on its meaning, but it confuses when it is misused. I have posted a link to the online glossary from "Genetic programming and data structures" on the GP mailing list. Hans-Georg Beyer has also defined evolutionary algorithms terminology.
Thursday, August 4, 2011
GECCO 2011 bibtex GP bibliography
The latest release of the GP biblio contains more than one hundred entries from last month's GECCO conference in Dublin.
Bibtex files for the proceedings and companion are also online or searchable via CCSB
Wiki GP Bibliography
Adrian Carballal has created a wiki which allows you to maintain the GP bibliography's links to your homepage.
25 July 2011, University College, London, Genetic Programming for Software Engineering
The 14th CREST open workshop proved to be so popular that the free registrations were closed some weeks beforehand.
Stephanie Forrest gave the keynote on evolving fixes to software for which she won the Humie.
Two other international speakers were Federica Sarro, who talked on estimating time to produce software using GP, and Wasif Afzal, who reviewed GP for prediction (Slides, Video).
David White talked about new work on optimising server farms, JVM in cloud computing systems (Slides, Video). I talked about evolving a CUDA kernel for gzip running on a GeForce 295 GTX GPU (paper).