[BioC] Best options for cross validation machine learning

Kasper Daniel Hansen khansen at stat.berkeley.edu
Tue Jan 19 20:43:06 CET 2010

On Jan 19, 2010, at 11:11 AM, Daniel Brewer wrote:

> 1) Pick a group of genes that best predict which group a sample belongs to.
> 2) Determine how stable these prediction sets are through some sort of
> cross-validation (I would prefer not to divide my set into a training
> and test set for stage one)

If you don't do this (the statement in parentheses, i.e. you skip the held-out test set) you will most likely get crap.  Note the astounding number of papers in the literature that have attempted to do this, and note that these papers almost never get replicated, most likely because the statistical analysis is overly optimistic.

The track record for being able to do this is extremely bad, despite the number of papers claiming that their signature method is something like 99% accurate.
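The optimism the post warns about can be demonstrated on pure noise. The sketch below is not from the original post; the data, the mean-difference gene score, and the nearest-centroid classifier are all illustrative assumptions. It selects the "best" 10 of 500 noise features either on the full data set (the leaky design the post criticizes) or freshly inside every cross-validation fold (the honest design), and compares leave-one-out accuracies. Since no feature carries real signal, honest accuracy should hover near chance, while the leaky design typically reports much higher accuracy.

```python
import random

random.seed(0)

# Purely synthetic data: 40 samples, 500 noise features, two balanced groups.
# No feature is informative, so an honest estimate should be near 50%.
n, p, k = 40, 500, 10

y = [i % 2 for i in range(n)]
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]

def top_features(rows, labels, k):
    """Rank features by absolute difference of class means (a crude t-like score)."""
    n0, n1 = labels.count(0), labels.count(1)
    scores = []
    for j in range(len(rows[0])):
        m0 = sum(r[j] for r, l in zip(rows, labels) if l == 0) / n0
        m1 = sum(r[j] for r, l in zip(rows, labels) if l == 1) / n1
        scores.append((abs(m1 - m0), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]

def loo_accuracy(preselected=None):
    """Leave-one-out CV with a nearest-centroid classifier.
    If `preselected` is given, features were chosen on the FULL data (the leak);
    otherwise selection is repeated inside each training fold."""
    correct = 0
    for i in range(n):
        tr_rows = [X[j] for j in range(n) if j != i]
        tr_labels = [y[j] for j in range(n) if j != i]
        feats = preselected if preselected is not None else top_features(tr_rows, tr_labels, k)
        # Class centroids computed on the training fold only.
        cents = {}
        for c in (0, 1):
            members = [r for r, l in zip(tr_rows, tr_labels) if l == c]
            cents[c] = [sum(r[f] for r in members) / len(members) for f in feats]
        def dist(c):
            return sum((X[i][f] - cents[c][a]) ** 2 for a, f in enumerate(feats))
        pred = 0 if dist(0) < dist(1) else 1
        correct += (pred == y[i])
    return correct / n

acc_leaky = loo_accuracy(preselected=top_features(X, y, k))  # selection outside CV
acc_honest = loo_accuracy()                                  # selection inside each fold
print(f"leaky CV accuracy:  {acc_leaky:.2f}")
print(f"honest CV accuracy: {acc_honest:.2f}")
```

On noise data of this shape the leaky design usually reports accuracies well above chance, which is exactly the "overly optimistic" analysis described above: the feature selection has already seen the left-out sample, so the cross-validation no longer estimates generalization error.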

