[BioC] Machine learning, cross validation and gene selection

Wed Sep 1 16:55:29 CEST 2010

Hello,

I am getting a bit confused about gene selection and machine learning
and I was wondering if you could help me out.  I have a dataset that is
classified into two groups and my aim is to get a small number of genes
(10-20) in a gene signature that I will, in theory, be able to apply to
other datasets to optimally classify the samples.  As I do not have
separate test and training sets, I am using leave-one-out
cross-validation to help determine the robustness.  I have read that one
should perform gene selection for each split of the samples, i.e.

1) Hold out one sample (or group of samples) as the test set
2) On the remainder, select genes
3) Train the machine learning algorithm on the remainder using those genes
4) Test whether the held-out test set is correctly classified
5) Go back to 1 with the next split
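
To make the loop above concrete, here is a minimal sketch in plain
Python (standard library only).  Everything in it is an assumption made
for illustration and not a method taken from any package: the scoring
rule in select_genes (difference of class means over summed spread), the
nearest-centroid classifier, and all function names are hypothetical
stand-ins for whatever selection method and learner you actually use.
The only point is that gene selection happens inside each split, on the
training samples alone.

```python
from statistics import mean, pstdev


def select_genes(X, y, k):
    """Return indices of the k genes that best separate labels 0 and 1.

    Illustrative score: |mean difference| / (summed class spread).
    """
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, lab in zip(X, y) if lab == 0]
        b = [row[g] for row, lab in zip(X, y) if lab == 1]
        spread = pstdev(a) + pstdev(b) + 1e-9  # avoid division by zero
        scores.append(abs(mean(a) - mean(b)) / spread)
    ranked = sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)
    return ranked[:k]


def nearest_centroid_predict(X_train, y_train, x_test, genes):
    """Predict the label whose class centroid (on `genes`) is closest."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(y_train)):
        rows = [row for row, lab in zip(X_train, y_train) if lab == label]
        centroid = [mean(row[g] for row in rows) for g in genes]
        dist = sum((x_test[g] - c) ** 2 for g, c in zip(genes, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loocv(X, y, k):
    """Leave-one-out CV with gene selection repeated inside every split.

    Returns (accuracy, list of selected gene indices per split).
    """
    correct = 0
    selected_per_split = []
    for i in range(len(X)):
        # 1) hold out sample i
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        # 2) select genes on the remainder only
        genes = select_genes(X_train, y_train, k)
        selected_per_split.append(genes)
        # 3) and 4) fit on the remainder, test on the held-out sample
        if nearest_centroid_predict(X_train, y_train, X[i], genes) == y[i]:
            correct += 1
    return correct / len(X), selected_per_split
```

Here X is a list of samples (each a list of expression values) and y is
a list of 0/1 labels, so loocv(X, y, k=10) would return the estimated
accuracy together with the gene list chosen in each split.  Comparing
those per-split lists shows directly how much the selection varies from
one split to the next.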

If you do this, you might get different genes each time, so how do you
get your "final" optimal gene classifier?

Many thanks

Dan

-- 
**************************************************************
Daniel Brewer, Ph.D.

Institute of Cancer Research
Molecular Carcinogenesis
Email: daniel.brewer at icr.ac.uk
**************************************************************
