[BioC] Best options for cross validation machine learning

Lavinia Gordon lavinia.gordon at mcri.edu.au
Thu Jan 21 00:50:27 CET 2010

     Message: 9
     Date: Tue, 19 Jan 2010 16:11:14 +0000
     From: Daniel Brewer <daniel.brewer at icr.ac.uk>
     To: Bioconductor mailing list <bioconductor at stat.math.ethz.ch>
     Subject: [BioC] Best options for cross validation machine learning

   Hi Dan,

     I have a microarray dataset on which I have run an unsupervised
     Bayesian clustering algorithm that divides the samples into four
     groups.  What I would like to do is:
     1) Pick a group of genes that best predicts which group a sample
     belongs to
     2) Determine how stable these prediction sets are through some sort of
     cross-validation (I would prefer not to divide my set into a training
     and test set for stage one)
     These steps fall into the supervised machine learning realm, which I am
     not familiar with, and googling around, the options seem endless.  I was
     wondering whether anyone could suggest reasonable, well-established
     algorithms to use for both steps.

   Have a look at the CRAN Machine Learning task view [1].
   I would suggest going through the literature and looking at some papers that
   have dealt with your type of data as some of these packages are really aimed
   at specific types of data, e.g. tumor classification, survival data.
   E.g. see http://www.pnas.org/content/98/19/10869.abstract [2]
     Many thanks
     Daniel Brewer, Ph.D.
     Institute of Cancer Research
     Molecular Carcinogenesis
     Email: daniel.brewer at icr.ac.uk

   Lavinia Gordon
   Research Officer
   Murdoch Childrens Research Institute
   Royal Children's Hospital
   Flemington Road Parkville Victoria 3052 Australia
   telephone: +61 3 8341 6221


   1. http://cran.ms.unimelb.edu.au/web/views/MachineLearning.html
   2. http://www.pnas.org/content/98/19/10869.abstract
