[BioC] Selecting genes for machine learning

Djork-Arné Clevert okko at clevert.de
Mon Jun 27 11:22:56 CEST 2011

Dear January,

If you have Affymetrix data, you could try filtering genes by their
information content. The method is described in the Bioinformatics paper

"I/NI-calls for the exclusion of non-informative genes: a highly effective 
filtering tool for microarray data"

available at http://bioinformatics.oxfordjournals.org/content/23/21/2897.full.

The I/NI filter is included in our farms package. The package implements
FARMS, which, according to the Affycomp benchmark, is the leading
summarization method with respect to sensitivity and specificity.
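
A minimal sketch of how the filter might be applied, assuming the farms and
affy packages are installed from Bioconductor. It uses the testAffyBatch
example data that ships with farms; for your own arrays you would read the
CEL files instead (e.g. with affy's ReadAffy()). The variable names are
illustrative only, and the last step merely shapes the matrix for a
classifier such as randomForest.

```r
library(affy)    # AffyBatch handling; also attaches Biobase (exprs)
library(farms)   # FARMS summarization and I/NI calls

# Example data bundled with farms; replace with ReadAffy() for real CEL files.
data(testAffyBatch)

# Summarize probe-level data with FARMS (quantile-normalized variant).
eset <- qFarms(testAffyBatch)

# Classify each probe set as informative (I) or non-informative (NI).
INIs <- INIcalls(eset)
summary(INIs)

# Keep only the informative probe sets as an ExpressionSet.
I_eset <- getI_Eset(INIs)

# Samples in rows, probe sets in columns: the layout most classifiers expect.
x <- t(exprs(I_eset))
dim(x)
```

The reduced matrix x could then be passed, together with your own class
labels, to the learner of your choice.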

Greetings from Berlin,


dipl.-inf. djork clevert | gleimstr. 13a | d-10437 berlin
e: okko at clevert.de
p: +49.30.4432 4702
f: +49.30.6883 5307

On 24.06.2011 at 16:27, January Weiner wrote:

> Dear all,
> what is currently regarded as the optimal strategy to select genes for
> machine learning analysis? Taking all of the 40k or so genes is not
> doable (at least with randomForest, which I use). "Bioconductor case
> studies" suggests using nsFilter with argument var.cutoff=0.75,
> however I am not sure how that is calculated. Are the genes sorted
> according to absolute variance? If yes, is that method really suitable
> for filtering "uninteresting" genes?
> Kind regards,
> January
> -- 
> -------- Dr. January Weiner 3 --------------------------------------
> Max Planck Institute for Infection Biology
> Charitéplatz 1
> D-10117 Berlin, Germany
> Web   : www.mpiib-berlin.mpg.de
> Tel     : +49-30-28460514
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
