[BioC] feature selection

Nicholas Lewin-Koh nikko at hailmail.net
Tue Mar 25 11:52:22 MET 2003


Hi Karen,
I am not sure that starting with randomForest and its importance values
is the best approach. I would suggest first filtering the data in
different ways, for example keeping the 200 genes with the largest F
values. If your question is to identify differentially expressed genes,
then you really want a multiple comparisons approach; the multcomp
package is quite good. If the interest is a classification rule, try
filtering in different ways, as suggested above, and then try some
exploratory discriminant analysis. I have gotten good results with the
fda function in the mda package on CRAN. Use the gen.ridge method
option, which gives penalized discriminant analysis. This lets you look
at the projections and see whether the states are separable. You can
also look at the coefficients for each variable. After some careful EDA,
then go for the classification.
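
To make the F-value filter concrete, here is a rough sketch. It is
written in plain Python rather than R, purely as an illustration, and
every name in it (f_statistic, top_genes_by_f, the toy expr table) is
made up for the example and does not come from any Bioconductor or CRAN
package. It assumes one expression vector per gene and one known class
label per sample, computes the one-way ANOVA F statistic for each gene,
and keeps the k genes with the largest values.

```python
from collections import defaultdict


def f_statistic(values, labels):
    """One-way ANOVA F statistic for a single gene across labelled samples."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)

    n_samples = len(values)
    n_groups = len(groups)
    grand_mean = sum(values) / n_samples

    # Between-class and within-class sums of squares.
    ss_between = 0.0
    ss_within = 0.0
    for group in groups.values():
        group_mean = sum(group) / len(group)
        ss_between += len(group) * (group_mean - grand_mean) ** 2
        ss_within += sum((v - group_mean) ** 2 for v in group)

    if ss_within == 0:
        # No within-class variation: F is undefined, treat it as extreme or zero.
        return float("inf") if ss_between > 0 else 0.0

    ms_between = ss_between / (n_groups - 1)
    ms_within = ss_within / (n_samples - n_groups)
    return ms_between / ms_within


def top_genes_by_f(expr, labels, k):
    """Return the k gene names with the largest F statistic, plus all scores."""
    scores = {gene: f_statistic(vals, labels) for gene, vals in expr.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical toy data: 4 genes, 6 samples, 3 classes (not real measurements).
labels = ["A", "A", "B", "B", "C", "C"]
expr = {
    "gene1": [1.0, 1.2, 3.0, 3.1, 5.0, 5.2],  # differs across all classes
    "gene2": [2.0, 2.1, 2.0, 1.9, 2.1, 2.0],  # essentially flat
    "gene3": [0.5, 1.5, 0.6, 1.4, 0.4, 1.6],  # varies only within classes
    "gene4": [1.0, 1.1, 1.0, 1.2, 4.0, 4.1],  # one class stands apart
}

top, scores = top_genes_by_f(expr, labels, k=2)
print(top)
```

With the real data k would be something like 200, and it is worth
trying several values of k to see how sensitive the later steps are.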

Nicholas  


Karen writes>
Hello Bioconductor folk,
Can any of the bioconductor packages be used on a .pcl file, rather than
starting with the raw data?
I am starting with a .pcl file containing approximately 900 genes and 50
samples, which I have read using read.table. The classification is
known, and there are 3 classes of samples. I am interested in reducing
the number of genes. I would like to use the R RandomForest package for
this task. Is this appropriate? I'm new to this so will appreciate any
help.

Thanks
Karen


