[BioC] CMA package: variable selection with limma

Renaud Gaujoux renaud at mancala.cbio.uct.ac.za
Fri Mar 6 16:08:47 CET 2009

   I  used  the CMA package (v1.0.0) to evaluate the performance of a kNN
   classification after a feature selection step using limma, including the
   whole procedure in the cross-validation.
   Before finding the CMA package I had implemented myself the same procedure
   (cross-validation of feature selection) and I couldn't get the same result
   as with CMA.
   Looking to the CMA code I found that the feature were selected by their rank
   based on the statistic computed by the function classifyTestsF from limma
   (see the CMA code below). My understanding is that this function computes a
   F-statistic to test the joint significance of all the coefficients in the
   regression  model  (in  this case 2: the intercept and the contrast of
   interest, i.e. the difference between the two groups of samples). In my
   procedure  I  had ranked the genes based on the p-values of the single
   coefficient of interest (not the intercept), as suggested by the limma
   Indeed the selected genes in the first step are different.
   Is there a reason for also looking at the joint significance (intercept +
   I guess the fact intercept is significantly different from zero gives us the
   information that the gene is actually expressed at a certain level, making
   us more confident that what is observed is not noise?
   In standard regression problem (more samples than variables) I would indeed
   test first the significance of the model as a whole (F-test), but in the
   context of microarray I had never seen this practice before. Am I missing
   Which practice is recommended in this context?
   limmatest <- function(X, y, learnind, ...){
   require(limma, quietly=TRUE)
   yprime <- y[learnind]
   Xprime <- X[learnind,,drop=FALSE]
   yprime <- as.factor(yprime)
   des <- model.matrix(~yprime)
   limo <- lmFit(t(Xprime), des)
   outp <- eBayes(limo, ...)
   stat <-  classifyTestsF(object = outp, fstat.only = TRUE)
   attr(stat, "df1") <- NULL
   attr(stat, "df2") <- NULL
   new("varseloutput", varsel=stat)

More information about the Bioconductor mailing list