[BioC] LIMMA similarity of contrast matrix, F test and topTable

Thu Jul 3 15:14:57 CEST 2014

Dear Bioconductors,

I am currently analysing gene expression data using microarrays. I currently have a simple model which looks at plant development. The design matrix can be replicated as follows:

devStadium <- factor(rep(c("Ovule" , "Seed24h" , "Globular" , "Cotyledon" , "MGreen" ,
     "PMGreen" , "Seedling"),each=2))
design.null <-model.matrix(~ devStadium ,
                         contrasts.arg=list(devStadium=contr.sum))

Next, I fit the linear models and construct a contrast matrix:

fit.null <- lmFit(eset.par.diff,design=design.null)

contrast.matrix <- makeContrasts(
 Intercept = (Intercept),
 Ovule= devStadium1,
 Seed24h= devStadium2,
 Globular= devStadium3,
 Cotyledon= devStadium4,
 MGreen= devStadium5,
 PMGreen= devStadium6,
 Seedling= -c(devStadium1+devStadium2+devStadium3+
                devStadium4+devStadium5+devStadium6),
 levels = design.null)

fit.null <- contrasts.fit(fit.null, contrast.matrix)
fit.null <- eBayes(fit.null)

Since I have coded the design matrix using sum coding, the intercept should indicate the average gene expression over all the seven developmental stadia. I test how many genes show significant expression during development through:

null.ict.table <- topTable(fit.null,coef=1,n=nrow(fit.null),adjust.method="BH")
mean(null.ict.table$adj.P.Val < 0.05)

This output tells me that 80.5% of the genes show expression during plant development.

Furthermore, I would like to perform an F test over all the developmental stadia I have sampled. In total, there are seven stadia. However, the application of this F test is uncertain to me. 
According to the manual, the F-statistic gained from the eBayes step tests whether any of the contrasts are non-zero for that gene. However, since I have defined an intercept in my contrasts, this might not be the way to go. Therefore, I thought of making a small contrast matrix (L) myself, using the following code:

L <- matrix(0,nrow=8,ncol=1)
L[2:8,1] <- 1
null.stadia.table <- topTable(fit.null,coef=L,n=nrow(fit.null),adjust.method="BH")
mean(null.stadia.table$adj.P.Val < 0.05) 

My reasoning here is that the 1’s represent the coefficients to be tested (thus the seven developmental stadia). The results of this output show me that EXACTLY the same amount of genes turn out to be significant than with testing the intercept.

I don’t know why this is, but it makes me believe my approach is wrong. If anyone can explain me why, this would lighten things up for me. Anyway, I proceeded using something else

null.stadia.table <- topTable(fit.null,coef=c(2:8),n=nrow(fit.null),adjust.method="BH")
mean(null.stadia.table$adj.P.Val < 0.05)

This approach is similar to using the contrast matrix

L <- matrix(0,nrow=7,ncol=7)
diag(L) <- 1
rbind(rep(0,7),L)

Here, I state testing the seven developmental stadium coefficients, and I get a lot less genes that turn out to be significant than with the two approaches above.
My primary question aims at the differences between both contrast matrices I have constructed in testing while using topTable.

Thank you in advance,
Koen