[BioC] limma for finding differentialy expressed genes of several groups

Thu Oct 18 10:24:17 CEST 2012

I would like to find the differentially expressed genes for several variables using the limma package for several groups.
I have the rma normalized matrix in the following format :

ID_REF GSM362180    GSM362181  GSM362188    GSM362189  GSM362192
244901 5.094871713 4.626623079 4.554272515 4.748604391 4.759221647
244902 5.194528083 4.985930299 4.817426064 5.151654407 4.838741605
244903 5.412329253 5.352970877 5.06250609  5.305709079 8.365082403
244904 5.529220594 5.28134657  5.467445095 5.62968933  5.458388909
244905 5.024052699 4.714631878 4.792865831 4.843975286 4.657188246
244906 5.786557533 5.242403911 5.060605782 5.458148567 5.890061836

where the different columns correspond to four different types of promoters and each of the four promoters has a biological replicate so totally there are 8 columns.There are totally 22810 genes and I would like to get a list of the genes which are differentially expressed

I tried using the Limma package to find the differentially expressed genes across several promoters ( with replicates). 
This is the code that I used:

Group <- factor(c("p1", "p1", "p2", "p2", "p3","p3","p3","p4","p4"), levels = c("GSM362180","GSM362181","GSM362188","GSM362189","GSM362192","GSM362193","GSM362194","GSM362197","GSM362198"))

design <- model.matrix(~0 + Group)

colnames(design) <- c("GSM362180","GSM362181","GSM362188","GSM362189","GSM362192","GSM362193","GSM362194","GSM362197")
fit <- lmFit(modified, design)

where modified is the rma normalized data matrix as inputted in the above format.
I get the following error:

Coefficients not estimable: GSM362180 GSM362181 GSM362188 GSM362189 GSM362192 GSM362193 GSM362194 GSM362197 GSM362198 
Error in lm.fit(design, t(M)) : 0 (non-NA) cases

I managed to get help from the mailing list prior to this and was able to correct it in the following way.

 -- output of sessionInfo(): 

Group <- factor(c("p1", "p1", "p2", "p2", "p3","p3","p3","p4","p4"))
design <- model.matrix(~0+Group)
colnames(design) <- gsub("Group","", colnames(design))

For creating the contrast matrix I proceeded as :
fit<-lmFit(modified,design)
fit<-ebayes(fit)
fit<-lmFit(modified,design)
contrast.matrix<-makeContrasts(p1-p2,p1-p3,p1-p4,p2-p3,p2-p4,p3-p4,levels=design)
fit2<-contrasts.fit(fit,contrast.matrix)
fit2<-eBayes(fit2)
topTable(fit2,coef=1,adjust="fdr")
logFC   AveExpr         t      P.Value    adj.P.Val         B
14865 -3.063442 11.939646 -20.85957 5.020817e-09 8.235097e-05 10.906936
15107 -3.316203 13.136888 -19.79194 8.041764e-09 8.235097e-05 10.543106
12037  2.806403 10.772050  19.10380 1.103823e-08 8.235097e-05 10.292274
15931 -3.469330 10.325303 -18.53793 1.444120e-08 8.235097e-05 10.075671
18327  3.198993  9.633795  17.57118 2.328424e-08 8.331092e-05  9.682365
7521  -2.419999  7.373064 -17.16080 2.873576e-08 8.331092e-05  9.505924
16564  3.268568  8.365454  17.09028 2.980775e-08 8.331092e-05  9.475007
3832  -2.685268  7.540418 -16.89167 3.307237e-08 8.331092e-05  9.386966
10364  2.466369  6.779762  16.71021 3.640344e-08 8.331092e-05  9.305265
4967  -2.453614 11.409188 -16.62282 3.813877e-08 8.331092e-05  9.265479
o<-order(fit2$F.p.value)
fit2$genes[o[1:30],]

After the above step I get as NULL. I do not know where am making the mistake.

clas <- decideTests(fit2, method = "nestedF",
+                     adjust.method = "fdr", p = 0.05)

I get the following output which I know is quite wrong :
Contrasts
       p1 - p2 p1 - p3 p1 - p4 p2 - p3 p2 - p4 p3 - p4
  [1,]       0       0       0       0       0       0
  [2,]       0       0       0       0       0       0
  [3,]       0       0       0       0       0       0
  [4,]       0       0       0       0       0       0
  [5,]       0       0       0       0       0       0
  [6,]       0       0       0       0       0       0

--
Sent via the guest posting facility at bioconductor.org.