[BioC] Multiple groups comparison using EdgeR

Tue Nov 8 09:15:51 CET 2011

Hi all,

I'm using edgeR for multiple groups comparison.

I have two treatments (WT and mutant), each treatment in three tissues (root, leaf, seed), each tissue either has replicates or not.
I want to compare the expression between WT and mutant in general (considering all the 3 tissues) and in each tissue seperately.

The following is my code to find DE between WT and mutant considering all the 3 tissues: 
treatment=factor(c(rep('WT',8),rep('MUTANT',8)))
tissue=factor( c('root','root','root','leaf','seed','seed','seed','seed', 'root','root','root','leaf','seed','seed','seed','seed'))
design <- model.matrix(~treatment+tissue)
dge<- DGEList(dat1, group=rep(c("WT","OXT"),each=8))

dge <- estimateGLMCommonDisp(dge, design)
dge$common.dispersion
glmfit.dge <- glmFit(dge, design,dispersion=dge$common.dispersion)
lrt.dge <- glmLRT(dge, glmfit.dge, coef=2)  # I want to compare WT and MUTANT, adjusting for tissue root, seed, and leaf, so the coef=2 which is treatmentWT           

I have some questions:
1) How to interpret the results? I thought the final DE considers DE in root between WT and Mutant, and DE in leaf between WT and Mutant, and leaf, right? So If I want to know the foldChange, should I get seperated FC of root, leaf, and seed? 

2) If to compare leaf and seed, then I should use "lrt.dge <- glmLRT(dge, glmfit.dge, coef=4", right? 
But if I want to compare root and seed, can I mannually change the levels in tissue, like using "levels(tissue)=c('seed','leaf','root')", then do the same comparison?

3) Some experiments in root are paired, like WT_root1 paired with Mutant_root1, need I consider this information? I'm afraid this will make the design matrix too complicate, there will be three factors, and only 1 or 2 samples are paired from the 16 samples.  

4) From your experience, is it better to use pooled data to compare WT and Mutant (exactTest), or adjust multiple factors (GLMfit)? Is there any way to evaluate the result? because I don't know which gene is actually DE. 

5) What can I do if there are very few DE genes after adjusting pvalue? I have filtered genes by CPM>1 for 8 or 4 samples. There are only 20 genes can be found from the filtered 14,000 genes. (the total gene is ~120,000).

--> Output design matrix is like this:
   (Intercept) treatmentWT tissueroot tissueseed
1            1           1          1          0
2            1           1          1          0
3            1           1          1          0
4            1           1          0          0
5            1           1          0          1
6            1           1          0          1
7            1           1          0          1
8            1           1          0          1
9            1           0          1          0
10           1           0          1          0
11           1           0          1          0
12           1           0          0          0
13           1           0          0          1
14           1           0          0          1
15           1           0          0          1
16           1           0          0          1

> sessionInfo()
R version 2.14.0 (2011-10-31)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=Chinese (Simplified)_People's Republic of China.936 
[2] LC_CTYPE=Chinese (Simplified)_People's Republic of China.936   
[3] LC_MONETARY=Chinese (Simplified)_People's Republic of China.936
[4] LC_NUMERIC=C                                                   
[5] LC_TIME=Chinese (Simplified)_People's Republic of China.936    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] edgeR_2.4.0

loaded via a namespace (and not attached):
[1] limma_3.10.0

I would really appreciate your comments or suggestions.

Many thanks!

Xiaohui Wu