[BioC] edgeR - ANOVA-like test for differential gene expression

Wed Jun 26 18:11:18 CEST 2013

Hi Benedikt,

When do you coef=2:4, what you are actually doing is comparing the full 
model against a model where those coefficients are deleted entirely. 
What remains after deleting those coefficients is just the intercept 
term. Hence, you are testing all possible contrasts between conditions. 
There are 6 contrasts, but they are redundant, and there are only 3 
degrees of freedom being tested (since you are testing 3 coefficients). 
If you use topTable, you will get logFC values for 3 of the 6 contrasts. 
The other 3 logFC values can be calculated by subtracting the logFC 
values you have.

-Ryan Thompson

On 6/26/13 8:43 AM, Benedikt Drosse wrote:
> Dear edgeR users,
>
> I worked my way through the user's guide of edgeR. However, I still 
> have some questions and problems understanding  the anova-like way of 
> analyzing differential gene expression. I would be very grateful for 
> any help or suggestion. I hope I did not overlook any earlier post 
> concerning the same topic.
>
> My RNAseq data are derived from 4 different developmental stages with 
> 3 biological replicates each (see overview). I am interested in genes 
> differentially expressed between any of the developmental stages.
> I am not sure, which way of analysis is best to answer this question. 
> Making the "Anova-like" model asking for all contrasts at once or 
> doing all possible contrasts one by one.
>
> In principle the "anova-like" way of analysis should fit best to my 
> question, if I understood the edgeR user's guide correctly. So I make 
> a model matrix as given in "design" and fit the glm model for the 
> effect of Development including the intercept. To ask for expression 
> differences between any of the developmental stages I use "coef=2:4" 
> in the call of glmLRT.
>
> So my questions are:
> Will coef=2:4 yield only the contrasts of Development2-Development1, 
> Development3-Development1, Development4-Development1 (as it is 
> described in the user's guide),
> or will it also test the contrasts of Development3-Development2, 
> Development4-Development2 and Development4-Development3, which would 
> also be important contrasts for me to analyze?
> For further filtering and candidate gene identification I would like 
> to extract the logFoldChanges and significance level between any of 
> the developmental stages.
> If the "anova-like" analysis provides all of the six possible 
> contrasts, is there an easy way to extract logFC and significance 
> level for any of these contrasts from the resulting glmLRT-output?
>
> If the anova-like analysis does NOT yield ALL of the possible 
> contrasts, I would have to make the contrasts one by one.
> In this case would it be more useful to build the glm model without an 
> intercept (see design_2), and then ask for the six contrasts 
> separately as indicated in the comment of glm_data_2?
> Doing so it would be easy for me to get the logFC and the 
> corresponding signficance level.
> Is it conceptionally wrong to do the single contrasts instead of an 
> anova-like analysis concerning the multiple testing problem (6 tests 
> instead of 1)?
>
> I will be very grateful for any comment on my questions,
>
> Best regards,
> Benedikt Digel
>
>
>
> #R code:
> overview = matrix(nrow=12, ncol=1)
>     overview[,1] = factor(c(1,1,1,2,2,2,3,3,3,4,4,4))
>         colnames(overview) = "Development"
>         rownames(overview) = paste("library_", 1:12, sep="")
>
> design = model.matrix(object = ~Development)
>     fit = glmFit(y=data_with_tagwiseDISP, design=design) # I did not 
> provide the corresponding data for the analysis, since it should just 
> indicate the way I call glmFit
>         glm_data = glmLRT(glmfit=fit, coef=2:4)
>
> design_2 = model.matrix(object=~0+Development)
>     fit_2 = glmFit(y=data_with_tagwiseDISP, design=design_2)     # I 
> did not provide the corresponding data for the analysis, since it 
> should just indicate the way I call glmFit
>         glm_data_2 = glmLRT(glmfit=fit_2, contrast=c(-1,1,0,0)) # 
> c(-1,0,1,0); c(-1,0,0,1); c(0,-1,1,0); c(0,1-,0,1); c(0,0,-1,1) all 
> possible contrasts to extract effects of all comparisons
>