[BioC] EdgeR multi-factor testing question

Yanzhu [guest] guest at bioconductor.org
Wed Jan 8 15:36:16 CET 2014


Dear Gordon,

I have one more question about the estimation of dispersion. 

When the three-way interaction term is not significant, I will fit model 2, which omits the three-way interaction, to test the two-way interaction terms. When none of the interaction terms is significant, I will fit the additive model (model 3) to test the main effects. Could I use the same dispersion for all three models, i.e., model 1 (including everything), model 2 (without the three-way interaction term) and model 3 (the additive model)? Could this dispersion be estimated under the design of model 1?

Thank you!



Yanzhu



---------------------------------------------------------


Dear Yanzhu,

Your analysis is fine from a code point of view.  From a statistical point 
of view, however, your analysis is too simple because it neglects the 
principle of marginality:

   http://en.wikipedia.org/wiki/Principle_of_marginality

For the model you have fitted, it makes sense to test for the three-way 
interaction as you do.  However, it does not make statistical sense to test 
for the main effects or two-way interactions until you have established 
that the three-way interaction is non-significant.

For count data, the tests for the lower-level interactions need to be 
computed by successively removing each level of interactions from the 
model.  See for example:

   https://stat.ethz.ch/pipermail/bioconductor/2013-December/056584.html

This is the same as what the anova() function does in R for unbalanced 
linear factorial models.
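
As a rough illustration of the sequential removal described above, here is a 
minimal sketch for a single gene using base R only.  It assumes a Poisson 
glm() as a stand-in for the negative binomial model that edgeR fits, and the 
factor names A, B, C and the simulated counts are hypothetical placeholders, 
not data from this thread.

```r
# Simulated counts for ONE gene in a 2 x 2 x 2 design with 3 replicates.
# (Hypothetical data; A, B, C are placeholder factor names.)
set.seed(1)
d <- expand.grid(A = factor(1:2), B = factor(1:2), C = factor(1:2), rep = 1:3)
d$count <- rpois(nrow(d), lambda = 50)

# Model 1: all main effects, two-way interactions and the three-way interaction.
# Poisson is an assumption made here so the example runs without edgeR.
full <- glm(count ~ A * B * C, family = poisson, data = d)

# anova() adds terms sequentially (main effects, two-way, then three-way).
# Each row is a likelihood ratio test against the model holding the rows above it.
tab <- anova(full, test = "Chisq")
print(tab)

# The same three-way test, obtained by explicitly removing the term (model 2)
# and comparing the two nested fits.
no3way <- update(full, . ~ . - A:B:C)
lrt3 <- anova(no3way, full, test = "Chisq")
print(lrt3)
```

The last row of the sequential table and the explicit two-model comparison 
give the same deviance for A:B:C, which is the sense in which the lower-level 
tests come from successively smaller models.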

Furthermore, testing the two-way interactions is only sensible for genes 
with non-significant 3-way interactions.  Similarly, testing the main effects 
is only sensible for genes with non-significant 2-way and 3-way 
interactions.  Otherwise these tests have no useful scientific meaning.
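
The gene-by-gene restriction above could be sketched as follows.  This is 
only an illustration in base R: the p-values, the gene names and the 0.05 
cutoff are all made-up placeholders, each gene is assumed to have a single 
p-value per interaction level, and in a real analysis one would probably work 
with multiplicity-adjusted values instead.

```r
# Hypothetical per-gene p-values (placeholders, not real results).
p3way <- c(g1 = 0.001, g2 = 0.40, g3 = 0.70)   # three-way interaction
p2way <- c(g1 = 0.50,  g2 = 0.01, g3 = 0.60)   # two-way interactions
pmain <- c(g1 = 0.02,  g2 = 0.03, g3 = 0.004)  # main effects
alpha <- 0.05                                  # illustrative cutoff only

# Two-way tests are interpretable only where the three-way term is non-significant.
ok2way <- p3way >= alpha
# Main-effect tests additionally need the two-way terms to be non-significant.
okmain <- ok2way & (p2way >= alpha)

# Mask the tests that have no useful interpretation for a given gene.
p2way_use <- ifelse(ok2way, p2way, NA)
pmain_use <- ifelse(okmain, pmain, NA)

print(data.frame(p3way, p2way_use, pmain_use))
```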

This is a basic drawback of the factorial anova approach.  You might 
consider the alternative approach described in Section 3.3.1 of the edgeR 
User's Guide.
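
As far as I recall, that section describes combining the factors into a 
single grouping factor, with one level per treatment combination, and then 
testing specific comparisons between groups.  A minimal sketch of that idea 
in base R follows; the factor names, level names and the particular contrast 
are hypothetical, and the edgeR calls are only mentioned in a comment, not run.

```r
# Hypothetical 2 x 2 x 2 layout with 2 replicates per cell (placeholder names).
targets <- expand.grid(A = c("a1", "a2"), B = c("b1", "b2"),
                       C = c("c1", "c2"), rep = 1:2)

# One group per treatment combination, e.g. "a1.b1.c1".
group <- factor(paste(targets$A, targets$B, targets$C, sep = "."))

# Design matrix without an intercept: one column (coefficient) per group.
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Example contrast: effect of A (a2 vs a1) within B = b1 and C = c1.
contrast <- setNames(numeric(ncol(design)), colnames(design))
contrast["a2.b1.c1"] <- 1
contrast["a1.b1.c1"] <- -1
print(contrast)

# With edgeR loaded, a design and contrast like these would typically be
# passed on to glmFit() and to the contrast argument of glmLRT().
```

Each comparison is then a direct question about specific groups, so it stays 
interpretable whether or not the interaction terms are significant.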

Best wishes
Gordon




 -- output of sessionInfo(): 

>  sessionInfo() 
R version 3.0.1 (2013-05-16)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] edgeR_3.2.4  limma_3.16.8

loaded via a namespace (and not attached):
[1] tools_3.0.1


--
Sent via the guest posting facility at bioconductor.org.


