[BioC] Statistics question for multi-factor interaction test in edgeR

Sat May 11 02:32:31 CEST 2013

Dear Hilary,

Your test for the 3-way interaction is correct, although 3-way 
interactions are pretty hard to interpret.

However testing for the 2-way interaction in the presence of a 3-way 
interaction does not make statistical sense.  This is because the 
parametrization of the 2-way interaction as a subset of the 3-way is 
somewhat arbitrary.  Before you can test the 2-way interaction 
species*treatment in a meaningful way you would need to accept that the 
3-way interaction is not necessary and remove it from the model.

In general, I am of the opinion that classical statistical factorial 
interation models do not usually provide the most meaningful 
parametrizations for genomic experiments.  In most cases, I prefer to fit 
the saturated model (a different level for each treatment combination) and 
make specific contrasts.  There is some discussion of this in the limma 
User's Guide.

In your case, I guess that you might want to test for species*treatment 
interaction separately at each time point.  It is almost impossible to do 
this within the classical 3-way factorial setup. However it is easy with 
the one-way approach I just mentioned, or else you could use:

   ~Age + Age:Species + Age:Treatment + Age:Species:Treatment

Best wishes
Gordon

> Date: Thu, 9 May 2013 14:55:46 -0400
> From: Hilary Smith <Hilary.A.Smith.964 at nd.edu>
> To: "bioconductor at r-project.org" <bioconductor at r-project.org>
> Subject: [BioC] Statistics question for multi-factor interaction test
> 	in	edgeR
>
> Hi. I need to generate two GLM tests of a factorial design with RNA-Seq 
> count data. I have 3 factors with 2 levels apiece (2 species X 2 
> treatments X 2 times), and 4 separate replicates each (i.e., we made a 
> total of 2*2*2*4 = 32 separate libraries). Our main interest is in the 
> interaction of species*treatment, as we think species A will alter gene 
> expression in the treatment stress vs. treatment benign, whereas species 
> B is expected to show little change. However, we’d like to also do 
> another test of species*treatment*time, because it is possible that the 
> ability of species A to alter gene expression in response to the stress 
> treatment may differ at the 1st versus 2nd time point.
>
> I think the way to set this up, is to create a design matrix as follows, 
> with the lrt test with coef 5 giving the differentially expressed genes 
> for the species*treatment test, and coef 8 giving the the differentially 
> expressed gene for the species*treatment*time test (after calling 
> topTags that is). Yet to ensure I have the statistics correct, my 
> questions are: (1) is this thinking correct, as I don’t see many 3x2 
> factorial models to follow, and (2) do I need to set up a reference 
> somehow (which I assume would be the set of four samples with 
> TreatmentBenign*SpeciesB*Time2, but I’m not fully sure if that is 
> correct or needed).
>
> Many thanks in advance for your insight!
> ~Hilary
>
>> designFF <- model.matrix(~Treatment*Species*Age)
>> colnames(designFF)
> [1] "(Intercept)"
> [2] " TreatmentStress"
> [3] "SpeciesA "
> [4] "Time1"
> [5] "TreatmentStress:SpeciesA"
> [6] "TreatmentStress:Time1"
> [7] "SpeciesA:Time1"
> [8] "TreatmentStress:SpeciesA:Time1"
>
> And then to run tests with:
>> fit <- glmFit(y, designFF)
>
>> lrtInteractionStressSpecies <- glmLRT(fitFF, coef=5)
>> lrtInteractionStressSpeciesTime <- glmLRT(fitFF, coef=8)

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:5}}