[BioC] Statistics question for multi-factor interaction test in edgeR

Hilary Smith Hilary.A.Smith.964 at nd.edu
Sat May 11 15:57:39 CEST 2013


Dear Gordon,
Thank you. We cannot really remove the 3-way interaction, because there
are some genes that respond to the 3-way interaction (even though a
classic parameterization with the intercept, leads to about an order of
magnitude more genes responding to a 2-way vs. 3-way interaction).

Testing for species*treatment at each time, definitely seems the closest
way to address the questions we have. Roughly following section 3.3.1 of
the edgeR user guide ("Defining each treatment combination as a group") on
the creation of specific contrasts (and many thanks for your updated User
Guides to show this formulation in detail), I set up tests as below. If
this is valid, that's great, as it does make much more intuitive sense to
me (and biological in terms of addressing our questions of interest).

Thank you for your help; I REALLY appreciate it!
Best,
Hilary

> design1 = model.matrix(~0+Group_time1, data=Ytime1$samples)
> design2 = model.matrix(~0+Group_time2, data=Ytime2$samples)

> myC_TimeI = makeContrasts(
+ Treatment1_vs_Treatment2_SpeciesA_TimeI = SpeciesA_TimeI_Treatment2 -
SpeciesA_TimeI_Treatment1,
+ Treatment1_vs_Treatment2_SpeciesB_TimeI = SpeciesB_TimeI_Treatment2 -
SpeciesB_TimeI_Treatment1,
+ Treatment1_vs_Treatment2_ SpeciesAvsSpeciesB_TimeI =
(SpeciesA_TimeI_Treatment2 - SpeciesA_TimeI_Treatment1)-(
SpeciesB_TimeI_Treatment2 - SpeciesB_TimeI_Treatment1),
+ levels=design1)
> 
> myC_TimeII = makeContrasts(
+ Treatment1_vs_Treatment2_SpeciesA_TimeII = SpeciesA_TimeII_Treatment2 -
SpeciesA_TimeII_Treatment1,
+ Treatment1_vs_Treatment2_SpeciesB_TimeII = SpeciesB_TimeII_Treatment2 -
SpeciesB_TimeII_Treatment1,
+ Treatment1_vs_Treatment2_ SpeciesAvsSpeciesB_TimeII =
(SpeciesA_TimeII_Treatment2 - SpeciesA_TimeII_Treatment1)-(
SpeciesB_TimeII_Treatment2 - SpeciesB_TimeII_Treatment1),
+ levels=design2)

Then, after using glmFit on each of the two makeContrasts above, I ran
glmLRT 6 times, once on each of the 6 contrasts (i.e., on
Treatment1_vs_Treatment2_SpeciesA_TimeI,
Treatment1_vs_Treatment2_SpeciesB_TimeI, etc. through
Treatment1_vs_Treatment2_ SpeciesAvsSpeciesB_TimeI.

Ultimately I may then try to use the VennCounts/Diagram from limma to show
the overlap between the 4 sets visually:
Treatment1vsTreatment2_SpeciesA_TimeI; Treatment1
vsTreatment2_SpeciesA_TimeII; Treatment1 vsTreatment2_SpeciesB_TimeI; and
Treatment1 vsTreatment2_SpeciesB_TimeII. However, this may take a bit to
set up, because my two time points have slightly different gene lists
(i.e., to run the analyses on the Times differently, after filtering out
reads to only retain those with cpm>1 in „4 libraries, TimesI and II did
not retain exactly the same genes, though it¹s rather close).





--------------------------------------------------






On 5/10/13 8:32 PM, "Gordon K Smyth" <smyth at wehi.EDU.AU> wrote:

>Dear Hilary,
>
>Your test for the 3-way interaction is correct, although 3-way
>interactions are pretty hard to interpret.
>
>However testing for the 2-way interaction in the presence of a 3-way
>interaction does not make statistical sense.  This is because the
>parametrization of the 2-way interaction as a subset of the 3-way is
>somewhat arbitrary.  Before you can test the 2-way interaction
>species*treatment in a meaningful way you would need to accept that the
>3-way interaction is not necessary and remove it from the model.
>
>In general, I am of the opinion that classical statistical factorial
>interation models do not usually provide the most meaningful
>parametrizations for genomic experiments.  In most cases, I prefer to fit
>the saturated model (a different level for each treatment combination)
>and 
>make specific contrasts.  There is some discussion of this in the limma
>User's Guide.
>
>In your case, I guess that you might want to test for species*treatment
>interaction separately at each time point.  It is almost impossible to do
>this within the classical 3-way factorial setup. However it is easy with
>the one-way approach I just mentioned, or else you could use:
>
>   ~Age + Age:Species + Age:Treatment + Age:Species:Treatment
>
>Best wishes
>Gordon
>
>
>> Date: Thu, 9 May 2013 14:55:46 -0400
>> From: Hilary Smith <Hilary.A.Smith.964 at nd.edu>
>> To: "bioconductor at r-project.org" <bioconductor at r-project.org>
>> Subject: [BioC] Statistics question for multi-factor interaction test
>> 	in	edgeR
>>
>> Hi. I need to generate two GLM tests of a factorial design with RNA-Seq
>> count data. I have 3 factors with 2 levels apiece (2 species X 2
>> treatments X 2 times), and 4 separate replicates each (i.e., we made a
>> total of 2*2*2*4 = 32 separate libraries). Our main interest is in the
>> interaction of species*treatment, as we think species A will alter gene
>> expression in the treatment stress vs. treatment benign, whereas
>>species 
>> B is expected to show little change. However, we¹d like to also do
>> another test of species*treatment*time, because it is possible that the
>> ability of species A to alter gene expression in response to the stress
>> treatment may differ at the 1st versus 2nd time point.
>>
>> I think the way to set this up, is to create a design matrix as
>>follows, 
>> with the lrt test with coef 5 giving the differentially expressed genes
>> for the species*treatment test, and coef 8 giving the the
>>differentially 
>> expressed gene for the species*treatment*time test (after calling
>> topTags that is). Yet to ensure I have the statistics correct, my
>> questions are: (1) is this thinking correct, as I don¹t see many 3x2
>> factorial models to follow, and (2) do I need to set up a reference
>> somehow (which I assume would be the set of four samples with
>> TreatmentBenign*SpeciesB*Time2, but I¹m not fully sure if that is
>> correct or needed).
>>
>> Many thanks in advance for your insight!
>> ~Hilary
>>
>>> designFF <- model.matrix(~Treatment*Species*Age)
>>> colnames(designFF)
>> [1] "(Intercept)"
>> [2] " TreatmentStress"
>> [3] "SpeciesA "
>> [4] "Time1"
>> [5] "TreatmentStress:SpeciesA"
>> [6] "TreatmentStress:Time1"
>> [7] "SpeciesA:Time1"
>> [8] "TreatmentStress:SpeciesA:Time1"
>>
>> And then to run tests with:
>>> fit <- glmFit(y, designFF)
>>
>>> lrtInteractionStressSpecies <- glmLRT(fitFF, coef=5)
>>> lrtInteractionStressSpeciesTime <- glmLRT(fitFF, coef=8)
>
>______________________________________________________________________
>The information in this email is confidential and intended solely for the
>addressee.
>You must not disclose, forward, print or use it without the permission of
>the sender.
>______________________________________________________________________



More information about the Bioconductor mailing list