[BioC] edgeR factorial experiment contrasts

James W. MacDonald jmacdon at uw.edu
Thu Sep 4 16:39:58 CEST 2014


Hi Jahn,



On Thu, Sep 4, 2014 at 5:16 AM, Jahn Davik <jahn.davik at bioforsk.no> wrote:

>  Hi Jim,
>
> Thanks for clarification. Appreciate it.
>
>
>
> For the sake of it I ran a couple of comparisons between the one-way
> approach that you advocate and the factorial approach. For the simple
> comparisons, e.g., E00 vs J00, the two approaches returned pretty much the
> same results.
>

This comparison is not simple when you fit a factorial model. If you used
the contrast you mentioned earlier (something like c(-1,1,0,0,0,...)), then
you aren't comparing E00 vs J00. Instead you are comparing the main effects
for E and J, which is not the same as comparing E00 and J00 (which is a
direct comparison of the average expression for genes at time zero). The
main effect for E and J is a comparison of the overall means for the two
groups, rather than the means at time zero.


 When I did the comparison ‘average(E00+E01+E05)’ vs ‘average(E48+E96)’ the
> topTags were pretty much equal. Still, a big fraction (~ 15%) of the
> significant (FDR<.001) transcripts found in one of the groups, were not
> found in the other. I suspect that can be explained by including the
> interactions in the factorial approach and not taking them into account in
> the one-way approach?
>

No. Like I said before, if you compute equivalent contrasts, you will get
identical results regardless of the underlying parameterization. If you are
getting differences it is because the contrasts you are computing for the
two parameterizations are not equivalent (e.g., you aren't making the same
comparisons). This has nothing to do with the presence or absence of
interaction terms.



>
>
> Another thing about doing the factorial approach is that you end up with a
> contrast statement something like:
>
>
>
> lrt <- glmLRT(fit,contrast=c(0,0,2,2,-3,-3,0,0,0,0))
>

Without seeing the design matrix and taking the time to interpret the
coefficients, I cannot say for sure if this is correct or not. I doubt it
is correct, but cannot say for sure. As you note, I did tell Julia Pickl
that coefficients for a contrast usually sum to zero. However, if the
coefficients being compared are already comparisons themselves, you have to
interpret the contrast in light of the underlying coefficients.

But this brings us back to the main point I have been trying to make all
along. You certainly can use a factorial parameterization for this
analysis, but there is no logical reason for doing so. The factorial
parameterization is decomposing your data into coefficients that are not
really useful in this context, and then you have to do all this work to
figure out the interpretation for each coefficient in order to then figure
out what contrast you should be using. The alternative is to parameterize
your model in a very simple way that you can then easily construct
contrasts to make any comparisons you might care to make. Why you want to
do things the hard way instead of the easy way eludes me.

But don't take my word for it; this is something that Gordon Smyth has been
talking about both on this list and in his user's guides for years now. As
an example, see section 9.5.4 in the limma User's Guide, where he shows a
couple of factorial parameterizations and the required contrasts. And he
ends with this sentence: "Since the approaches are equivalent, users are
free to choose whichever one is most intuitive or convenient."


Best,

Jim





>
> (the design matrix is slightly different from the one you commented upon
> below) where the coefficients certainly don’t add up to zero. This comes
> from the ‘algebraic contortions’ as you call them. I got the impression
> from your correspondence with Julia Pickl on this site that contrast
> coefficients should sum up to zero? But that may only apply to her
> situation?
>
>
>
> Finally, I would be pleased to hear thoughts on which approach is
> statistically ‘best’. I’ve been under the impression in statistically
> analyses that using the entire data set from an experiment is good
> practice, even if one makes simple pairwise comparison. Does edgeR do that?
>
>
>
> Thank you.
>
>
>
> Best,
>
> jahn
>
>
>
>
>
> *Fra:* James W. MacDonald [mailto:jmacdon at uw.edu]
> *Sendt:* 3. september 2014 23:38
> *Til:* Jahn Davik
> *Kopi:* Jahn Davik [guest]; bioconductor at r-project.org
>
> *Emne:* Re: [BioC] edgeR factorial experiment contrasts
>
>
>
> Hi Jahn,
>
>
>
> No, it's not wrong. You can parameterize any number of ways, and you will
> get identical results for equivalent contrasts. But given that you will get
> identical results, and one way is conceptually easier to figure out, I
> would in general advocate using the conceptually easier parameterization
> over a more difficult one that requires lots of algebraic contortions in
> order to figure out what contrast to use.
>
>
>
> I believe you sent a truncated version of your experiment (there were no
> replicates at each time point), so I can't give you explicit code. However,
> say you have a design matrix that looks something like this:
>
>
>
> E00 J00 E01 J01 E05 J05 E48 J48 E96 J96
>
> <zeros and ones go here>
>
>
>
> And you want to compare the third contrast you mention below. It's as
> simple as
>
>
>
> glmLRT(fit, contrast = c(0.33, 0, 0.33, 0,0.33, 0,-0.5, 0, -0.5, 0))
>
>
>
> Which tests the average of the E0-5 time points against the average of the
> E48-96 time points.
>
>
>
> And interactions are simple as well. Say you want the interaction between
> times 0 and 96:
>
>
>
> glmLRT(fit, contrast = c(1,-1,0,0,0,0,0,0,-1,1))
>
>
>
> Simple, no?
>
>
>
> Best,
>
>
>
> Jim
>
>
>
>
>
>
>
> On Wed, Sep 3, 2014 at 4:10 PM, Jahn Davik <jahn.davik at bioforsk.no> wrote:
>
>  Hi Jim,
> Yes, I believe I can see that (except for possibly my third question).
> It's just that I tend to think along these lines. Are they wrong?
>
> jahn
>  ------------------------------
>
> *Fra:* James W. MacDonald [jmacdon at uw.edu]
> *Sendt:* 3. september 2014 22:06
> *Til:* Jahn Davik [guest]
> *Kopi:* bioconductor at r-project.org; Jahn Davik
> *Emne:* Re: [BioC] edgeR factorial experiment contrasts
>
> Hi Jahn,
>
>
>
> Things will go much easier for you if you just parameterize as cell means
> rather than a conventional factorial model.
>
>
>
> grps <- factor(paste(Group, Time, sep = "_"))
>
>
>
> design <- model.matrix (~0+grps)
>
>
>
> Then all the comparisons are easy to do.
>
>
>
> Best,
>
>
>
> Jim
>
>
>
>
>
>
>
> On Wed, Sep 3, 2014 at 12:39 PM, Jahn Davik [guest] <
> guest at bioconductor.org> wrote:
>
> Hi,
> I am trying to figure out how to do comparisons using edgeR but have to
> admit that I am on unsure ice.
> My experiment is abiotic stress on two genotypes (E and J) and these have
> been sampled for RNA-seq at five time points (00, 01, 05, 48, and 96).
> I would like to analyze this experiment as a factorial (which I am sure
> could be argued), and when I set up the following design:
>
> > design<-model.matrix(~0 + Group + Time + Group*Time),
>
> I get the following design matrix (I've taken out the 3 three replicates
> at experimental cell.):
>
> > design
>       GroupE GroupJ Time01 Time05 Time48 Time96 GroupJ:Time01
> GroupJ:Time05 GroupJ:Time48 GroupJ:Time96
> E00      1      0      0      0      0      0             0             0
>            0             0
> J00      0      1      0      0      0      0             0             0
>            0             0
> E01      1      0      1      0      0      0             0             0
>            0             0
> J01      0      1      1      0      0      0             1             0
>            0             0
> E05      1      0      0      1      0      0             0             0
>            0             0
> J05      0      1      0      1      0      0             0             1
>            0             0
> E48      1      0      0      0      1      0             0             0
>            0             0
> J48      0      1      0      0      1      0             0             0
>            1             0
> E96      1      0      0      0      0      1             0             0
>            0             0
> J96      0      1      0      0      0      1             0             0
>            0             1
> attr(,"assign")
>  [1] 1 1 2 2 2 2 3 3 3 3
> attr(,"contrasts")
> attr(,"contrasts")$Group
> [1] "contr.treatment"
>
> attr(,"contrasts")$Time
> [1] "contr.treatment"
>
>
> I have some questions regarding testing contrasts:
> First though, I realize that I can pull out the data for E00 and J00, say,
> and compare these using exactTest, but then I guess that you are
> disregarding all information that lies in the remaining data set? That does
> not seem to me to be a good way to do it. But I may be wrong here?
>
> Second, if I want to do the comparison E00 vs J00 using the design
> indicated above, how should that be set up? Would
>
> lrt <- glmLRT(fit,contrast=c(1,-1,0,0,0,0,0,0,0,0)) do it?
>
> Third, if I want to compare (E00 + E01 + E05) vs (E48 + E96), how would I
> set up that contrast?
> For my own pedagogic reasons I can show how I was thinking in this case:
>
> E00 = L1
> E01 = L1 + L4   x 2  -->  6L1 + 2L3 + 2L4
> E05 = L1 + L4
>
> versus                                       --> 2L3 + 2L4 - 3L5 -3L6
>
> E48 = L1 + L5   x 3  -->  6L1 + 3L5 + 3L6
> E96 = L1 + L6
>
>
> resulting in the contrast:
>
> lrt <- glmLRT(fit,contrast=c(0,0,2,2,-3,-3,0,0,0,0))
>
> Then I seem to remember that contrasts should sum up to zero? If anybody
> could tell me where I am loosing it, I would be very obliged.
>
> Thank you
>
> jd
>
>
>
>
>
>  -- output of sessionInfo():
>
> R version 3.1.0 (2014-04-10)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=Norwegian (Bokmål)_Norway.1252  LC_CTYPE=Norwegian
> (Bokmål)_Norway.1252
> [3] LC_MONETARY=Norwegian (Bokmål)_Norway.1252 LC_NUMERIC=C
> [5] LC_TIME=Norwegian (Bokmål)_Norway.1252
>
> attached base packages:
> [1] splines   stats     graphics  grDevices utils     datasets  methods
>  base
>
> other attached packages:
> [1] edgeR_3.6.8          limma_3.20.9         BiocInstaller_1.14.2
>
> loaded via a namespace (and not attached):
> [1] tools_3.1.0
> >
>
> --
> Sent via the guest posting facility at bioconductor.org.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>
>
>
>
> --
>
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
>
>
>
>
>
> --
>
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
>



-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099

	[[alternative HTML version deleted]]



More information about the Bioconductor mailing list