[BioC] edgeR factorial experiment contrasts

Jahn Davik jahn.davik at bioforsk.no
Mon Sep 8 08:28:39 CEST 2014


Hi Jim,
Thank you for the thorough feedback. Appreciate it.

Best
jahn

Fra: James W. MacDonald [mailto:jmacdon at uw.edu]
Sendt: 4. september 2014 16:40
Til: Jahn Davik
Kopi: Jahn Davik [guest]; bioconductor at r-project.org
Emne: Re: [BioC] edgeR factorial experiment contrasts

Hi Jahn,


On Thu, Sep 4, 2014 at 5:16 AM, Jahn Davik <jahn.davik at bioforsk.no<mailto:jahn.davik at bioforsk.no>> wrote:
Hi Jim,
Thanks for clarification. Appreciate it.

For the sake of it I ran a couple of comparisons between the one-way approach that you advocate and the factorial approach. For the simple comparisons, e.g., E00 vs J00, the two approaches returned pretty much the same results.

This comparison is not simple when you fit a factorial model. If you used the contrast you mentioned earlier (something like c(-1,1,0,0,0,...)), then you aren't comparing E00 vs J00. Instead you are comparing the main effects for E and J, which is not the same as comparing E00 and J00 (which is a direct comparison of the average expression for genes at time zero). The main effect for E and J is a comparison of the overall means for the two groups, rather than the means at time zero.


When I did the comparison ‘average(E00+E01+E05)’ vs ‘average(E48+E96)’ the topTags were pretty much equal. Still, a big fraction (~ 15%) of the significant (FDR<.001) transcripts found in one of the groups, were not found in the other. I suspect that can be explained by including the interactions in the factorial approach and not taking them into account in the one-way approach?

No. Like I said before, if you compute equivalent contrasts, you will get identical results regardless of the underlying parameterization. If you are getting differences it is because the contrasts you are computing for the two parameterizations are not equivalent (e.g., you aren't making the same comparisons). This has nothing to do with the presence or absence of interaction terms.



Another thing about doing the factorial approach is that you end up with a contrast statement something like:

lrt <- glmLRT(fit,contrast=c(0,0,2,2,-3,-3,0,0,0,0))

Without seeing the design matrix and taking the time to interpret the coefficients, I cannot say for sure if this is correct or not. I doubt it is correct, but cannot say for sure. As you note, I did tell Julia Pickl that coefficients for a contrast usually sum to zero. However, if the coefficients being compared are already comparisons themselves, you have to interpret the contrast in light of the underlying coefficients.

But this brings us back to the main point I have been trying to make all along. You certainly can use a factorial parameterization for this analysis, but there is no logical reason for doing so. The factorial parameterization is decomposing your data into coefficients that are not really useful in this context, and then you have to do all this work to figure out the interpretation for each coefficient in order to then figure out what contrast you should be using. The alternative is to parameterize your model in a very simple way that you can then easily construct contrasts to make any comparisons you might care to make. Why you want to do things the hard way instead of the easy way eludes me.

But don't take my word for it; this is something that Gordon Smyth has been talking about both on this list and in his user's guides for years now. As an example, see section 9.5.4 in the limma User's Guide, where he shows a couple of factorial parameterizations and the required contrasts. And he ends with this sentence: "Since the approaches are equivalent, users are free to choose whichever one is most intuitive or convenient."


Best,

Jim





(the design matrix is slightly different from the one you commented upon below) where the coefficients certainly don’t add up to zero. This comes from the ‘algebraic contortions’ as you call them. I got the impression from your correspondence with Julia Pickl on this site that contrast coefficients should sum up to zero? But that may only apply to her situation?

Finally, I would be pleased to hear thoughts on which approach is statistically ‘best’. I’ve been under the impression in statistically analyses that using the entire data set from an experiment is good practice, even if one makes simple pairwise comparison. Does edgeR do that?

Thank you.

Best,
jahn


Fra: James W. MacDonald [mailto:jmacdon at uw.edu<mailto:jmacdon at uw.edu>]
Sendt: 3. september 2014 23:38
Til: Jahn Davik
Kopi: Jahn Davik [guest]; bioconductor at r-project.org<mailto:ioconductor at r-project.org>

Emne: Re: [BioC] edgeR factorial experiment contrasts

Hi Jahn,

No, it's not wrong. You can parameterize any number of ways, and you will get identical results for equivalent contrasts. But given that you will get identical results, and one way is conceptually easier to figure out, I would in general advocate using the conceptually easier parameterization over a more difficult one that requires lots of algebraic contortions in order to figure out what contrast to use.

I believe you sent a truncated version of your experiment (there were no replicates at each time point), so I can't give you explicit code. However, say you have a design matrix that looks something like this:

E00 J00 E01 J01 E05 J05 E48 J48 E96 J96
<zeros and ones go here>

And you want to compare the third contrast you mention below. It's as simple as

glmLRT(fit, contrast = c(0.33, 0, 0.33, 0,0.33, 0,-0.5, 0, -0.5, 0))

Which tests the average of the E0-5 time points against the average of the E48-96 time points.

And interactions are simple as well. Say you want the interaction between times 0 and 96:

glmLRT(fit, contrast = c(1,-1,0,0,0,0,0,0,-1,1))

Simple, no?

Best,

Jim



On Wed, Sep 3, 2014 at 4:10 PM, Jahn Davik <jahn.davik at bioforsk.no<mailto:jahn.davik at bioforsk.no>> wrote:
Hi Jim,
Yes, I believe I can see that (except for possibly my third question). It's just that I tend to think along these lines. Are they wrong?

jahn
________________________________
Fra: James W. MacDonald [jmacdon at uw.edu<mailto:jmacdon at uw.edu>]
Sendt: 3. september 2014 22:06
Til: Jahn Davik [guest]
Kopi: bioconductor at r-project.org<mailto:bioconductor at r-project.org>; Jahn Davik
Emne: Re: [BioC] edgeR factorial experiment contrasts
Hi Jahn,

Things will go much easier for you if you just parameterize as cell means rather than a conventional factorial model.

grps <- factor(paste(Group, Time, sep = "_"))

design <- model.matrix (~0+grps)

Then all the comparisons are easy to do.

Best,

Jim



On Wed, Sep 3, 2014 at 12:39 PM, Jahn Davik [guest] <guest at bioconductor.org<mailto:guest at bioconductor.org>> wrote:
Hi,
I am trying to figure out how to do comparisons using edgeR but have to admit that I am on unsure ice.
My experiment is abiotic stress on two genotypes (E and J) and these have been sampled for RNA-seq at five time points (00, 01, 05, 48, and 96).
I would like to analyze this experiment as a factorial (which I am sure could be argued), and when I set up the following design:

> design<-model.matrix(~0 + Group + Time + Group*Time),

I get the following design matrix (I've taken out the 3 three replicates at experimental cell.):

> design
      GroupE GroupJ Time01 Time05 Time48 Time96 GroupJ:Time01 GroupJ:Time05 GroupJ:Time48 GroupJ:Time96
E00      1      0      0      0      0      0             0             0             0             0
J00      0      1      0      0      0      0             0             0             0             0
E01      1      0      1      0      0      0             0             0             0             0
J01      0      1      1      0      0      0             1             0             0             0
E05      1      0      0      1      0      0             0             0             0             0
J05      0      1      0      1      0      0             0             1             0             0
E48      1      0      0      0      1      0             0             0             0             0
J48      0      1      0      0      1      0             0             0             1             0
E96      1      0      0      0      0      1             0             0             0             0
J96      0      1      0      0      0      1             0             0             0             1
attr(,"assign")
 [1] 1 1 2 2 2 2 3 3 3 3
attr(,"contrasts")
attr(,"contrasts")$Group
[1] "contr.treatment"

attr(,"contrasts")$Time
[1] "contr.treatment"


I have some questions regarding testing contrasts:
First though, I realize that I can pull out the data for E00 and J00, say, and compare these using exactTest, but then I guess that you are disregarding all information that lies in the remaining data set? That does not seem to me to be a good way to do it. But I may be wrong here?

Second, if I want to do the comparison E00 vs J00 using the design indicated above, how should that be set up? Would

lrt <- glmLRT(fit,contrast=c(1,-1,0,0,0,0,0,0,0,0)) do it?

Third, if I want to compare (E00 + E01 + E05) vs (E48 + E96), how would I set up that contrast?
For my own pedagogic reasons I can show how I was thinking in this case:

E00 = L1
E01 = L1 + L4   x 2  -->  6L1 + 2L3 + 2L4
E05 = L1 + L4

versus                                       --> 2L3 + 2L4 - 3L5 -3L6

E48 = L1 + L5   x 3  -->  6L1 + 3L5 + 3L6
E96 = L1 + L6


resulting in the contrast:

lrt <- glmLRT(fit,contrast=c(0,0,2,2,-3,-3,0,0,0,0))

Then I seem to remember that contrasts should sum up to zero? If anybody could tell me where I am loosing it, I would be very obliged.

Thank you

jd





 -- output of sessionInfo():

R version 3.1.0 (2014-04-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=Norwegian (Bokmål)_Norway.1252  LC_CTYPE=Norwegian (Bokmål)_Norway.1252
[3] LC_MONETARY=Norwegian (Bokmål)_Norway.1252 LC_NUMERIC=C
[5] LC_TIME=Norwegian (Bokmål)_Norway.1252

attached base packages:
[1] splines   stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] edgeR_3.6.8          limma_3.20.9         BiocInstaller_1.14.2

loaded via a namespace (and not attached):
[1] tools_3.1.0
>

--
Sent via the guest posting facility at bioconductor.org<http://bioconductor.org>.

_______________________________________________
Bioconductor mailing list
Bioconductor at r-project.org<mailto:Bioconductor at r-project.org>
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor



--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099



--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099



--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099

	[[alternative HTML version deleted]]



More information about the Bioconductor mailing list