[BioC] edgeR factorial experiment contrasts

Jahn Davik jahn.davik at bioforsk.no
Thu Sep 4 11:16:02 CEST 2014

Hi Jim,
Thanks for clarification. Appreciate it.

For the sake of it I ran a couple of comparisons between the one-way approach that you advocate and the factorial approach. For the simple comparisons, e.g., E00 vs J00, the two approaches returned pretty much the same results.
When I did the comparison ‘average(E00+E01+E05)’ vs ‘average(E48+E96)’ the topTags were pretty much equal. Still, a big fraction (~ 15%) of the significant (FDR<.001) transcripts found in one of the groups, were not found in the other. I suspect that can be explained by including the interactions in the factorial approach and not taking them into account in the one-way approach?

Another thing about doing the factorial approach is that you end up with a contrast statement something like:

lrt <- glmLRT(fit,contrast=c(0,0,2,2,-3,-3,0,0,0,0))

(the design matrix is slightly different from the one you commented upon below) where the coefficients certainly don’t add up to zero. This comes from the ‘algebraic contortions’ as you call them. I got the impression from your correspondence with Julia Pickl on this site that contrast coefficients should sum up to zero? But that may only apply to her situation?

Finally, I would be pleased to hear thoughts on which approach is statistically ‘best’. I’ve been under the impression in statistically analyses that using the entire data set from an experiment is good practice, even if one makes simple pairwise comparison. Does edgeR do that?

Thank you.


Fra: James W. MacDonald [mailto:jmacdon at uw.edu]
Sendt: 3. september 2014 23:38
Til: Jahn Davik
Kopi: Jahn Davik [guest]; bioconductor at r-project.org
Emne: Re: [BioC] edgeR factorial experiment contrasts

Hi Jahn,

No, it's not wrong. You can parameterize any number of ways, and you will get identical results for equivalent contrasts. But given that you will get identical results, and one way is conceptually easier to figure out, I would in general advocate using the conceptually easier parameterization over a more difficult one that requires lots of algebraic contortions in order to figure out what contrast to use.

I believe you sent a truncated version of your experiment (there were no replicates at each time point), so I can't give you explicit code. However, say you have a design matrix that looks something like this:

E00 J00 E01 J01 E05 J05 E48 J48 E96 J96
<zeros and ones go here>

And you want to compare the third contrast you mention below. It's as simple as

glmLRT(fit, contrast = c(0.33, 0, 0.33, 0,0.33, 0,-0.5, 0, -0.5, 0))

Which tests the average of the E0-5 time points against the average of the E48-96 time points.

And interactions are simple as well. Say you want the interaction between times 0 and 96:

glmLRT(fit, contrast = c(1,-1,0,0,0,0,0,0,-1,1))

Simple, no?



On Wed, Sep 3, 2014 at 4:10 PM, Jahn Davik <jahn.davik at bioforsk.no<mailto:jahn.davik at bioforsk.no>> wrote:
Hi Jim,
Yes, I believe I can see that (except for possibly my third question). It's just that I tend to think along these lines. Are they wrong?

Fra: James W. MacDonald [jmacdon at uw.edu<mailto:jmacdon at uw.edu>]
Sendt: 3. september 2014 22:06
Til: Jahn Davik [guest]
Kopi: bioconductor at r-project.org<mailto:bioconductor at r-project.org>; Jahn Davik
Emne: Re: [BioC] edgeR factorial experiment contrasts
Hi Jahn,

Things will go much easier for you if you just parameterize as cell means rather than a conventional factorial model.

grps <- factor(paste(Group, Time, sep = "_"))

design <- model.matrix (~0+grps)

Then all the comparisons are easy to do.



On Wed, Sep 3, 2014 at 12:39 PM, Jahn Davik [guest] <guest at bioconductor.org<mailto:guest at bioconductor.org>> wrote:
I am trying to figure out how to do comparisons using edgeR but have to admit that I am on unsure ice.
My experiment is abiotic stress on two genotypes (E and J) and these have been sampled for RNA-seq at five time points (00, 01, 05, 48, and 96).
I would like to analyze this experiment as a factorial (which I am sure could be argued), and when I set up the following design:

> design<-model.matrix(~0 + Group + Time + Group*Time),

I get the following design matrix (I've taken out the 3 three replicates at experimental cell.):

> design
      GroupE GroupJ Time01 Time05 Time48 Time96 GroupJ:Time01 GroupJ:Time05 GroupJ:Time48 GroupJ:Time96
E00      1      0      0      0      0      0             0             0             0             0
J00      0      1      0      0      0      0             0             0             0             0
E01      1      0      1      0      0      0             0             0             0             0
J01      0      1      1      0      0      0             1             0             0             0
E05      1      0      0      1      0      0             0             0             0             0
J05      0      1      0      1      0      0             0             1             0             0
E48      1      0      0      0      1      0             0             0             0             0
J48      0      1      0      0      1      0             0             0             1             0
E96      1      0      0      0      0      1             0             0             0             0
J96      0      1      0      0      0      1             0             0             0             1
 [1] 1 1 2 2 2 2 3 3 3 3
[1] "contr.treatment"

[1] "contr.treatment"

I have some questions regarding testing contrasts:
First though, I realize that I can pull out the data for E00 and J00, say, and compare these using exactTest, but then I guess that you are disregarding all information that lies in the remaining data set? That does not seem to me to be a good way to do it. But I may be wrong here?

Second, if I want to do the comparison E00 vs J00 using the design indicated above, how should that be set up? Would

lrt <- glmLRT(fit,contrast=c(1,-1,0,0,0,0,0,0,0,0)) do it?

Third, if I want to compare (E00 + E01 + E05) vs (E48 + E96), how would I set up that contrast?
For my own pedagogic reasons I can show how I was thinking in this case:

E00 = L1
E01 = L1 + L4   x 2  -->  6L1 + 2L3 + 2L4
E05 = L1 + L4

versus                                       --> 2L3 + 2L4 - 3L5 -3L6

E48 = L1 + L5   x 3  -->  6L1 + 3L5 + 3L6
E96 = L1 + L6

resulting in the contrast:

lrt <- glmLRT(fit,contrast=c(0,0,2,2,-3,-3,0,0,0,0))

Then I seem to remember that contrasts should sum up to zero? If anybody could tell me where I am loosing it, I would be very obliged.

Thank you


 -- output of sessionInfo():

R version 3.1.0 (2014-04-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)

[1] LC_COLLATE=Norwegian (Bokmål)_Norway.1252  LC_CTYPE=Norwegian (Bokmål)_Norway.1252
[3] LC_MONETARY=Norwegian (Bokmål)_Norway.1252 LC_NUMERIC=C
[5] LC_TIME=Norwegian (Bokmål)_Norway.1252

attached base packages:
[1] splines   stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] edgeR_3.6.8          limma_3.20.9         BiocInstaller_1.14.2

loaded via a namespace (and not attached):
[1] tools_3.1.0

Sent via the guest posting facility at bioconductor.org<http://bioconductor.org>.

Bioconductor mailing list
Bioconductor at r-project.org<mailto:Bioconductor at r-project.org>
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

James W. MacDonald, M.S.
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099

James W. MacDonald, M.S.
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099

	[[alternative HTML version deleted]]

More information about the Bioconductor mailing list