# [BioC] edgeR contrast

Pickl, Julia j.pickl at dkfz-heidelberg.de
Thu Sep 4 09:52:51 CEST 2014

Thank you very much for that very good explanation!!!
Best wishes
Julia

From: James W. MacDonald [mailto:jmacdon at uw.edu]
Sent: Wednesday, 3 September 2014 16:44
To: Pickl, Julia
Cc: bioconductor at r-project.org
Subject: Re: [BioC] edgeR contrast

Hi Julia,

On Wed, Sep 3, 2014 at 2:18 AM, Pickl, Julia <j.pickl at dkfz-heidelberg.de> wrote:
Hi Jim,
could you please tell me why contrasts 3 and 4 are not valid contrasts? I do not completely understand it. Is it because the same number of factors should have +1 and -1 in the contrast matrix?

They aren't valid contrasts because the coefficients don't add up to zero. You use the contrast to form a t-statistic, which you then use to test a null hypothesis versus an alternative hypothesis.

In general, the null hypothesis is that the numerator of the t-statistic is equal to zero, and the alternative hypothesis is that the numerator is not equal to zero (depending on the alternative you can also test that the numerator is greater or less than zero). Because of this, the coefficients of the contrast have to add up to zero (or else you aren't testing the null that the numerator equals zero).
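As a quick illustration of the sum-to-zero rule, here is a minimal Python sketch (not edgeR code) that adds up the coefficients of each contrast from the table in the original question, quoted further down. The dictionary layout and the helper name `sums_to_zero` are only assumptions made for this example.

```python
# Contrast coefficients copied from the table in the original question.
# Group order: IP.treat, IP.control, IgG.treat, IgG.control
contrasts = {
    "IP":        [1, -1,  0,  0],
    "contrast1": [1, -1, -1,  1],
    "contrast2": [1, -1,  0,  0],
    "contrast3": [1, -1, -1, -1],
    "contrast4": [1, -1, -1,  0],
    "contrast5": [0,  0,  1, -1],
}


def sums_to_zero(coefs):
    """Return True when the contrast coefficients add up to zero."""
    return sum(coefs) == 0


# Map each contrast name to whether it passes the sum-to-zero check.
validity = {name: sums_to_zero(coefs) for name, coefs in contrasts.items()}

for name, ok in validity.items():
    print(f"{name}: sum = {sum(contrasts[name]):+d}, sums to zero: {ok}")
```

Only contrast3 and contrast4 fail the check, which matches the two contrasts flagged as invalid in this thread.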

So if we look at your contrast 3, you have

IP.treat - IP.control - IgG.treat

Now remember, ANOVA is simply algebra. You could be hypothesizing that IP.treat - IP.control - IgG.treat = 0, which would imply that IP.control and IgG.treat somehow sum to be equal to IP.treat. But that is a weird sort of null hypothesis (and from a biological perspective, why would you think that would be true?). One would usually assume that under the null, there are no differences between any of those three groups. In which case this contrast would be testing that IP.treat - IP.control - IgG.treat = -1, which is certainly a valid thing to test, I suppose, but what would it mean to reject that null hypothesis? There are any number of ways that those three coefficients could add up to something different from -1, so it isn't clear what you are testing here.
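To make the algebra concrete, the sketch below evaluates a zero-sum contrast and the contrast above when all groups sit at the same level. The level `mu` and the helper `contrast_value` are hypothetical and exist only for this illustration; the numbers are made up.

```python
def contrast_value(coefs, group_means):
    """Linear combination of group means weighted by the contrast coefficients."""
    return sum(c * m for c, m in zip(coefs, group_means))


# Group order: IP.treat, IP.control, IgG.treat
zero_sum = [1, -1, 0]       # IP.treat - IP.control
non_zero_sum = [1, -1, -1]  # IP.treat - IP.control - IgG.treat

# Under "no differences between groups", every group has the same level mu.
for mu in (1.0, 5.0, 10.0):
    means = [mu, mu, mu]
    print(
        f"mu = {mu}: zero-sum contrast = {contrast_value(zero_sum, means)}, "
        f"non-zero-sum contrast = {contrast_value(non_zero_sum, means)}"
    )
```

With equal groups the zero-sum contrast is always 0, while the other contrast equals -mu, so its value depends on the overall level (it is -1 only when mu is 1) rather than on any difference between the groups.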

From a biological point of view I still have problems with contrast 1
(IP.treat-IgG.treat)-(IP.control-IgG.control),
as it is also
IP.treat - IgG.treat - IP.control + IgG.control
And this looks like the counts of IP.treat plus IgG.control are compared to IgG.treat and IP.control.

And that is another interpretation for that contrast. This is why the associative law is useful; you can move things around in such a way to make interpretation of the result easier (or harder, if you so desire).

There are two things to consider. First, you want to set up both your coefficients and any contrasts in such a way that you can most easily interpret the results. In this case, setting up the contrast as (and thinking of the contrast in terms of) (IP.treat - IP.control)-(IgG.treat - IgG.control) is easiest.

This is because you can then formulate the null hypothesis as

firstpart - secondpart = 0

or alternatively

firstpart = secondpart

which means that the difference between IP.treat and IP.control is equal to the difference between IgG.treat and IgG.control, which you can then interpret as meaning that the IP results are indistinguishable from the IgG results. And since that is a useful null hypothesis, given the experiment, it is best to interpret the contrast that way.
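The two ways of grouping the terms are the same contrast, which a small Python sketch can confirm by expanding each grouping into its coefficient vector. The helpers `unit` and `minus` are hypothetical names introduced only for this example.

```python
GROUPS = ["IP.treat", "IP.control", "IgG.treat", "IgG.control"]


def unit(group):
    """Coefficient vector that picks out a single group."""
    return [1 if g == group else 0 for g in GROUPS]


def minus(a, b):
    """Element-wise difference of two coefficient vectors."""
    return [x - y for x, y in zip(a, b)]


# Grouping from the question: (IP.treat - IgG.treat) - (IP.control - IgG.control)
by_treatment = minus(
    minus(unit("IP.treat"), unit("IgG.treat")),
    minus(unit("IP.control"), unit("IgG.control")),
)

# Grouping suggested in the reply: (IP.treat - IP.control) - (IgG.treat - IgG.control)
by_antibody = minus(
    minus(unit("IP.treat"), unit("IP.control")),
    minus(unit("IgG.treat"), unit("IgG.control")),
)

print(by_treatment)
print(by_antibody)
print(by_treatment == by_antibody)
```

Both groupings expand to the coefficients 1, -1, -1, 1 in the group order above, so the test is identical and only the interpretation changes.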

The second issue has to do with rejecting the null hypothesis, and what that means. For a simple contrast, interpreting a rejected null hypothesis is simple. Say you tested

IP.treat - IP.control

and you reject the null with a p < 0.05, and the t-statistic has a value of 13.4. It's easy then to say that there appears to be a difference between those two samples, and it is also easy to see that the treatment results in way more of the given gene being pulled down by the IP step (because the t-statistic has a positive sign, implying a positive fold change, which can only come about if the IP.treat coefficient is larger than the IP.control coefficient).

But if you get a p < 0.05 and a t-statistic of 13.4 for the interaction term (IP.treat - IP.control - IgG.treat + IgG.control), then how do you interpret that result? With just the t-statistic (or even the log fold change) all you can say is that there is a difference between treatment and control that is dependent on whether or not you used the IP antibody or non-specific IgG. But this result can arise in any number of ways, and you need to explore the data further to see exactly what is going on, by e.g. plotting the logCPM values by group.
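To illustrate why the interaction needs a closer look, here is a sketch with made-up group-level logCPM values (not real data; the scenario names and the helper `interaction` are assumptions for this example). Three quite different patterns give exactly the same interaction value.

```python
# Coefficients of the interaction contrast:
# IP.treat - IP.control - IgG.treat + IgG.control
INTERACTION = {"IP.treat": 1, "IP.control": -1, "IgG.treat": -1, "IgG.control": 1}


def interaction(log_cpm):
    """Evaluate the interaction contrast on per-group logCPM values."""
    return sum(INTERACTION[g] * log_cpm[g] for g in INTERACTION)


# Hypothetical logCPM values, invented purely for illustration.
scenarios = {
    # IP goes up with treatment, IgG is flat.
    "IP up, IgG flat": {"IP.treat": 6.0, "IP.control": 4.0, "IgG.treat": 3.0, "IgG.control": 3.0},
    # IP is flat, IgG goes down with treatment.
    "IP flat, IgG down": {"IP.treat": 4.0, "IP.control": 4.0, "IgG.treat": 1.0, "IgG.control": 3.0},
    # Both go up, but IP goes up more.
    "both up, IP more": {"IP.treat": 7.0, "IP.control": 4.0, "IgG.treat": 4.0, "IgG.control": 3.0},
}

results = {name: interaction(values) for name, values in scenarios.items()}

for name, value in results.items():
    print(f"{name}: interaction = {value}")
```

All three scenarios evaluate to 2.0, so the sign and size of the interaction alone cannot tell them apart; the per-group values have to be inspected.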

Best,

Jim

Best wishes,
Julia

From: James W. MacDonald [mailto:jmacdon at uw.edu]
Sent: Tuesday, 2 September 2014 16:35
To: Julia [guest]
Cc: bioconductor at r-project.org; Pickl, Julia
Subject: Re: [BioC] edgeR contrast

Hi Julia,

This appears to be a ChIP-Seq experiment, in which case I wouldn't analyze it this way. Instead, I would use something like MACS to call peaks, using the IgG fractions as the 'input' fraction. In other words, the IgG fraction is used to help distinguish real IP regions from those regions that have high sequencing depth due to technical factors. You would then use edgeR to compare IP.treat versus IP.control. This is not a trivial analysis, and you should look at this paper (http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4066778/) for more information on how you should normalize your counts. But maybe I completely misunderstand the experiment.

In that case, back to your question. Contrasts 3 and 4 aren't valid contrasts, so you should just ignore those results. Contrast 1 is an interaction contrast, and is testing for genes that have different amounts of IP binding between treatment and control, after adjusting for non-specific binding. If this isn't ChIP-Seq, but instead is some transcript binding experiment, then this is likely the contrast you want.

Best,

Jim

On Tue, Sep 2, 2014 at 6:42 AM, Julia [guest] <guest at bioconductor.org> wrote:
I am trying different contrasts with edgeR to get a feeling for my data and also to find the best contrast for my question. I would like to know which genes are enriched in IP.treat compared to IP.control, both adjusted for non-specific IgG binding.
So it seems like contrast 1 is the best: (IP.treat-IgG.treat)-(IP.control-IgG.control); however, it seems like IgG.control is added to IP.treat, as - and - is +. I then tried contrasts 3 and 4, and got totally different results, with only genes with FC>1.
My question: Is it allowed to have more levels with -1 than with +1, or how can it be explained that contrasts 3 and 4 look very similar to each other but totally different from contrast 1 (and IP)?

Levels        IP  contrast1  contrast2  contrast3  contrast4  contrast5
IP.treat       1          1          1          1          1          0
IP.control    -1         -1         -1         -1         -1          0
IgG.treat      0         -1          0         -1         -1          1
IgG.control    0          1          0         -1          0         -1

Thanks for any help.

Julia

--
Sent via the guest posting facility at bioconductor.org.

_______________________________________________
Bioconductor mailing list
Bioconductor at r-project.org
https://stat.ethz.ch/mailman/listinfo/bioconductor
Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099