[BioC] Odd contrast; does it make statistical sense?

Sun Jan 26 04:59:50 CET 2014

Thank you for the comprehensive analysis, Gordon. I will think 
carefully about the precise question I want to answer and choose the 
appropriate example from your post.

-Ryan

On Sat Jan 25 18:15:26 2014, Gordon K Smyth wrote:
> Dear Ryan and Aaron,
>
> Given Aaron's reactions to my previous responses, I will make one more
> attempt to answer in slightly more detail.
>
> The first thing to appreciate is that every statistical test is an
> answer to a particular question.  The contrast test that you mention
> certainly makes statistical sense, but this is not the issue.  The
> issue is scientific rather than statistical.  Whether or not this test
> is an appropriate answer to your scientific question depends on what
> your scientific question is.  You have not yet laid this out in
> sufficient detail.
>
> Here are some different scientific contexts that might or might not
> apply in your situation.
>
> First, you might want to assert that C and D have higher expression
> than either A or B.  If you want to claim that, then clearly you must
> do individual contrasts C vs A, C vs B, D vs A and D vs B.  There is
> no shortcut.  The contrast C+D vs A+B is not sufficient.
>
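A minimal sketch of the four individual contrasts named above, assuming the toy design from the original post and limma's `makeContrasts()` (limma is attached when edgeR is loaded). The contrast names `CvsA`, `CvsB`, `DvsA` and `DvsB` are illustrative labels, not from the thread.

```r
library(limma)  # provides makeContrasts()

# Toy design from the original post: 4 groups, 3 samples each
group <- factor(rep(LETTERS[1:4], 3))
design <- model.matrix(~0 + group)
colnames(design) <- levels(group)

# One contrast per pairwise claim; each is tested on its own
pairwise <- makeContrasts(
  CvsA = C - A,
  CvsB = C - B,
  DvsA = D - A,
  DvsB = D - B,
  levels = design
)
pairwise
```

In an edgeR workflow, each column could then be tested separately, for example with `glmLRT(fit, contrast = pairwise[, "CvsA"])`, where `fit` is assumed to be the result of `glmFit()` on this design.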
> Or you might want to assert that the treatments cluster into two big
> groups, C and D vs A and B.  To establish this, you need to show that
> the CD vs AB separation is large compared to C vs D and B vs A.  You
> could do all pairwise comparisons, but a slightly more efficient method
> would be to test three contrasts B-A, D-C and (C+D)/2-(A+B)/2.  You
> can make this assertion if the third contrast is far more significant
> than the first two.  Even if B-A and D-C are statistically
> significant, you could still establish the claim by showing that the
> fold changes for (C+D)/2-(A+B)/2 are much larger than those for B-A or
> D-C.
>
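The three contrasts for the clustering question might be set up as follows. This is a sketch under the same assumptions as the original post's toy design; the labels `BvsA`, `DvsC` and `CDvsAB` are illustrative only.

```r
library(limma)  # provides makeContrasts()

group <- factor(rep(LETTERS[1:4], 3))
design <- model.matrix(~0 + group)
colnames(design) <- levels(group)

cluster <- makeContrasts(
  BvsA   = B - A,                    # difference within the first big group
  DvsC   = D - C,                    # difference within the second big group
  CDvsAB = (C + D)/2 - (A + B)/2,    # difference between the big groups
  levels = design
)
cluster

# The three contrast vectors are mutually orthogonal:
# all off-diagonal entries of the cross-product are zero
crossprod(cluster)
```

Comparing the results for `CDvsAB` with those for `BvsA` and `DvsC` (significance or fold changes) is then the comparison described above.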
> Or you might want to assert that a population made up of equal parts C
> & D would have different expression to a population made of equal
> parts of A & B.  To assert that, you only need to test (C+D)/2-(A+B)/2.
>
> The four groups might arise from two original factors.  Suppose that
> the groups A--D correspond to two factors, Big = c(1,1,2,2) and Sub =
> c(1,2,1,2).  You might want to assert that Big high increases
> expression over Big low regardless of the level of Sub.  In that case
> you need to test the two contrasts C-A and D-B.  If both are
> significantly up, then you can make the assertion.
>
> Or you might want to assert that Big has the same effect on expression
> regardless of the Sub baseline.  In that case you need to show that
> (C+D)/2-(A+B)/2 is significant but (D-B)-(C-A) is not.
>
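For the two-factor reading, the contrasts mentioned in the last two paragraphs could be written like this. It is a sketch that assumes the Big/Sub coding given above (A and B at Big level 1, C and D at Big level 2; A and C at Sub level 1, B and D at Sub level 2); the contrast names are hypothetical labels.

```r
library(limma)  # provides makeContrasts()

# Factor coding from the text, one entry per group A..D
Big <- c(1, 1, 2, 2)
Sub <- c(1, 2, 1, 2)
data.frame(group = LETTERS[1:4], Big, Sub)

group <- factor(rep(LETTERS[1:4], 3))
design <- model.matrix(~0 + group)
colnames(design) <- levels(group)

factorial <- makeContrasts(
  BigAtSub1   = C - A,                   # Big effect when Sub = 1
  BigAtSub2   = D - B,                   # Big effect when Sub = 2
  BigAverage  = (C + D)/2 - (A + B)/2,   # Big effect averaged over Sub
  Interaction = (D - B) - (C - A),       # does the Big effect depend on Sub?
  levels = design
)
factorial
```

Note that `BigAverage` is the mean of `BigAtSub1` and `BigAtSub2`, which is why it only describes "the" Big effect when the interaction is negligible.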
> Finally, if you were confident in advance that A and B were not
> different and C and D were not different, then you could simply pool
> the A and B samples together and the C and D samples together and do a
> two group test. This produces a statistically valid test only if there
> is no systematic differential expression between A and B or between C
> and D.  But if you knew that in advance, why did you classify the
> samples into four groups in the first place??
>
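A small base-R sketch of the pooled alternative, to show how it differs from the four-group design. The factor name `pooled` and its levels `AB` and `CD` are made up for illustration.

```r
# Toy grouping from the original post: 4 groups, 3 samples each
group <- factor(rep(LETTERS[1:4], 3))

# Collapse A and B into one level, C and D into the other
pooled <- factor(ifelse(group %in% c("A", "B"), "AB", "CD"))

design4 <- model.matrix(~0 + group)  # four-group design
design2 <- model.matrix(~pooled)     # two-group design; coefficient 2 is CD vs AB

# Residual degrees of freedom under each design
df4 <- nrow(design4) - ncol(design4)
df2 <- nrow(design2) - ncol(design2)
c(four_group = df4, pooled = df2)
```

The pooled design has more residual degrees of freedom, but any real A vs B or C vs D difference is then absorbed into the variability estimate, which is the validity condition stated above.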
> Best wishes
> Gordon
>
>
>>> Date: Wed, 22 Jan 2014 16:17:35 -0800
>>> From: "Ryan C. Thompson" <rct at thompsonclan.org>
>>> To: bioconductor <Bioconductor at r-project.org>
>>> Subject: [BioC] Odd contrast; does it make statistical sense?
>>>
>>> Hi all,
>>>
>>> I'm currently using edgeR to test a somewhat odd contrast. Basically, I
>>> have multiple groups, and I want to combine them into just 2 big groups
>>> and test whether the two big groups have significantly different
>>> averages. I'll give a toy example that demonstrates the same concept.
>>> In this example, there are 4 groups, A through D, each containing 3
>>> samples, and I want to test whether the mean of all samples in A & B is
>>> different from the mean of all samples in C & D:
>>>
>>> library(edgeR)  # also attaches limma, which provides makeContrasts()
>>> group <- rep(LETTERS[1:4], 3)
>>> design <- model.matrix(~0+group)
>>> colnames(design) <- LETTERS[1:4]
>>> cont <- makeContrasts((A+B)/2 - (C+D)/2, levels=design)
>>>
>>> My worry is that with this contrast, I'm effectively just testing
>>> two groups against each other, and by having 4 groups in the design
>>> I will be estimating dispersions that are not appropriate for the
>>> test that I'm doing, and hence I will overstate my confidence.
>>>
>>> Or, to put it another way, am I doing something equivalent to
>>> testing a main effect in a model where an interaction term is present?
>>>
>>> Thank you,
>>>
>>> -Ryan Thompson
>