[BioC] Problems making contrasts

James W. MacDonald jmacdon at med.umich.edu
Fri Feb 22 15:36:25 CET 2008

```
Hi Ingrid,

It would help here if you showed the contrast matrices you have
produced. You should be getting different numbers of differentially
expressed probesets if the contrast matrices are different.

As for the differences you see with the different design matrices, this
is because you are testing two different hypotheses. In the first case
you are testing to see if the average expression of the stimulated
samples is different from the unstimulated, and the yardstick you are
using to determine if there is a difference is based on the variability
within each group.

In the second design matrix you are again testing to see if the average
expression of the stimulated samples is different from the unstimulated,
but this time the yardstick you are using to determine if there is a
difference is based on the variability within the stimulated and
unstimulated samples, where you are pooling the patients and donors.

So in the first case you are saying that you have four groups, but want
to see if the average expression for two of the groups is different from
the other two. In the second case you are saying you only have two
groups and you want to know if the expression is different between them.

Although the first approach is statistically valid and a common thing to
do, it does suffer from the fact that the mean is not robust to
outliers. For instance, let's say the average of your four groups for a
particular probeset is like this:

H_s	HC_s	T_s	TC_s
10.1	4.3	4.1	3.7

And the SSE from your model is relatively small (this value being based
on the 'average' variability of the samples within each of the four
groups, indicating that the replicates for each group are very similar).

Now in this case you might get a significant t-statistic, because the
numerator of your statistic will be (10.1 - 4.3 + 4.1 - 3.7)/2 = 3.1, and
if the SSE is sufficiently small you will get a large t-stat.

However, if you pool the H_s and T_s samples (and the HC_s and TC_s
samples), the variability within the pooled stimulated group will be
really high (because you have three values around 10 and three around 4).
Because of this, the denominator of the t-stat will be much larger and
you will likely no longer achieve significance.

So it depends on what you are looking for. The average expression
between the stimulated and unstimulated groups is certainly different,
but in this case this difference is driven solely by the H_s group.

This may well be why you get far fewer probesets in the second model
than the first. Instead of doing either of these models, you might want
to capture those probesets that are significant in both H_s vs HC_s and
T_s vs TC_s, but may not have similar expression levels.

You might also be interested in the interaction, which would pick up the
case that I outlined above, where one sample type is affected
differently from the other when subjected to treatment.

Best,

Jim

Ingrid H. G. Østensen wrote:
> Hi
>
> Now I have tried my formula (dividing by 2, dividing by 4, and not dividing at all) and what James suggested, and I have also made a new design matrix.
>
> When I divided by 2 or 4, did not divide at all, or used James's suggestion, I got the same results:
>
>> designMa
>    H_s HC_s T_s TC_s
> H    1    0   0    0
> H    1    0   0    0
> H    1    0   0    0
> HC   0    1   0    0
> HC   0    1   0    0
> HC   0    1   0    0
> T    0    0   1    0
> T    0    0   1    0
> T    0    0   1    0
> TC   0    0   0    1
> TC   0    0   0    1
> TC   0    0   0    1
>
>> oppsum
>    H_s - HC_s T_s - TC_s (H_s - HC_s + T_s - TC_s)
> -1        733        874                      1077
> 0       47292      47065                     46631
> 1         676        762                       993
>> oppsum
>    H_s - HC_s T_s - TC_s (H_s - HC_s + T_s - TC_s)/2
> -1        733        874                        1077
> 0       47292      47065                       46631
> 1         676        762                         993
>
>> oppsum1
>    H_s - HC_s T_s - TC_s (H_s - HC_s + T_s - TC_s)/4
> -1        733        874                        1077
> 0       47292      47065                       46631
> 1         676        762                         993
>
>
> But when I made a new matrix:
>
>> designMa
>    s us
> H  1  0
> H  1  0
> H  1  0
> HC 0  1
> HC 0  1
> HC 0  1
> T  1  0
> T  1  0
> T  1  0
> TC 0  1
> TC 0  1
> TC 0  1
>
>
> contrast.matrix <- makeContrasts(s-us, levels = designMa)
>
> I got a different answer:
>
>> oppsum2
>    s - us
> -1      8
> 0   48657
> 1      36
>
>
> My question now is: why, and which is the right solution? And why divide by 2 or 4 (this I read in the limma user guide, section 8.7)?
>
> Regards,
> Ingrid
>
> Hi Ingrid,
>
> I haven't used makeContrasts() for a while now, so I'm not sure I can
> help with that. However, it isn't difficult to construct your contrast
> matrix by hand.
>
> nam <- colnames(design)
> contrast <- matrix(c(1,-1,0,0,0,0,1,-1,0.5,-0.5,0.5,-0.5), ncol = 3,
> 		dimnames = list(nam,c(paste(nam[c(1,3)], nam[c(2,4)],
> 		sep = "-"), "Stimulated-Unstimulated")))
>
> You might get the same result by dividing by two in your call to
> makeContrasts() rather than four.
>
> Best,
>
> Jim
>
>
>
>
> Ingrid H. G. Østensen wrote:
>> Hi
>>
>> I have some problems making my contrast matrix.
>>
>> I have the following design matrix:
>> 	P_s	P_us	D_s	D_us
>> S1	1	0	0	0
>> S2	1	0	0	0
>> S3	0	1	0	0
>> S4	0	1	0	0
>> S5	0	0	1	0
>> S6	0	0	1	0
>> S7	0	0	0	1
>> S8	0	0	0	1
>>
>>
>> Where P = patients and D = donors, s = stimulated and us = unstimulated
>>
>> What I want is to find the following differences:
>> The differences between stimulated and unstimulated in the patient group, and the differences between stimulated and unstimulated in the donor group. These I am able to make (the first two contrasts).
>>
>> But then I also want to see the difference between the two treatments independent of sample type: stimulated vs unstimulated.
>> In other words: (P_s and D_s) vs (P_us and D_us). Is my last contrast correct or should I do something else?
>>
>> contrast.matrix <-
>> makeContrasts(P_s-P_us, D_s-D_us, (P_s-P_us + D_s-D_us)/4, levels = designMa)
>>
>> Regards,
>> Ingrid
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>

--
James W. MacDonald, M.S.
Biostatistician
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623

```
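The difference between the two designs discussed in Jim's reply can be illustrated with a small base R sketch. It is not limma: it computes ordinary (unmoderated) t-statistics by least squares for a single hypothetical probeset. The group means are the ones quoted in the message, while the three replicates per group, the within-group spread of 0.1, and the helper name `contrast_t` are assumptions made purely for illustration.

```r
# Hypothetical values for one probeset: three replicates per group, centred
# on the group means quoted in the message (10.1, 4.3, 4.1, 3.7).
# The +/- 0.1 within-group spread is an assumption, not data from the thread.
groups <- c("H_s", "HC_s", "T_s", "TC_s")
group  <- factor(rep(groups, each = 3), levels = groups)
y      <- rep(c(10.1, 4.3, 4.1, 3.7), each = 3) + rep(c(-0.1, 0, 0.1), times = 4)

# Ordinary t-statistic for a contrast of least-squares coefficients
# (hypothetical helper; limma would use a moderated variance instead).
contrast_t <- function(design, y, contrast) {
  fit     <- lm.fit(design, y)
  s2      <- sum(fit$residuals^2) / fit$df.residual   # residual variance
  xtx_inv <- solve(crossprod(design))
  est     <- sum(contrast * fit$coefficients)         # numerator
  se      <- sqrt(s2 * drop(crossprod(contrast, xtx_inv %*% contrast)))
  est / se
}

# Four-group design: one column per group, as in the first designMa.
design4 <- model.matrix(~ 0 + group)

# Two-group design: stimulated vs unstimulated, pooling the sample types.
stim    <- factor(ifelse(group %in% c("H_s", "T_s"), "s", "us"),
                  levels = c("s", "us"))
design2 <- model.matrix(~ 0 + stim)

# The same stimulated-vs-unstimulated contrast at three different scalings.
t_four_sum     <- contrast_t(design4, y, c(1, -1, 1, -1))
t_four_half    <- contrast_t(design4, y, c(1, -1, 1, -1) / 2)
t_four_quarter <- contrast_t(design4, y, c(1, -1, 1, -1) / 4)

# The pooled comparison, and the interaction between sample type and treatment.
t_two         <- contrast_t(design2, y, c(1, -1))
t_interaction <- contrast_t(design4, y, c(1, -1, -1, 1))

print(c(sum = t_four_sum, half = t_four_half, quarter = t_four_quarter,
        pooled = t_two, interaction = t_interaction))
```

Under these assumed values the three scalings of the four-group contrast give the same t-statistic (roughly 54), because the divisor cancels between numerator and standard error. This is consistent with the identical counts Ingrid reports for dividing by 2, by 4, or not at all. The pooled two-group design has the same numerator (3.1), but its residual variance absorbs the gap between H_s and T_s, so the t-statistic falls to roughly 2.3. The interaction contrast stays large, reflecting that only H_s responds to stimulation in this made-up example.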