[BioC] Differential Gene Expression using limma

James W. MacDonald jmacdon at med.umich.edu
Fri Jan 27 20:01:57 CET 2006


Hi Sonia,

Sonia SHAH wrote:
> Hi,
> 
> It would be greatly appreciated if I could get some advice on how to go
> about looking at differential expression on my data.
> 
> I have Affy data from 3 different cell types: type1, type2, type3
> and 3 biological reps for each type
> 
> 
> I want to get 2 gene lists using limma:
> 1. genes that are expressed in type1 and type3 but not in type2
> 2. genes that are expressed in type2 and type3 but not in type1

Just a technical point here; you cannot find genes that are 'expressed' 
in one sample and not in another. The best you can do is find genes that 
are expressed at a different level between samples.

> 
> There seem to be lots of different ways of doing this. I tried 2 design
> matrices:
> 
> DESIGN1
> 		type1	type2	type3
> type1rep1	1	0	0	
> type1rep2	1	0	0
> type1rep3	1	0	0
> type2rep1	0	1	0
> type2rep2	0	1	0
> type2rep3	0	1	0
> type3rep1	0	0	1
> type3rep2	0	0	1
> type3rep3	0	0	1
> 
> contrasts: (type1+type3)-type2
> 	    (type2+type3)-type1

These are not contrasts. To be a contrast, the coefficients have to sum 
to zero, so you would need

(type1 + type3)/2 - type2
(type2 + type3)/2 - type1
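A quick way to check this is to write each set of weights as a vector over the three group means (type1, type2, type3) and sum it. This is a minimal Python/NumPy sketch, not limma code; the names `proposed`, `corrected` and `is_contrast` are illustrative only.

```python
import numpy as np

# Weights on the group means, in the order (type1, type2, type3).
proposed = {
    "(type1+type3)-type2": np.array([1.0, -1.0, 1.0]),
    "(type2+type3)-type1": np.array([-1.0, 1.0, 1.0]),
}
corrected = {
    "(type1+type3)/2-type2": np.array([0.5, -1.0, 0.5]),
    "(type2+type3)/2-type1": np.array([-1.0, 0.5, 0.5]),
}

def is_contrast(weights, tol=1e-12):
    """A linear combination of group means is a contrast if its weights sum to zero."""
    return abs(weights.sum()) < tol

for name, w in {**proposed, **corrected}.items():
    print(f"{name:24s} sum = {w.sum():+.1f}  contrast: {is_contrast(w)}")
```

The first two sets of weights sum to 1, the corrected ones to 0. The reason this matters: with weights summing to 1, a quantity like (type1 + type3) - type2 equals the common expression level (e.g. 8 + 8 - 8 = 8) when all three groups are identical, so testing it against zero is not a test of differential expression.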

> 
> 
> 
> DESIGN2
> I would use 2 design matrices to get each gene list
> 
> The first matrix below will give genes that are in type1+3 but not in
> type2:
> 
> 		A	B
> type1rep1	1	0		
> type1rep2	1	0	
> type1rep3	1	0	
> type2rep1	0	1	
> type2rep2	0	1	
> type2rep3	0	1	
> type3rep1	1	0	
> type3rep2	1	0	
> type3rep3	1	0	
> 
> contrast A-B
> 
> 
> The second matrix below will give genes that are in type2+3 but not in
> type1:
> 
> 		A	B
> type1rep1	0	1		
> type1rep2	0	1	
> type1rep3	0	1	
> type2rep1	1	0	
> type2rep2	1	0
> type2rep3	1	0	
> type3rep1	1	0	
> type3rep2	1	0	
> type3rep3	1	0	
> 
> contrast A-B
> 
> 
> I would have thought that the two different approaches would give me the
> same number of differentially expressed genes. But it doesn't. It gives
> me very different numbers. 
> 
> Are the two approaches the same or am I doing something completely
> wrong?

Well, if you use the contrasts as I outlined above, the results will be 
very similar but still not the same. The difference is a technical point 
about how the denominator of each test statistic is computed. Note: To 
make this explanation easier to understand, I am omitting the empirical 
Bayes moderation step.

In the first case, the statistic for your contrast is very similar to a 
t-statistic: the numerator is the difference in mean expression, and the 
denominator is an estimate of how accurately you are estimating those 
means. Since you have three groups, the denominator tells you how well 
you are estimating the means of those three groups (based on the 
variance within each group - this is the important point).

In the second case, the statistic is identical to an ordinary two-sample 
t-statistic, because you are comparing two groups and the denominator 
estimates how well you are estimating the means of those two groups.
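Written as matrices, the two setups can be compared directly. Below is a minimal Python/NumPy sketch (not limma code), assuming the nine arrays are in the order shown in the tables above and using the second DESIGN2 matrix (A = type2 or type3, B = type1). It computes sqrt(c'(X'X)^-1 c), the design-only factor that multiplies the residual standard deviation in the denominator. `unscaled_sd` is a hypothetical helper name; the quantity is analogous to what limma stores as `stdev.unscaled` for an unweighted fit. With these balanced groups the factor comes out the same for both designs, so any difference between them has to come from the residual variance.

```python
import numpy as np

# Group labels for the nine arrays, in the order listed in the design tables.
groups = ["type1"] * 3 + ["type2"] * 3 + ["type3"] * 3

# DESIGN1: one indicator column per cell type (type1, type2, type3).
X1 = np.array([[g == t for t in ("type1", "type2", "type3")] for g in groups], dtype=float)
# Contrast (type2 + type3)/2 - type1 on those columns.
c1 = np.array([-1.0, 0.5, 0.5])

# Second DESIGN2 matrix: column A = type2 or type3, column B = type1.
X2 = np.array([[g != "type1", g == "type1"] for g in groups], dtype=float)
# Contrast A - B.
c2 = np.array([1.0, -1.0])

def unscaled_sd(X, c):
    """sqrt(c' (X'X)^-1 c); multiply by the residual SD to get the contrast's SE."""
    return float(np.sqrt(c @ np.linalg.inv(X.T @ X) @ c))

print(unscaled_sd(X1, c1))  # about 0.707, i.e. sqrt(1/3 + 0.25/3 + 0.25/3)
print(unscaled_sd(X2, c2))  # about 0.707, i.e. sqrt(1/6 + 1/3)
```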

To illustrate this difference, here is an example.

Let's say that the expression values for a particular gene look like this:

Type1 = 5.6, 5.8, 5.4
Type2 = 8.5, 8.6, 8.3
Type3 = 14.1, 14.2, 14.5

Now in the first case, if you compute the contrast

(type2 + type3)/2 - type1

you will get a difference of ~5.8 and a very significant p-value because 
the variability *within* each sample type is very small.
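The arithmetic can be checked with a short Python/NumPy sketch of the ordinary (non-moderated) t-statistic for the DESIGN1 contrast, using the example values above. Variable names are illustrative.

```python
import numpy as np

# Expression values for the example gene.
groups = {
    "type1": np.array([5.6, 5.8, 5.4]),
    "type2": np.array([8.5, 8.6, 8.3]),
    "type3": np.array([14.1, 14.2, 14.5]),
}

# Within-group standard deviations are all small (about 0.15 to 0.21).
for name, x in groups.items():
    print(f"{name}: mean = {x.mean():.2f}, SD = {x.std(ddof=1):.2f}")

# Numerator: the contrast (type2 + type3)/2 - type1 on the group means.
means = {name: x.mean() for name, x in groups.items()}
estimate = (means["type2"] + means["type3"]) / 2 - means["type1"]

# Denominator: residual variance pooled over the three groups (df = 9 - 3 = 6),
# times the contrast's design factor 1/3 + 0.25/3 + 0.25/3 = 0.5.
ss = sum(((x - x.mean()) ** 2).sum() for x in groups.values())
s2 = ss / 6
se = np.sqrt(s2 * 0.5)
t = estimate / se
print(f"estimate = {estimate:.2f}, SE = {se:.3f}, t = {t:.1f}")
```

The estimate is about 5.77 and the pooled residual SD only about 0.19, giving t of roughly 43 on 6 degrees of freedom.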

On the other hand, if you did the comparison as in your second case, 
the variability within the pooled Type2 and Type3 group would now be 
quite high. That gives a much larger denominator for your t-statistic 
(with the same numerator), so the resulting p-value will be much larger, 
and the gene would probably not be significant once you adjust for the 
thousands of genes you are testing.
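A matching sketch for the second approach, again an ordinary t-statistic without empirical Bayes moderation: Type2 and Type3 are pooled into a single group A, as in the second DESIGN2 matrix, and compared with Type1. Python/NumPy, illustrative names only.

```python
import numpy as np

type1 = np.array([5.6, 5.8, 5.4])
type2 = np.array([8.5, 8.6, 8.3])
type3 = np.array([14.1, 14.2, 14.5])

# Type2 and type3 are treated as one group "A"; type1 is group "B".
group_a = np.concatenate([type2, type3])
group_b = type1

# Same numerator: with 3 + 3 samples, mean(A) equals (mean(type2) + mean(type3)) / 2.
diff = group_a.mean() - group_b.mean()

# The residual variance now absorbs the large type2-vs-type3 spread inside A (df = 9 - 2 = 7).
ss_within = ((group_a - group_a.mean()) ** 2).sum() + ((group_b - group_b.mean()) ** 2).sum()
s2 = ss_within / 7
se = np.sqrt(s2 * (1 / 6 + 1 / 3))
t = diff / se
print(f"contrast estimate: {diff:.3f}, pooled SD: {np.sqrt(s2):.2f}, t: {t:.2f}")
```

The estimate stays at about 5.77, but the pooled residual SD grows from about 0.19 to about 2.69, so t falls from roughly 43 to roughly 3. On 7 residual degrees of freedom, t of about 3 corresponds to a nominal two-sided p-value between 0.01 and 0.02 before any multiple-testing adjustment, so whether such a gene makes the list depends on that adjustment and on limma's moderation.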

So how you do things depends on what exactly you are looking to show. If 
you want to find those genes where e.g., Type1 is different from the 
mean expression of Type2 and Type3 then you want to use your first 
method. If you want to find those genes where the expression values for 
Type1 are different from Type2 and Type3 _and_ there is very little 
difference between Type2 and Type3, then you should use your second method.
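To see when the choice matters, the sketch below wraps both calculations (ordinary t-statistics, no empirical Bayes moderation) in one hypothetical helper, `t_both_designs`, and applies it to the example gene and to a made-up gene in which Type2 and Type3 are nearly equal. The second gene's values are invented for illustration, not real data.

```python
import numpy as np

def t_both_designs(type1, type2, type3):
    """Return (t_design1, t_design2) for type1 versus type2/type3.

    Design 1: three group means, contrast (type2 + type3)/2 - type1, residual df = n - 3.
    Design 2: type2 and type3 pooled into one group A, contrast A - B, residual df = n - 2.
    Ordinary t-statistics only; limma's empirical Bayes moderation is omitted.
    """
    g1, g2, g3 = (np.asarray(g, dtype=float) for g in (type1, type2, type3))
    n1, n2, n3 = len(g1), len(g2), len(g3)
    n = n1 + n2 + n3

    def ss(x):
        # Sum of squared deviations from the group's own mean.
        return float(((x - x.mean()) ** 2).sum())

    # Design 1: residual variance pooled over the three separate groups.
    est1 = (g2.mean() + g3.mean()) / 2 - g1.mean()
    s2_1 = (ss(g1) + ss(g2) + ss(g3)) / (n - 3)
    se1 = np.sqrt(s2_1 * (1 / n1 + 0.25 / n2 + 0.25 / n3))

    # Design 2: type2 and type3 merged, so their difference counts as noise.
    a = np.concatenate([g2, g3])
    est2 = a.mean() - g1.mean()
    s2_2 = (ss(a) + ss(g1)) / (n - 2)
    se2 = np.sqrt(s2_2 * (1 / len(a) + 1 / n1))

    return est1 / se1, est2 / se2

example = ([5.6, 5.8, 5.4], [8.5, 8.6, 8.3], [14.1, 14.2, 14.5])
similar = ([5.6, 5.8, 5.4], [8.5, 8.6, 8.3], [8.4, 8.7, 8.5])  # hypothetical: type3 close to type2

for label, gene in (("example gene", example), ("type2 ~ type3", similar)):
    t1, t2 = t_both_designs(*gene)
    print(f"{label:14s} design 1 t = {t1:6.1f}   design 2 t = {t2:6.1f}")
```

For the example gene the two designs disagree by more than an order of magnitude (roughly 43 versus 3); for the hypothetical gene they give similar answers (roughly 24 and 26). That is why the second design picks out genes where Type2 and Type3 agree with each other.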

HTH,

Jim


-- 
James W. MacDonald
Affymetrix and cDNA Microarray Core
University of Michigan Cancer Center
1500 E. Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623


