[BioC] help: limma and changing gene results!

Wed May 19 15:39:24 CEST 2010

Hi Koen,

Koen Marien wrote:
>>>>> Dear Jim and others who can help

If possible, it is preferable for your response _not_ to be preceded by >>>>>,
because to most readers that looks like text written five responses ago
rather than the current part of the email.

> 
> Koen Marien wrote:
>> Thanks for the clear and fast reply, Jim. Indeed a-(b+c+d) isn't a
>> contrast, but I think I'm having a different problem. Here is a short
>> explanation of the experiment:
> 
> Yes, but... a-(b+c+d) doesn't make any sense. Why would you do such a 
> thing? Let's say the mean of all four samples for a given gene is 
> identical (I dunno, say 5).
> Any of a-b, a-c, a-d will be zero, whereas a-(b+c+d) is -10. So what 
> does that tell us, in a biological sense?
> 
>>>>> I compare a progenitor population with three offspring populations to
>>>>> identify surface markers. So I need the genes that are upregulated in the
>>>>> 'a' population compared to the 'b', 'c' and 'd' populations.

OK, fine. But you seem to be missing the fact that we are just doing
simple math here. If you want to compare the 'a' population to the b, c
and d populations, then the only reasonable way to do that is to use the
mean of those three populations. That is why I say you aren't doing a
contrast. For a comparison to be a contrast, the coefficients have to sum
to zero, so what you want is a - (b + c + d)/3.
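The arithmetic above can be checked in a few lines of R. This is a minimal base-R sketch, not limma code: the object names are hypothetical, and the coefficients are applied directly to made-up group means (all equal to 5, as in the example quoted earlier) rather than to fitted values.

```r
# Coefficients on the group means, in the order (a, b, c, d).
# The object names are hypothetical, chosen for this illustration.
sum_coefs  <- c(a = 1, b = -1,   c = -1,   d = -1)    # a - (b + c + d)
mean_coefs <- c(a = 1, b = -1/3, c = -1/3, d = -1/3)  # a - (b + c + d)/3

sum(sum_coefs)   # -2: the coefficients do not sum to zero, so not a contrast
sum(mean_coefs)  # zero (up to floating-point rounding): a valid contrast

# The example above: all four group means for a gene are identical (here 5).
means <- c(a = 5, b = 5, c = 5, d = 5)
sum(sum_coefs * means)   # -10, although nothing differs between groups
sum(mean_coefs * means)  # zero (up to rounding), as it should be
```

Only the second vector gives zero when there is no difference at all between the groups, which is the property a contrast needs.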

> 
> 
>>  
>> I have four populations of cells with three biological replicates for each
>> population -> a1,a2,a3,b1,b2,b3,c1,c2,c3,d1,d2,d3. I normalized them and
>> looked at the differentially expressed genes between the 'a' population and
>> each of the other populations individually: a-b, a-c, a-d. The Venn
>> approach is done with the online web application Venny and only looks at
>> the probe set IDs common to the three lists (let's call it the 'one-on-one
>> strategy').
>> I also looked at the differentially expressed genes when the b, c and d
>> values were put together: a-e with e=b+c+d (let's call it the 'group
>> strategy'). So it's not really the contrasts that are changed.
> 
> How are the contrasts not changed? You are comparing a contrast with a 
> not-a-contrast that doesn't even make sense. That there will be 
> differences is a foregone conclusion.
> 
>>>>> I don't really change the contrast (look at the code, it's always
>>>>> 'group2-group1').
>>>>> I'll try to explain again:
>>>>> one-on-one strategy: compared a to b, a to c and a to d, then compared
>>>>> the lists of differentially expressed genes with the online Venny tool
>>>>> (http://bioinfogp.cnb.csic.es/tools/venny/index.html). So group1 is
>>>>> always the 'a' population and group2 is the 'b', 'c' or 'd' population
>>>>> (I ran the code three times).
>
>>>>> group strategy: compared a to (b&c&d) (look at the code: I annotated
>>>>> the 'a' files by assigning them to population '1' and the 'b', 'c' and
>>>>> 'd' files by assigning them to population '2'), so group1 = the 'a'
>>>>> population and group2 = 'b'+'c'+'d'.

Right. And that is what doesn't make sense. You can set group 2 to be 
(b+c+d)/3, and then compare that to a. This is similar to individually 
comparing b, c, and d to a, except you are 'smoothing' the values for 
the offspring samples by taking the mean, so you will likely still get 
differences, depending on the underlying data.
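To see why the two strategies can give different lists, here is a small base-R sketch on hypothetical group means. As in the worked example further down in this thread, it ignores the denominator, and it uses an arbitrary cutoff of 1 on the difference in place of a real significance test. The first gene uses the means from that example (a = 5, b = 2, c = 5, d = 8); the other two are made up.

```r
# Hypothetical group means for three genes (rows) in populations a-d.
# The denominator (standard error) is ignored, and the cutoff of 1 on the
# difference is an arbitrary stand-in for "significant".
means <- rbind(
  gene1 = c(a = 5, b = 2,   c = 5,   d = 8),  # means from the worked example
  gene2 = c(a = 6, b = 5.5, c = 5.5, d = 1),  # far above d, close to b and c
  gene3 = c(a = 9, b = 5,   c = 6,   d = 4)   # clearly above every offspring
)
cutoff <- 1

# One-on-one strategy: a - b, a - c, a - d, then intersect (the Venn step).
pairwise <- means[, "a"] - means[, c("b", "c", "d")]
up_in_all_pairs <- apply(pairwise > cutoff, 1, all)

# Mean contrast: a - (b + c + d)/3.
vs_mean <- means[, "a"] - rowMeans(means[, c("b", "c", "d")])
up_vs_mean <- vs_mean > cutoff

data.frame(vs_mean, up_vs_mean, up_in_all_pairs)
```

With these made-up numbers, gene2 passes the mean contrast but fails the intersection because b and c are close to a, while gene1 passes neither. That is one way a 'group' list can end up much longer than the Venn intersection, although real limma results also depend on the variance estimates.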

> 
>>>>> My questions are: Why do I get different lists with these two
>>>>> approaches? Which approach gives me the best results when I look for
>>>>> genes specifically upregulated in the 'a' population?

Which approach is 'best' depends on the hypothesis you are trying to
test. And it may still be impossible to say which is best, since 'best'
is a fairly imprecise term. In my opinion it is more defensible to
describe the analysis in terms of the hypothesis being tested, and to
explain why the particular model or contrast you used answers the
underlying question.
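One way to make the hypothesis explicit is to fit all four populations in a single model and state each question as its own column of a contrast matrix. The sketch below is an illustration under assumptions, not the analysis used in this thread: the array labels are hypothetical, the contrast matrix is built by hand in base R (limma's makeContrasts() with levels = design would normally do this), and the limma calls are left as comments because they need a real expression set (eset).

```r
# Hypothetical labels for the 12 arrays: 3 replicates of each population.
population <- factor(rep(c("a", "b", "c", "d"), each = 3))
design <- model.matrix(~ 0 + population)
colnames(design) <- levels(population)

# Each column states one hypothesis as coefficients on the group means.
cont.matrix <- cbind(
  aVsB    = c(1, -1,    0,    0),
  aVsC    = c(1,  0,   -1,    0),
  aVsD    = c(1,  0,    0,   -1),
  aVsMean = c(1, -1/3, -1/3, -1/3)
)
rownames(cont.matrix) <- colnames(design)

# Every column sums to zero (up to rounding), so all four are contrasts.
colSums(cont.matrix)

# With limma installed and a real eset, the usual steps would be (not run):
# fit  <- lmFit(eset, design)
# fit2 <- eBayes(contrasts.fit(fit, cont.matrix))
# results <- decideTests(fit2)
# Genes called up in a versus every offspring population:
# upInAll <- rowSums(results[, c("aVsB", "aVsC", "aVsD")] == 1) == 3
```

Fitting everything once means all four contrasts share one residual variance estimate, and "up versus every offspring" and "up versus the offspring mean" become two explicitly stated hypotheses rather than two separate analyses.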

Best,

Jim

> 
>>>>> I'm still learning, and I learn a lot from you in particular, so
>>>>> thanks for your patience, Koen
> 
> 
> Best,
> 
> Jim
> 
> 
>> Now, when looking at the one-on-one strategy list, there are only five
>> genes common to the three groups with a B-value > 2, while in the group
>> strategy there are 181 probe sets with a B-value > 2.
>>
>> Relevant code used:
>> # read all the .CEL files (a, b, c, d)
>>
>> # group strategy: 3 'a' arrays (population 1), 9 'b'+'c'+'d' arrays (population 2)
>> pd <- data.frame(population = c(rep(1, 3), rep(2, 9)),
>>                  replicate  = c(seq(1, 3), seq(1, 9)))
>> or
>> # read only the .CEL files of two populations (a,b or a,c or a,d)
>>
>> # one-on-one strategy (repeated three times, once for each comparison)
>> pd <- data.frame(population = c(rep(1, 3), rep(2, 3)),
>>                  replicate  = c(seq(1, 3), seq(1, 3)))
>>
>> group<-factor(eset$population)
>> design = model.matrix(~0+group)
>> design
>> cont.matrix = makeContrasts(eset = (group2 - group1), levels = design)
>> cont.matrix
>>
>>
>> Regards
>>
>> Koen
>>
>> -----Original Message-----
>> From: James MacDonald [mailto:jmacdon at med.umich.edu] 
>> Sent: Wednesday, 12 May 2010 4:40
>> To: Koen Marien
>> Cc: 'Joseph Skaf'; bioconductor at stat.math.ethz.ch
>> Subject: Re: [BioC] help: limma and changing gene results!
>>
>> Hi Koen,
>>
>> Koen Marien wrote:
>>> Dear
>>>
>>>
>>> Is this also the reason why there is a difference in the (differentially
>>> expressed) gene lists of a-(b+c+d) and venny(a-b,a-c,a-d)?
>> I am not familiar with the venny() function, so it's hard to say. But if 
>> you mean a contrast of a-(b+c+d) versus individual contrasts of a-b, 
>> a-c, a-d, then no.
>>
>> In the first place, a-(b+c+d) isn't a contrast, and in most cases 
>> doesn't make sense. You might mean a-(b+c+d)/3, which is a contrast, and 
>> tests the difference between the a group and the mean of the other 
>> three. The denominator will be the same in each case, being based on (in 
>> simple terms) the average variability of the four groups.
>>
>> However, if what I am assuming is correct, then the two contrasts are 
>> quite different, and shouldn't be expected to result in the same gene 
>> lists. As an example, say the group means for one gene are:
>>
>> a = 5
>> b = 2
>> c = 5
>> d = 8
>>
>> Since the denominator will be the same, we can ignore it here. So do
>> you think there will be a difference in what is called significant when 
>> we compare
>>
>> 5 - (2+5+8)/3 = 0
>>
>> and
>>
>> 5 - 2 = 3
>> 5 - 5 = 0
>> 5 - 8 = -3
>>
>> ?
>>
>> Best,
>>
>> Jim
>>
>>
>>> a-(b+c+d):          putting the b, c and d values in one group
>>>                      (b+c+d) and using limma
>>> venny(a-b,a-c,a-d):  using limma on the separate groups and creating a
>>>                      list from the intersection of the Venn diagram of
>>>                      the three 'sublists' a-b, a-c, a-d
>>>
>>>
>>> Thanks a lot
>>>
>>>
>>> Koen Marien
>>> student bioscience engineering: cell and gene biotechnology
>>> University of Ghent, Belgium
>>>
>>>
>>> -----Original Message-----
>>> From: bioconductor-bounces at stat.math.ethz.ch
>>> [mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of James W.
>>> MacDonald
>>> Sent: Thursday, 29 April 2010 18:46
>>> To: Joseph Skaf
>>> Cc: bioconductor at stat.math.ethz.ch
>>> Subject: Re: [BioC] help: limma and changing gene results!
>>>
>>> Hi Joseph,
>>>
>>> Joseph Skaf wrote:
>>>> To whom it may concern,
>>>>
>>>> I've been having some problems with consistency in my limma results for
>>>> genes that are found to have significant differential transcript
>>>> abundance. In a given example, I may have 4 different groups (a, b, c,
>>>> and d) in an array set of 12.
>>>>
>>>> From here, I make a contrast matrix that has contrasts for a-b, a-c, and
>>>> a-d. Eventually, I output an eBayes-moderated contrast fit and use
>>>> decideTests on it to find out which genes are differentially expressed.
>>>> What I don't understand is that when I take away an entire group (such
>>>> as removing all d's) and redo all steps in the limma analysis, I end up
>>>> with a different set of genes after using decideTests. I am confused
>>>> here, because I would not think that removing group 'd' from the
>>>> analysis would have an effect on contrasts a-b and a-c.
>>>>
>>>> If anyone could even hint to me a reason as to why this is happening, it
>>>> would be greatly appreciated.
>>> It's because of how the denominator for your contrast is computed. The 
>>> denominator is computed using the intra-group variance for all the 
>>> groups in your study, not just the two groups being compared in the 
>>> contrast.
>>>
>>> So if you remove one of the groups, you lose both the degrees of freedom
>>> that group contributed and the contribution of its intra-group variance.
>>> Losing the degrees of freedom will reduce your power to detect
>>> differences. The effect of losing that group's intra-group variance will
>>> depend on how variable the group d data are compared to groups a, b and c.
>>>
>>> Best,
>>>
>>> Jim
>>>
>>>
>>>
>>>> Thanks and regards,
>>>> Joseph Skaf
>>>>
>>>>
>>>>
> 
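The point about the denominator in the reply to Joseph Skaf can be illustrated with ordinary least squares in base R. This is a sketch with hypothetical values for a single gene, using lm() rather than limma: lmFit() estimates a per-gene residual variance in a similar way, but eBayes() then moderates it across genes, so real numbers will differ. The helper t_a_vs_b() is hypothetical, and population d is deliberately given a larger spread than a, b and c.

```r
# Hypothetical values for one gene, 3 arrays per population.
# Population d is deliberately more variable than a, b and c.
y <- c(5, 6, 7,   3, 4, 5,   8, 9, 10,   2, 5, 8)
population <- factor(rep(c("a", "b", "c", "d"), each = 3))

# Hypothetical helper: t statistic for a - b, using the residual variance
# pooled over whichever groups are present in the model.
t_a_vs_b <- function(y, population) {
  fit  <- lm(y ~ 0 + population)
  s2   <- summary(fit)$sigma^2            # pooled within-group variance
  n_a  <- sum(population == "a")
  n_b  <- sum(population == "b")
  diff <- unname(coef(fit)["populationa"] - coef(fit)["populationb"])
  se   <- sqrt(s2 * (1 / n_a + 1 / n_b))
  c(t = diff / se, df = df.residual(fit), s2 = s2)
}

r4 <- t_a_vs_b(y, population)                          # all four groups: df = 8, s2 = 3
keep <- population != "d"
r3 <- t_a_vs_b(y[keep], droplevels(population[keep]))  # a, b, c only: df = 6, s2 = 1
rbind(with_d = r4, without_d = r3)
```

Here dropping the noisy d group lowers the pooled variance from 3 to 1 and raises the a-b t statistic from about 1.41 to about 2.45, even though the residual degrees of freedom fall from 8 to 6. With a less variable d, the loss of degrees of freedom could dominate instead.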

-- 
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
**********************************************************
Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues