[BioC] limma, analysis on subset of data gives completely different results

Gordon K Smyth smyth at wehi.EDU.AU
Sat Jul 19 12:45:57 CEST 2014


Dear Arvid,

As Jim has indicated, you have increased the number of samples you are 
using to estimate the standard errors more than ten-fold (from 2 groups to 
21 groups).  You can now call DE results with more confidence, even though 
the fold changes themselves remain the same.

From what you say, the results appear to be not "completely different", 
but simply more significant than before. This is hardly surprising given 
the huge increase in residual degrees of freedom.
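
To make the degrees-of-freedom point concrete, here is a minimal sketch in 
base R for a single simulated gene. The group sizes (7 per group), the 
group labels and the effect size are all made up, the Batch term is 
omitted, and plain lm() stands in for lmFit() without the eBayes() 
moderation, so it illustrates the principle rather than reproducing your 
analysis.

```r
# Hypothetical data: one gene, 21 groups of 7 samples each.
set.seed(1)
groups <- c("Control", LETTERS[2:21])          # Control, B, C, ..., U
Group  <- factor(rep(groups, each = 7), levels = groups)
expr   <- rnorm(length(Group), mean = ifelse(Group == "B", 1, 0), sd = 1)

# Fit 1: only group B and the control group.
keep    <- Group %in% c("B", "Control")
sub     <- data.frame(expr = expr[keep], Group = droplevels(Group[keep]))
fit_sub <- lm(expr ~ Group, data = sub)

# Fit 2: all 21 groups. Control is the reference level in both fits,
# so the coefficient "GroupB" is the B - Control difference each time.
full     <- data.frame(expr = expr, Group = Group)
fit_full <- lm(expr ~ Group, data = full)

est_sub  <- coef(fit_sub)[["GroupB"]]
est_full <- coef(fit_full)[["GroupB"]]
df_sub   <- df.residual(fit_sub)    # 14 samples - 2 coefficients  = 12
df_full  <- df.residual(fit_full)   # 147 samples - 21 coefficients = 126

cat("B - Control estimate, subset fit:", est_sub, "\n")
cat("B - Control estimate, full fit:  ", est_full, "\n")
cat("Residual df:", df_sub, "vs", df_full, "\n")
cat("Two-sided 5% t critical value:", qt(0.975, df_sub), "vs",
    qt(0.975, df_full), "\n")
```

The B versus Control estimate is the same in both fits. What changes is 
the residual variance estimate, which is pooled over all groups in the 
full fit, and the residual degrees of freedom (12 versus 126 in this 
sketch), which lowers the t critical value.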

Best wishes
Gordon

> Date: Thu, 17 Jul 2014 13:59:28 +0000
> From: Arvid Sondén <arvid.sonden at gu.se>
> To: "bioconductor at r-project.org" <bioconductor at r-project.org>
> Subject: [BioC] limma, analysis on subset of data gives completely
> 	different results
>
> Dear all,
>
> I am currently working with gene expression analysis in limma. I have a 
> total of 146 samples divided into 21 groups. What I want to do is 
> pairwise comparisons between one group (the control group) and the 
> others. The following code shows this for the first pairwise comparison 
> between group B and the control group, also adding batch effects to the 
> model. All groups are included in the "Group" variable.
>
> design <- model.matrix(~0+Group+Batch)
> fit <- lmFit(y$E, design)
> cont <- makeContrasts("GroupB-GroupControl", levels=design)
> fit <- contrasts.fit(fit, cont)
> fit <- eBayes(fit)
> tt <- topTable(fit, adjust="BH", coef="GroupB-GroupControl", genelist=y$genes, number=Inf)
>
> From the beginning I was only working with this first comparison, and 
> was only using the data from group B and the control group. Now I have 
> extended this to all the data and all the pairwise comparisons. Since I 
> am using all of the data in the lmFit function the fit is different 
> from before when I was only using a part of the data. What makes me 
> confused is that the difference is quite large. Now I have 1378 
> significant genes compared to 203 before for the GroupB-GroupControl 
> comparison after the BH correction.
>
> Is there a possible limma specific explanation for this? I have read the 
> documentation on the functions, and the limma user's guide, but I can't 
> say that I have fully understood what is going on inside the lmFit 
> function. On a more conceptual level I understand that the linear model 
> will change when I add new data and new variables, but the change seems 
> too large to me, since the actual comparison is still the same.
>
> Best regards,
> Arvid
