[BioC] limma, analysis on subset of data gives completely different results

Gordon K Smyth smyth at wehi.EDU.AU
Tue Jul 22 04:14:11 CEST 2014


> Date: Sun, 20 Jul 2014 11:33:10 -0400
> From: Marcin Cieślik <marcin.cieslik at gmail.com>
> To: Aaron Mackey <ajmackey at gmail.com>,	"bioconductor at r-project.org"
> 	<bioconductor at r-project.org>
> Subject: Re: [BioC] limma, analysis on subset of data gives completely
> 	different results
>
> Dear All,
>
>> The dark side of this notion is that if your groupB happens to have higher
>> true variance than the other groups (i.e. heteroscedastic), then by
>> including the other groups you've shrunken the variance too much, and
>> inflated significance.
>
>
> This appears to be an issue not only with "lmFit", but also with camera.
> Although I have observed that including additional data increased p-values
> for the genes and gene sets, respectively (for all the groups).
>
> Marcin

As Jim had already pointed out (in comments included in Aaron's post but 
deleted from Marcin's), the assumption of equal variances is a property of 
linear models and anova in general, not specific to limma let alone to 
lmFit.

If you consider that groups have substantially different variances, then 
limma provides functions arrayWeights() and voomaByGroup() to deal with 
this.

However, linear regression methods are not particularly sensitive to 
unequal variances, and results usually become more conservative in this 
situation rather than the other way around.  You would only try to fix it 
if the problem is substantial or there are plenty of replicates in each 
group.
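
As a rough illustration of the arrayWeights() route mentioned above, here 
is a minimal sketch on simulated data.  The object names (y, group, design, 
aw), the simulated variances and the B - A contrast are all assumptions 
made for the example, not taken from the original poster's analysis.

```r
# Sketch only: simulated data and hypothetical object names.
library(limma)

set.seed(1)
ngenes <- 500
group <- factor(rep(c("A", "B", "C"), each = 4))

# Simulated log-expression matrix in which group B arrays are noisier.
sds <- ifelse(group == "B", 2, 1)
y <- matrix(rnorm(ngenes * length(group)), nrow = ngenes)
y <- sweep(y, 2, sds, "*")

# Group-means design matrix with columns named A, B, C.
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Estimate one relative quality weight per array.
aw <- arrayWeights(y, design)

# Weighted linear model fit, then the usual empirical Bayes moderation.
fit <- lmFit(y, design, weights = aw)
contr <- makeContrasts(BvsA = B - A, levels = design)
fit2 <- eBayes(contrasts.fit(fit, contr))
topTable(fit2, coef = "BvsA")
```

Note that arrayWeights() estimates a weight for each array rather than a 
variance for each group.  voomaByGroup(y, group = group, design = design) 
is the alternative that fits a separate mean-variance trend per group; it 
returns an object with observation weights that can be passed to lmFit() 
in the same way.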

Best wishes
Gordon
