[BioC] limma, analysis on subset of data gives completely different results
Gordon K Smyth
smyth at wehi.EDU.AU
Tue Jul 22 04:14:11 CEST 2014
> Date: Sun, 20 Jul 2014 11:33:10 -0400
> From: Marcin Cieślik <marcin.cieslik at gmail.com>
> To: Aaron Mackey <ajmackey at gmail.com>, "bioconductor at r-project.org"
> <bioconductor at r-project.org>
> Subject: Re: [BioC] limma, analysis on subset of data gives completely
> different results
>
> Dear All,
>
>> The dark side of this notion is that if your groupB happens to have higher
>> true variance than the other groups (i.e. heteroscedastic), then by
>> including the other groups you've shrunken the variance too much, and
>> inflated significance.
>
>
> This appears to be an issue not only with "lmFit", but also with camera.
> Although I have observed that including additional data increased p-values
> for the genes and gene sets, respectively (for all the groups).
>
> Marcin
As Jim had already pointed out (in comments included in Aaron's post but
deleted from Marcin's), the assumption of equal variances is a property of
linear models and anova in general, not specific to limma let alone to
lmFit.
If you consider that groups have substantially different variances, then
limma provides functions arrayWeights() and voomaByGroup() to deal with
this.
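As a minimal sketch of the arrayWeights() route, assuming simulated data in which group B is noisier than the other groups (the objects y, group and design are illustrative and not taken from this thread), something along these lines could be tried:

```r
library(limma)

set.seed(1)
ngenes <- 1000
group  <- factor(rep(c("A", "B", "C"), each = 4))

# Simulated log-expression matrix: group B arrays get a larger standard deviation
# (assumed values, chosen only to make the heteroscedasticity obvious)
sds <- c(A = 0.3, B = 1, C = 0.3)[as.character(group)]
y   <- matrix(rnorm(ngenes * length(group)), nrow = ngenes)
y   <- sweep(y, 2, sds, "*")

design <- model.matrix(~ group)

# Estimate one relative precision weight per array; noisier arrays get lower weights
aw <- arrayWeights(y, design)

# Use the array weights in the linear model fit, then moderate the variances
fit <- lmFit(y, design, weights = aw)
fit <- eBayes(fit)
topTable(fit, coef = "groupB", number = 5)
```

Printing aw should show smaller weights for the group B arrays than for the others. voomaByGroup() is not shown here; it takes the grouping factor as an extra argument so that a separate mean-variance trend can be estimated for each group.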
However, the linear regression methods are not particularly sensitive to
unequal variances, and results usually become more conservative in this
situation rather than the other way around. You would only try to fix it if
the problem is substantial or there are plenty of replicates in each
group.
Best wishes
Gordon