[BioC] limma small vs large number of samples

Mon May 5 18:59:46 CEST 2014

Hi Giovanni,

The eBayes step doesn't adjust the variance across genes. Instead, it 
estimates a variance hyperparameter based on the variance across all 
genes, and then 'squeezes' the observed by-gene variances towards that 
hyperparameter.

The degree of shrinkage is dependent on the df for the observed and 
prior variances, and since you have lots of observations, it will be 
less than would be expected for a smaller experiment. You can always 
call the squeezeVar function directly and compare the before and after 
variance estimates if you want.

Best,

Jim

On 5/5/2014 12:26 PM, Giovanni Bucci wrote:
> Thank you, Jim.
>
> One more question. The eBayes step adjusts the variance across genes.
> Since there is already a good estimate for the variance due to the
> large number of samples does the eBayes step shrink the variance even
> further?
>
> Thank you,
>
> Giovanni
>
>
>
> Hi Giovanni,
>
> On 4/28/2014 8:54 PM, Giovanni Bucci wrote:
>> * Hello everybody,
> *>>* I have 32 samples, 4 factors with 2 levels each. Each level has 2
> *>* replicates.
> *>>>* str(gxexprs)
> *>*   num [1:15584, 1:32] 7.94 6.67 9.93 9.62 12.19 ...
> *>>>* Group
> *>*   [1] R52VQ R52VQ R52VE R52VE R52EQ R52EQ R52EE R52EE R95VQ R95VQ
> R95VQ R95VE
> *>* [13] R95VE R95VE R95EQ R95EQ R95EQ R95EE R95EE R95EE R97VQ R97VQ R97VQ R97VE
> *>* [25] R97VE R97VE R97EQ R97EQ R97EQ R97EE R97EE R97EE
> *>* 16 Levels: R52VQ R52VE R52EQ R52EE R95VQ R95VE R95EQ R95EE R97VQ ... R97EE
> *>>>* design <- model.matrix(~0+Group)
> *>* fit <- lmFit(gxexprs, design)
> *>>* contrast.matrix <- makeContrasts(contrasts="R52VQ - R52VE",levels=design)
> *>* fit2 <- contrasts.fit(fit, contrast.matrix)
> *>* fit2 <- eBayes(fit2)
> *>>* TTable = topTable(fit2)
> *>>* global_p_val = TTable$P.Val
> *>>>* gxexprs = gxexprs[, 1:4]
> *>>>* ## same code as above but the expression matrix has only the first 4
> *>* columns which represent the contrast tested above
> *>>* design <- model.matrix(~0+Group)
> *>* fit <- lmFit(gxexprs, design)
> *>>* contrast.matrix <- makeContrasts(contrasts="R52VQ - R52VE",levels=design)
> *>* fit2 <- contrasts.fit(fit, contrast.matrix)
> *>* fit2 <- eBayes(fit2)
> *>>* TTable = topTable(fit2)
> *>>* local_p_val = TTable$P.Val
> *>>* local_p_val has much greater values than global_p_val even though they
> *>* represent the same comparison.
> *>>* What is the explanation for this?
> *
> The denominator of your t-statistic is based on the mean square error of
> the model (which is based on the intra-group variance of all groups).
> When you have all the other groups in the model, the number of
> observations used to estimate variance is larger, so you get more
> degrees of freedom for you test (and the variance estimate is more
> accurate), so you get smaller p-values in general.
>
> Best,
>
> Jim
>
>
>>> * Can you point to some diagnostic functions that will show the difference?
> *>>* Thank you,
> *>>* Giovanni
> *>>* 	[[alternative HTML version deleted]]*
>
> 	[[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099