[BioC] GCRMA: low intensity exprs estimates / pval distributions

Matthew Hannah Hannah at mpimp-golm.mpg.de
Mon Mar 14 15:25:47 CET 2005


I noticed this a while ago, but given some of the recent threads, now
may be a suitable time for a general discussion.

This will be easiest if you view the attached files on the bioconductor
archive site.

Basically, GCRMA changed its BG parameter estimation from using a low
quantile of strata of affinity levels (v1.1.0 and earlier) to a smoother
approach using loess. There is also a fast=FALSE option, which does not
use the (default) faster ad-hoc algorithm (MLE vs. EB?).

If you compare v1.1.0 and v1.1.3 (the current stable release), with and
without fast=F, there are significant differences in the expression
estimates, particularly at the low end. This is not really too
surprising, as the data are noisy and each measure will have its own
specifics. What is more interesting is the effect on changes in
expression. I looked at a simple 3 vs. 3 comparison (limma, ebayes)
within a larger normalized dataset (~50 arrays) and, as you can see in
the attached plots, high p-values are over-represented when the default
(fast=T) version is used. To me this calls into question whether the
statistical test is still valid, and it also raises questions about
estimating true/false positives and negatives.
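
One way to put a number on "over-represented" is to bin the p-values and
compare each bin with the count expected if they were uniform on [0, 1].
The following is only a minimal sketch in Python (standard library only);
the function names and the choice of 10 bins are assumptions made for
illustration and are not part of limma or GCRMA.

```python
def pvalue_bin_counts(pvals, n_bins=10):
    """Count p-values in n_bins equal-width bins on [0, 1]."""
    counts = [0] * n_bins
    for p in pvals:
        # p == 1.0 would index past the end, so clamp it into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        counts[idx] += 1
    return counts


def ratio_to_uniform(counts):
    """Observed count in each bin divided by the count expected under uniformity."""
    expected = sum(counts) / len(counts)
    if expected == 0:
        return [0.0] * len(counts)
    return [c / expected for c in counts]


if __name__ == "__main__":
    # Made-up p-values, only to show the calls; not real results.
    example = [0.05, 0.15, 0.95, 0.97, 0.99, 1.0]
    counts = pvalue_bin_counts(example)
    print(counts)
    print(ratio_to_uniform(counts))
```

A ratio well above 1 in the right-most bins would correspond to the
excess of high p-values described above.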

I think (from a quick BioC search; I found no documentation) that a
step-up FDR procedure is used within p.adjust (which limma uses). Could
such a distribution affect the validity of the FDR correction? Or is
this the p-value equivalent of having positive dependency of the test
statistics?
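
For reference, a minimal sketch of a step-up FDR adjustment, assuming
the procedure meant here is Benjamini-Hochberg. It is written in Python
(standard library only) purely for illustration; the function name
bh_adjust is hypothetical and this is not the p.adjust source.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg step-up adjusted p-values (illustrative sketch)."""
    m = len(pvals)
    # Visit the p-values from largest to smallest.
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for pos, idx in enumerate(order):
        rank = m - pos  # rank in ascending order (1 = smallest p-value)
        candidate = pvals[idx] * m / rank
        # Step-up: an adjusted value never exceeds that of a larger p-value.
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Made-up p-values, only to show the call; not real results.
    print(bh_adjust([0.01, 0.04, 0.03, 0.005]))
```

Each adjusted value depends on the number of tests m and on the rank of
the p-value, so the overall shape of the p-value distribution feeds
directly into the adjustment.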

This all results from the different intensity distributions produced by
GCRMA. All are bimodal, which probably results from the genes that are
not present giving the peak at lower intensities. I guess that these
absent genes are responsible for the over-representation of high
p-values, as these genes are just BG. Even so, I prefer to work with the
fast=F version because of its more conventional p-value distribution.

As a thought - I assume that extracting the area of the lower peak might
be a nice way of estimating the number of 'present' genes.
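
A rough sketch of what such a peak-area extraction could look like,
assuming the expression values are binned into a histogram and the two
peaks are separated at the deepest valley between them. This is Python
(standard library only) for illustration; the function name, the bin
count, and the valley rule are all assumptions, and real data would
probably need smoothing first.

```python
def lower_peak_fraction(values, n_bins=20):
    """Fraction of values left of the deepest histogram valley, or None."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return None  # no spread, so no histogram to split
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value would index past the end, so clamp it.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1

    # Valley depth of a bin: how far it sits below the lower of the
    # tallest bin to its left and the tallest bin to its right.
    best_bin, best_depth = None, 0
    for i in range(1, n_bins - 1):
        depth = min(max(counts[:i]), max(counts[i + 1:])) - counts[i]
        if depth > best_depth:
            best_bin, best_depth = i, depth
    if best_bin is None:
        return None  # no valley found, histogram is not bimodal

    # Assumption: bins left of the valley bin belong to the lower peak.
    return sum(counts[:best_bin]) / len(values)


if __name__ == "__main__":
    # Made-up values with two clusters, only to show the call.
    example = [2.0] * 40 + [5.0] * 5 + [8.0] * 55
    print(lower_peak_fraction(example))
```

If the lower peak really is the absent genes, the number of 'present'
genes would then be roughly the total multiplied by one minus this
fraction.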

Any comments?



-------------- next part --------------
A non-text attachment was scrubbed...
Name: GCRMA_comparison.png
Type: image/png
Size: 10346 bytes
Desc: GCRMA_comparison.png
Url : https://stat.ethz.ch/pipermail/bioconductor/attachments/20050314/1ce953b7/GCRMA_comparison.png
-------------- next part --------------
A non-text attachment was scrubbed...
Name: GCRMA_comparison_Limma.pvals.png
Type: image/png
Size: 9072 bytes
Desc: GCRMA_comparison_Limma.pvals.png
Url : https://stat.ethz.ch/pipermail/bioconductor/attachments/20050314/1ce953b7/GCRMA_comparison_Limma.pvals.png
