[BioC] qvalues, sam, limma
naomi at stat.psu.edu
Thu Jun 10 17:06:44 CEST 2004
Of course, qvalue uses the p-values generated by another routine (in my
case, by limma) and SAM generates the q-values from the test
statistic. However, just for fun, I made the following changes to qvalue:
> spi0 <- smooth.spline(lambda, pi0, df = 3,w=1-lambda)
> pi0 <- predict(spi0, x = 1)$y ...
SAM (excel version) estimates pi0=0.17.
qvalue (vanilla) estimates pi0=0.07.
qvalue (revised) estimates pi0=0.066
and the q-values are almost identical.
So, I would say that the differences between SAM and limma have to do with
the estimated p-values. While Gordon suggested that the distribution may
not be the proposed F-distribution, I suspect that the problem is the level
of differential expression, which is probably over 70% (although possibly
not as high as 80-90% as suggested by the analysis). It is difficult to
establish the null distribution from the data when pi0 is very small.
At 12:01 PM 6/10/2004 +1200, Marcus Davy wrote:
>I can make an observation about the differences in pi_0 estimation
>the packages siggenes (SAM analysis) and qvalue.
> > packageDescription("qvalue", field="Version")
> > packageDescription("siggenes", field="Version")
>The package siggenes uses the function "p0.est" which has the following
>cubic spline code:
> spline.out <- smooth.spline(vec.lambda, vec.p0, w = 1 - vec.lambda,
> df = 3)
> p0 <- min(predict(spline.out, 1)$y, 1).
>The package qvalue uses the function "qvalue" which has the following
>cubic spline code:
> spi0 <- smooth.spline(lambda, pi0, df = 3)
> pi0 <- predict(spi0, x = max(lambda))$y
>The estimate pi_0(lambda) over a range of lambda observations becomes
>unbiased as lambda -> 1, with a bias variance tradeoff.
>The real difference is that observations are not weighted by 1-lambda
>in the package qvalue, and the estimate of pi_0 is at pi_0(lambda=0.95)
>by default in the arguements (lambda = seq(0, 0.95, 0.05)).
>In the package siggenes, pi_0 is estimated at pi_0(lambda=1).
>By weighting the observations with 1-lambda assumes values with small
>lambda are trusted to be more accurate.
>Details of the weighting of 1-lambda is in the preprint on page 13 of:
>Storey J. D. and Tibshirani R. J. Statistical significance for
>experiments. Preprint, January 2003.
>I have run simulations that suggest that including the 1-lambda weights
>can improve the variance in the estimation of pi_0(lambda) over the
>range of pi_0.
> >>> Naomi Altman <naomi at stat.psu.edu> 10/06/2004 9:16:00 AM >>>
>I have an Affy experiment with a very high level of differential
>expression. It is a one-way ANOVA with 6 treatments, 2 replicates per
>We ran both SAM (excel version) and limma, and had very good agreement
>between them in terms of ranking the genes by the test statistic. For
>set of the top K genes, over 90% of the genes were identified by both
>SAM automatically produces a q-values and estimates FDR and pi_0 (the
>percentage of non-differentially expressing genes). I used the
>Bioconductor package "qvalue" to convert the limma p-values to
>q-values. Both routines are supposed to be based on the same paper.
>the SAM q-value for the most highly differentially expressed gene is
>whereas the q-value from "qvalue" is 3.9e-12. The SAM q-value for the
>1000th most highly differentially expressed gene is also .0039, but the
>value from "qvalue" is 5.6e-10.
>As well, "qvalue" (at FDR=0.01) is returning genes whose p-values are
>pretty big - e.g. p=0.12. Partly this is because the estimated pi_0 is
>just 7%. By contrast, SAM estimates pi_0 to be 17% and returns a much
>smaller list of genes at the same FDR. These genes have unadjusted
>p-values which are quite small.
>I guess if I believe SAM, I should be getting about 83% of my genes
>declared statistically significant - which, interestingly enough is
>what I do get at FDR=.01 from "qvalue".
>As always, I welcome the insights of the members of this list.
>Naomi S. Altman 814-865-3791 (voice)
>Bioinformatics Consulting Center
>Dept. of Statistics 814-863-7114 (fax)
>Penn State University 814-865-1348
>University Park, PA 16802-2111
>Bioconductor mailing list
>Bioconductor at stat.math.ethz.ch
>The contents of this e-mail are privileged and/or confidential to the
>named recipient and are not to be used by any other person and/or
>organisation. If you have received this e-mail in error, please notify
>the sender and delete all material pertaining to this e-mail.
Naomi S. Altman 814-865-3791 (voice)
Bioinformatics Consulting Center
Dept. of Statistics 814-863-7114 (fax)
Penn State University 814-865-1348 (Statistics)
University Park, PA 16802-2111
More information about the Bioconductor