[BioC] qvalues, sam, limma

Thu Jun 10 17:06:44 CEST 2004

Of course, qvalue uses the p-values generated by another routine (in my 
case, by limma) and SAM generates the q-values from the test 
statistic.  However, just for fun, I made the following changes to qvalue:

>  spi0 <- smooth.spline(lambda, pi0, df = 3,w=1-lambda)
>  pi0 <- predict(spi0, x = 1)$y ...

SAM (excel version) estimates pi0=0.17.
qvalue (vanilla)         estimates pi0=0.07.
qvalue (revised)        estimates pi0=0.066

and the q-values are almost identical.

So, I would say that the differences between SAM and limma have to do with 
the estimated p-values.  While Gordon suggested that the distribution may 
not be the proposed F-distribution, I suspect that the problem is the level 
of differential expression, which is probably over 70% (although possibly 
not as high as 80-90% as suggested by the analysis).  It is difficult to 
establish the null distribution from the data when pi0 is very small.

--Naomi

At 12:01 PM 6/10/2004 +1200, Marcus Davy wrote:

>Hi,
>I can make an observation about the differences in pi_0 estimation
>between
>the packages siggenes (SAM analysis) and qvalue.
>
> > packageDescription("qvalue", field="Version")
>[1] "1.0.3"
> > packageDescription("siggenes", field="Version")
>[1] "1.0.6"
> >
>
>The package siggenes uses the function "p0.est" which has the following
>
>cubic spline code:
>...
>     spline.out <- smooth.spline(vec.lambda, vec.p0, w = 1 - vec.lambda,
>
>         df = 3)
>     p0 <- min(predict(spline.out, 1)$y, 1).
>...
>The package qvalue uses the function "qvalue" which has the following
>cubic spline code:
>...
>  spi0 <- smooth.spline(lambda, pi0, df = 3)
>  pi0 <- predict(spi0, x = max(lambda))$y
>...
>
>The estimate pi_0(lambda) over a range of lambda observations becomes
>unbiased as lambda -> 1, with a bias variance tradeoff.
>
>The real difference is that observations are not weighted by 1-lambda
>in the package qvalue, and the estimate of pi_0 is at pi_0(lambda=0.95)
>
>by default in the arguements (lambda = seq(0, 0.95, 0.05)).
>In the package siggenes, pi_0 is estimated at pi_0(lambda=1).
>
>By weighting the observations with 1-lambda assumes values with small
>lambda are trusted to be more accurate.
>
>Details of the weighting of 1-lambda is in the preprint on page 13 of:
>
>Storey J. D. and Tibshirani R. J. Statistical significance for
>genome-wide
>experiments. Preprint, January 2003.
>
>I have run simulations that suggest that including the 1-lambda weights
>
>can improve the variance in the estimation of pi_0(lambda) over the
>entire
>range of pi_0.
>
>
>marcus
>
>
> >>> Naomi Altman <naomi at stat.psu.edu> 10/06/2004 9:16:00 AM >>>
>I have an Affy experiment with a very high level of differential
>expression.  It is a one-way ANOVA with 6 treatments, 2 replicates per
>
>treatment.
>
>We ran both SAM (excel version) and limma, and had very good agreement
>
>between them in terms of ranking the genes by the test statistic.  For
>any
>set of the top K genes, over 90% of the genes were identified by both
>routines.
>
>SAM automatically produces a q-values and estimates FDR and pi_0 (the
>percentage of non-differentially expressing genes).  I used the
>Bioconductor package "qvalue" to convert the limma p-values to
>q-values.  Both routines are supposed to be based on the same paper.
>But
>the SAM q-value for the most highly differentially expressed gene is
>.0039,
>whereas the q-value from "qvalue" is 3.9e-12.  The SAM q-value for the
>
>1000th most highly differentially expressed gene is also .0039, but the
>
>value from "qvalue" is 5.6e-10.
>
>As well, "qvalue" (at FDR=0.01) is returning genes whose p-values are
>pretty big - e.g. p=0.12.  Partly this is because the estimated pi_0 is
>
>just 7%.  By contrast, SAM estimates pi_0 to be 17% and returns a much
>
>smaller list of genes at the same FDR.  These genes have unadjusted
>p-values which are quite small.
>
>I guess if I believe SAM, I should be getting about 83% of my genes
>declared statistically significant - which, interestingly enough is
>about
>what I do get at FDR=.01 from "qvalue".
>
>As always, I welcome the insights of the members of this list.
>
>--Naomi
>
>
>
>Naomi S. Altman                                814-865-3791 (voice)
>Associate Professor
>Bioinformatics Consulting Center
>Dept. of Statistics                              814-863-7114 (fax)
>Penn State University                         814-865-1348
>(Statistics)
>University Park, PA 16802-2111
>
>_______________________________________________
>Bioconductor mailing list
>Bioconductor at stat.math.ethz.ch
>https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
>
>______________________________________________________
>
>The contents of this e-mail are privileged and/or confidential to the
>named recipient and are not to be used by any other person and/or
>organisation. If you have received this e-mail in error, please notify
>the sender and delete all material pertaining to this e-mail.
>______________________________________________________

Naomi S. Altman                                814-865-3791 (voice)
Associate Professor
Bioinformatics Consulting Center
Dept. of Statistics                              814-863-7114 (fax)
Penn State University                         814-865-1348 (Statistics)
University Park, PA 16802-2111