[BioC] Limma analysis of focused arrays vs. whole genome arrays

Naomi Altman naomi at stat.psu.edu
Fri Jul 22 16:10:33 CEST 2005

I am a bit late on this discussion, but here is my input:

We are using EST arrays, which have similar problems.  We use a few 
spike-ins (3-10), spotted many times each.  (We are using 50 
spots/spike-in, but I think 25 would be sufficient.)  We spike them in as a 
titration series, which helps us determine how well the loess works across 
the intensity range ("A").  Having many replicates is very valuable - it 
gives us an idea of the natural variation in the system, and of the 
within-spot correlation between the 2 channels.  It also means that we can 
create our own oligos, which is a good thing, given the cost of oligos.

In one of our experiments, ordinary loess seems to be fine - i.e. the 
spike-ins end up where they should be.  In the other experiment, the 
spike-ins "move" dramatically under one condition.  We are investigating 
whether this was lab error, or whether a different normalization is needed.
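
For anyone who wants to run the same kind of check, here is a rough sketch 
in Python (standard library only) of what "the spike-ins end up where they 
should be" can mean in practice.  It is not our actual pipeline.  The 
function name, the data layout and the intensities are all made up for 
illustration, and it assumes the intensities have already been normalized.  
For each spike-in it reports the mean A, the mean M, the spot-to-spot SD of 
M, and the bias of the mean M from the designed log2 ratio.

```python
import math
from statistics import mean, stdev


def spike_in_summary(spots, expected_log2_ratio):
    """Summarise replicate spike-in spots (hypothetical helper).

    spots: iterable of (spike_name, red, green) normalized intensities.
    expected_log2_ratio: dict spike_name -> designed log2(red/green).
    Returns dict spike_name -> dict with n, mean_A, mean_M, sd_M, bias.
    """
    by_spike = {}
    for name, red, green in spots:
        m = math.log2(red / green)        # log-ratio M
        a = 0.5 * math.log2(red * green)  # average log-intensity A
        by_spike.setdefault(name, []).append((m, a))

    summary = {}
    for name, values in by_spike.items():
        m_values = [m for m, _ in values]
        a_values = [a for _, a in values]
        summary[name] = {
            "n": len(values),
            "mean_A": mean(a_values),
            "mean_M": mean(m_values),
            # replicate-to-replicate variation of the log-ratio
            "sd_M": stdev(m_values) if len(m_values) > 1 else float("nan"),
            # distance of the observed mean M from the designed value
            "bias": mean(m_values) - expected_log2_ratio[name],
        }
    return summary


# Made-up intensities: 3 replicate spots for each of 2 spike-ins.
spots = [
    ("spike1", 400.0, 100.0), ("spike1", 440.0, 100.0), ("spike1", 400.0, 110.0),
    ("spike2", 1000.0, 1000.0), ("spike2", 1100.0, 1000.0), ("spike2", 1000.0, 1100.0),
]
expected = {"spike1": 2.0, "spike2": 0.0}

summary = spike_in_summary(spots, expected)
# Print in order of increasing A, i.e. along the intensity range.
for name in sorted(summary, key=lambda s: summary[s]["mean_A"]):
    row = summary[name]
    print(name, round(row["mean_A"], 2), round(row["mean_M"], 3),
          round(row["sd_M"], 3), round(row["bias"], 3))
```

A bias that drifts with A, or that appears under only one condition, would 
be the sort of "movement" described above.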

If most genes are differentially expressed, you should abandon q-values.  
The reason is obvious if you think about it - the q-value is the estimated 
percentage of false detections among the genes declared significant.  If 
90% of the genes actually are differentially expressed, the max. q-value is 
going to be about 10%, even if you declare all the genes significant.  In 
that case, you should be controlling the FNR - false non-detect rate.  
Storey's 2003 paper also discusses estimating FNR - but this has not been 
so popular.  In any case, my simplistic solution is that if the q-value 
routine indicates small pi0, then I don't do multiple comparisons 
corrections.  What do I mean by small?  To date, in the studies I have been 
involved in, we have always had either pi0>90% or pi0<25%.  So I have not 
had to worry too much about "small" - 25% is certainly small enough.

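To make the q-value argument concrete, here is a small simulation sketch 
in Python (standard library only).  It is a simplified version of the 
q-value calculation with a single fixed lambda, not the q-value routine 
itself.  The function names, the seed, the number of genes and the shape of 
the non-null p-values are all assumptions chosen for illustration.  The 
point is that the largest q-value equals pi0_hat times the largest p-value, 
so it can never exceed the estimated pi0.

```python
import random


def storey_pi0(pvalues, lam=0.5):
    """Estimate pi0 as the rescaled fraction of p-values above lam."""
    m = len(pvalues)
    n_above = sum(1 for p in pvalues if p > lam)
    return min(1.0, n_above / (m * (1.0 - lam)))


def qvalues(pvalues, pi0):
    """Simplified q-values: pi0 * m * p / rank, made monotone in p."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping a running minimum so
    # that q-values never decrease as p-values increase.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvalues[i] / rank)
        q[i] = running_min
    return q


random.seed(1)
m, true_pi0 = 2000, 0.10           # 90% of genes truly changing
n_null = int(m * true_pi0)
null_p = [random.random() for _ in range(n_null)]            # uniform
alt_p = [random.random() ** 50 for _ in range(m - n_null)]   # piled near 0
pvals = null_p + alt_p

pi0_hat = storey_pi0(pvals)
q = qvalues(pvals, pi0_hat)
print("estimated pi0:", round(pi0_hat, 3))
print("largest q-value:", round(max(q), 3))
```

With a true pi0 of 10%, every gene gets a q-value of roughly 10% or less, 
so a q-value cutoff at the usual levels no longer separates anything.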

At 09:33 AM 6/7/2005, Mike Schaffer wrote:
>The lab I work with has used "whole genome" human arrays (~18,000 genes) 
>for a couple years and I have helped with the analysis using Limma.  Now, 
>due to costs, they are considering switching from whole genome arrays 
>to focused arrays with ~400 genes of interest (selected from the 
>whole-genome array results).
>The obvious analysis problems with a focused array where most genes are 
>changing are:
>1. LOESS normalization assumes most genes are not changing.  If most of 
>the genes are expected to change, there is no basis to recenter the data 
>around zero.  The response from the lab was that they would be willing to 
>include 100-150 genes that are not expected to change.
>2. The B-statistic in Limma requires a parameter indicating a certain 
>fraction of genes are changing.  The corresponding moderated t-statistic 
>uses the data from all genes to moderate the standard error in the t 
>calculation.  Both of these could change dramatically if most of the genes 
>on the array are changing.
>My questions are:
>1. Are my concerns valid and are there ways around them?  Are there 
>other analysis pitfalls with this scenario?
>2. Can Limma handle situations where most of an array is expected to 
>change?  What modifications, if any, need to be made to the Limma analysis 
>to account for this?
>3. Alternatively, is there a more appropriate statistical package to use 
>in this case?

Naomi S. Altman                                814-865-3791 (voice)
Associate Professor
Bioinformatics Consulting Center
Dept. of Statistics                              814-863-7114 (fax)
Penn State University                         814-865-1348 (Statistics)
University Park, PA 16802-2111
