[BioC] Filtering is not recommended with LIMMA?

Wolfgang Huber whuber at embl.de
Sun May 26 14:44:17 CEST 2013


Dear Gordon

> The literature tends to say that the reason for filtering is to reduce the amount of multiple testing, but in truth the increase in power from this is only slight.  The more important reason for filtering in most applications is to remove highly variable genes at low intensities.  The importance of filtering is highly dependent on how you pre-processed your data.  Filtering is less important if you (i) use a good background correction or normalising method that damps down variability at low intensities and (ii) use eBayes(trend=TRUE) which accommodates a mean-variance trend.

With all respect, I think this paragraph conflates two separate issues and would benefit from clarification.

1. While literature can probably be found to support almost any statement, the reason cited above (reducing the amount of multiple testing) is indeed bogus when multiple testing is performed with an FDR objective. The paper by Bourgon et al. motivates filtering differently. It uses a filter criterion that is independent of the test statistic under the null hypothesis, so it does not affect the type-I error rate (some subtleties are discussed in that paper). The criterion is dependent on the test statistic under the alternative, so it improves power.

2. "Highly variable genes at low intensities" are indeed a problem of bad preprocessing and are better dealt with at that level, not by filtering. Nowadays, the commonly used methods for expression microarray or RNA-Seq analysis that I am aware of avoid that problem.

3. The question of when and how independent filtering (as in point 1) is beneficial is largely unrelated to preprocessing. You are right that FDR is a property of the whole selected gene list, not of individual genes. You are also right that different approaches exist for spending the type-I error budget wisely by weighting genes differently. Independent filtering is one of them, and trended eBayes (which is not the default option in limma) may be another.
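The independent-filtering idea in point 1 can be illustrated with a minimal Python sketch. This is not limma or genefilter code. It assumes the Benjamini-Hochberg (BH) procedure as the FDR method. The filter statistic is each gene's overall mean intensity, computed across all arrays without using the group labels, which is the kind of criterion Bourgon et al. discuss. The helper names `bh_adjust` and `filter_then_bh` are hypothetical, and the toy p-values and means are made up for illustration only.

```python
from typing import List, Sequence


def bh_adjust(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Step up from the largest p-value, keeping adjusted values monotone and <= 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def filter_then_bh(pvalues: Sequence[float], filter_stat: Sequence[float],
                   cutoff: float, alpha: float = 0.1) -> List[int]:
    """Hypothetical helper: drop genes whose filter statistic is below `cutoff`,
    run BH on the surviving genes only, and return the original indices of the
    genes whose adjusted p-value is <= alpha."""
    kept = [i for i, s in enumerate(filter_stat) if s >= cutoff]
    adjusted = bh_adjust([pvalues[i] for i in kept])
    return sorted(kept[j] for j, q in enumerate(adjusted) if q <= alpha)


# Toy data (made up): per-gene p-values and overall mean log-intensities,
# the latter assumed to be computed across all arrays, ignoring group labels.
pvals = [0.001, 0.02, 0.03, 0.5, 0.8, 0.9, 0.04, 0.6]
means = [10.0, 9.0, 8.0, 1.0, 2.0, 1.0, 7.0, 1.5]

print(filter_then_bh(pvals, means, float("-inf"), alpha=0.05))  # no filter -> [0]
print(filter_then_bh(pvals, means, 5.0, alpha=0.05))             # filtered -> [0, 1, 2, 6]
```

With these invented numbers, BH over all eight genes rejects only gene 0 at alpha = 0.05. Filtering first leaves four genes, and all four pass. Whether real data gain power this way depends on the conditions in point 1. The filter must be independent of the test statistic under the null and informative under the alternative. The cutoff should also be chosen without peeking at the test results.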

	Best wishes
	Wolfgang

Reference:
Bourgon et al. PNAS 2010: http://www.pnas.org/content/107/21/9546
