[BioC] PreFiltering probe in microarray analysis

Arno, Matthew matthew.arno at kcl.ac.uk
Mon Jun 13 12:55:20 CEST 2011

Speaking as a pure 'biologist', I think it's OK to pre-filter genes as long as you know the pitfalls, in terms of the potential bias and the effect on FDRs. I am personally aware of people pre-filtering not only to improve the FDR, but also to use the results of a t-test as the starting point for a second, sequential t-test, because the FDRs from this test are 'amazingly good'.
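The 'sequential t-test' pitfall can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: Python with only the standard library, every gene truly null, a known-variance z-test standing in for the t-test, and a hand-written Benjamini-Hochberg (BH) adjustment. The sizes (10,000 genes, 5 arrays per group, top 100 kept) are arbitrary illustrative choices. Re-testing the top genes on the same data returns the same p-values, but correcting over only the 100 selected tests makes them look significant.

```python
import random
from statistics import NormalDist, fmean

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone and <= 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def z_test_p(a, b, sigma=1.0):
    """Two-sided two-sample z-test p-value, assuming a known common sd (stand-in for a t-test)."""
    se = sigma * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (fmean(a) - fmean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

rng = random.Random(1)
m, n, k, alpha = 10_000, 5, 100, 0.05
# Every gene is null: both groups are drawn from the same N(0, 1).
genes = [([rng.gauss(0, 1) for _ in range(n)], [rng.gauss(0, 1) for _ in range(n)])
         for _ in range(m)]
p = [z_test_p(a, b) for a, b in genes]

# Honest analysis: correct over all m tests.
full_hits = sum(q < alpha for q in bh_adjust(p))

# "Sequential" analysis: keep the k smallest p-values, re-test the SAME data,
# and correct over the k selected tests only.
top = sorted(range(m), key=lambda i: p[i])[:k]
p_again = [z_test_p(*genes[i]) for i in top]
selected_hits = sum(q < alpha for q in bh_adjust(p_again))

print(f"all {m} genes: {full_hits} BH hits; re-tested top {k}: {selected_hits} BH hits")
```

With these settings the 100th smallest of 10,000 null p-values sits near 0.01, so every selected gene clears the 0.05 BH cutoff, while the correction over all 10,000 tests usually reports nothing.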

However statistically sacrilegious this is, the top 10 genes are always going to be the same top 10 genes, so if you are just looking for the top 10 genes, this is essentially OK. 

How does that hang with you guys? 


Matthew Arno, Ph.D.
Genomics Centre Manager
King's College London

>-----Original Message-----
>From: bioconductor-bounces at r-project.org [mailto:bioconductor-bounces at r-project.org] On Behalf Of wxu at msi.umn.edu
>Sent: 12 June 2011 16:41
>To: Wolfgang Huber
>Cc: bioconductor at r-project.org
>Subject: Re: [BioC] PreFiltering probe in microarray analysis
>Hi, Dear Wolfgang,
>I think it would be nice to bring up a discussion here about the gene
>prefiltering issue. Please point me out if this suggestion is
>There are two questions in the gene filtering which I could not find
>1) In the traditional multiple tests to correct the p-values of many
>groups, for example in a new drug effect experiment, is it appropriate to
>remove some group tests from the whole experiment? If not, why can we
>prefilter the genes?
>2) As I stated in the previous email, we assume that the raw p-values
>the top lowest-p-value genes are the same before (35k genes) and after
>filtering (5k genes). The gene x you selected from 35k versus the one
>selected from 5k: which is more sound? In other words, the best student
>selected from 1000 students versus the best student selected from 100:
>which is more sound?
>So this is a question about the whole point of the gene prefiltering approach.
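Question 2 can be made concrete with a worked formula, assuming the Benjamini-Hochberg (BH) adjustment (the default in limma's topTable). With $m$ tests and raw p-values sorted as $p_{(1)} \le \dots \le p_{(m)}$, the BH-adjusted value of the top-ranked gene is bounded by $m\,p_{(1)}$. The value $p_{(1)} = 10^{-6}$ below is hypothetical:

$$\tilde p_{(i)} = \min_{j \ge i} \min\left(1, \frac{m\, p_{(j)}}{j}\right), \qquad \text{so} \qquad \tilde p_{(1)} \le m\, p_{(1)}.$$

$$m = 35000:\ \tilde p_{(1)} \le 0.035, \qquad m = 5000:\ \tilde p_{(1)} \le 0.005.$$

Gene x keeps the same raw p-value in both cases; only the multiplicity factor $m$ shrinks. Whether the smaller penalty is legitimate depends on how the 5k genes were chosen: Bourgon et al. (2010) require the filter statistic to be independent of the test statistic under the null hypothesis.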
>Best wishes,
>> Hi Swapna
>> On Jun/2/11 7:58 PM, Swapna Menon wrote:
>>> Hi Stephanie,
>>> There is another recent paper that you might consider, which also
>>> cautions about filtering:
>>> Van Iterson, M., Boer, J. M., & Menezes, R. X. (2010). Filtering, FDR
>>> and power. BMC Bioinformatics, 11(1), 450.
>>> They also recommend their own statistical test to see if one's filter
>>> biases FDR.
>>> Currently I am trying the variance filter and feature filter from the
>>> genefilter package: try ?nsFilter for help on these functions.
>>> However, I don't use filtering routinely, since choosing the right
>>> filter and parameters and testing the effects of any bias are things I
>>> have not worked out, even after reading Bourgon et al.,
>>> van Iterson et al. and others that discuss this issue.
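For readers outside R, the variance filter Swapna mentions can be sketched in a few lines. This is a plain-Python analogue, not genefilter's API: it assumes nsFilter's defaults are an IQR variance function with var.cutoff = 0.5 read as a quantile (so genes above the median IQR are kept), and the gene names and values are hypothetical. Note that the filter looks at all arrays and never at group labels.

```python
from statistics import median, quantiles

def iqr(values):
    """Interquartile range; method="inclusive" matches R's default quantile type 7."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1

def iqr_filter(expr):
    """Keep genes whose IQR across ALL arrays is strictly above the median IQR.

    Assumed analogue of nsFilter(var.func = IQR, var.cutoff = 0.5);
    group labels are never consulted.
    """
    iqrs = {gene: iqr(values) for gene, values in expr.items()}
    cutoff = median(iqrs.values())
    return [gene for gene, v in iqrs.items() if v > cutoff]

# Hypothetical log2 expression values: gene -> one value per array (6 arrays).
expr = {
    "geneA": [5.0, 5.1, 4.9, 5.0, 5.2, 4.8],  # nearly flat
    "geneB": [3.0, 7.0, 3.5, 6.5, 2.9, 7.2],  # highly variable
    "geneC": [8.0, 8.1, 8.0, 7.9, 8.2, 8.0],  # nearly flat, high intensity
    "geneD": [1.0, 4.0, 2.0, 5.0, 1.5, 4.5],  # variable
}
kept = iqr_filter(expr)
print(kept)  # ['geneB', 'geneD']
```

The IQRs here are 0.15, 3.75, 0.075 and 2.75, so the median cutoff is 1.45 and half the genes survive, as a 0.5 quantile cutoff implies.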
>>> About your limma results: while conventional filtering may be used
>>> to increase the number of significant genes, as the papers suggest, the
>>> likelihood of false positives also increases.
>> No. Properly applied filtering does not affect the false positive rate
>> (FWER or FDR). That's the whole point of it. [1]
>> If one is willing to put up with a higher rate or probability of false
>> discoveries, then don't do filtering - just increase the p-value cutoff.
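>> [1] Bourgon, R., Gentleman, R., & Huber, W. (2010). Independent filtering
>> increases detection power for high-throughput experiments. PNAS, 107(21), 9546.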
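Wolfgang's point can be checked, for one random draw at least, with a simulation. A minimal sketch under stated assumptions: Python standard library only; a known-variance z-test; a filter on each gene's overall mean across all 10 arrays, which ignores group labels and, for normal data with equal group sizes, is independent of the z-statistic for null genes. The numbers are hypothetical: 2,000 low-expressed genes that are all null, 2,000 expressed genes of which 400 are truly changed, and BH at 0.1. One run illustrates the behaviour; it is not a proof of FDR control.

```python
import random
from statistics import NormalDist, fmean

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # largest p first keeps values monotone
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def z_test_p(a, b, sigma=1.0):
    """Two-sided two-sample z-test p-value with known common sd."""
    se = sigma * (1 / len(a) + 1 / len(b)) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs((fmean(a) - fmean(b)) / se)))

rng = random.Random(7)
n, alpha = 5, 0.1
genes = []  # (group A values, group B values, truly_changed)
for g in range(4000):
    base = 2.0 if g < 2000 else 8.0  # hypothetical low vs. well-expressed genes
    changed = g >= 3600              # 400 true changes, all among expressed genes
    a = [rng.gauss(base, 1) for _ in range(n)]
    b = [rng.gauss(base + (2.0 if changed else 0.0), 1) for _ in range(n)]
    genes.append((a, b, changed))

p = [z_test_p(a, b) for a, b, _ in genes]

def discoveries(indices):
    """BH over the given genes only; return (number of hits, number of false hits)."""
    adj = bh_adjust([p[i] for i in indices])
    hits = [i for i, q in zip(indices, adj) if q <= alpha]
    return len(hits), sum(not genes[i][2] for i in hits)

all_idx = list(range(len(genes)))
# Filter on the overall mean of all 2n arrays; group labels are ignored.
kept_idx = [i for i in all_idx if fmean(genes[i][0] + genes[i][1]) > 5.0]

full_hits, full_false = discoveries(all_idx)
filt_hits, filt_false = discoveries(kept_idx)
print(f"no filter:   {full_hits} hits, {full_false} false")
print(f"mean filter: {filt_hits} hits, {filt_false} false ({len(kept_idx)} genes kept)")
```

Bourgon et al. analyse the more common pairing of an overall-variance filter with a t-test. With a known-variance z-test, as here, an overall-variance filter would not be independent of the statistic, because the overall variance includes the between-group difference; that is why the sketch filters on the mean instead.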
>>> In your current results,
>>> do you have fold changes above 2 (log2 FC > 1)? You may want to
>>> explore the biological relevance of those genes with high FC and a
>>> significant unadjusted p-value.
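Swapna's suggestion amounts to a two-criterion shortlist. A minimal sketch, assuming results are available as gene -> (log2 fold change, unadjusted p-value); the gene names, values and the helper name `fc_and_p_shortlist` are hypothetical, and the thresholds are her |log2 FC| > 1 plus a conventional p < 0.05. Because both cutoffs are unadjusted, the list carries no FDR guarantee; it is a starting point for biological follow-up.

```python
def fc_and_p_shortlist(results, min_abs_log2fc=1.0, max_p=0.05):
    """Return genes with |log2 FC| > min_abs_log2fc and raw p < max_p, largest |FC| first."""
    hits = [(gene, lfc, p) for gene, (lfc, p) in results.items()
            if abs(lfc) > min_abs_log2fc and p < max_p]
    hits.sort(key=lambda h: abs(h[1]), reverse=True)
    return [gene for gene, _, _ in hits]

# Hypothetical limma-style output: gene -> (log2 fold change, unadjusted p-value).
results = {
    "g1": (2.3, 0.001),
    "g2": (-1.4, 0.03),
    "g3": (0.6, 0.0001),  # significant but fold change below 2
    "g4": (3.1, 0.20),    # large fold change but not significant
    "g5": (-2.8, 0.004),
}
print(fc_and_p_shortlist(results))  # ['g5', 'g1', 'g2']
```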
>>> Best,
>>> Swapna
>> Best wishes
>> Wolfgang Huber
>> http://www.embl.de/research/units/genome_biology/huber
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
