[BioC] Prefiltering probes in microarray analysis

Sun Jun 12 17:40:54 CEST 2011

Dear Wolfgang,

I think it would be nice to start a discussion here about gene
prefiltering. Please let me know if this suggestion is
inappropriate.

There are two questions about gene filtering to which I could not find answers:
1) In traditional multiple testing, where the p-values of many test groups
are corrected together (for example, in an experiment on the effect of a new
drug), is it appropriate to remove some of the group tests from the whole
experiment? If not, why may we prefilter the genes?
2) As I stated in my previous email, assume that the raw p-values and the
top genes (those with the lowest p-values) are the same before filtering
(35k genes) and after filtering (5k genes). Which is more sound: gene x
selected from the 35k genes, or the same gene selected from the 5k? In other
words, which is more sound: the best student selected from 1000 students,
or the best student selected from 100?
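To make question 2 concrete, here is a minimal sketch in Python (not R), using made-up p-values and variances for ten hypothetical genes. `bh_adjust` is intended to mirror R's `p.adjust(method = "BH")`; the variance cutoff of 0.5 is an arbitrary illustrative choice. The filter statistic is assumed to be each gene's overall variance computed across all arrays without the group labels, the kind of filter Bourgon et al. (PNAS 2010) argue leaves the FDR controlled when it is independent of the test statistic under the null. The point it shows: the same raw p-value gets a smaller adjusted p-value when it is ranked among fewer tests.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # starting at 1 also caps adjusted values at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical data: raw p-values and overall variances for ten genes.
raw_p = [0.0004, 0.003, 0.02, 0.04, 0.3, 0.5, 0.6, 0.7, 0.85, 0.95]
overall_var = [2.1, 1.8, 1.5, 0.2, 1.2, 0.1, 0.15, 0.05, 0.3, 0.08]

# Without filtering: all ten genes enter the correction.
adj_all = bh_adjust(raw_p)

# With filtering: keep genes whose overall variance exceeds a hypothetical cutoff.
kept = [i for i, v in enumerate(overall_var) if v > 0.5]
adj_kept = bh_adjust([raw_p[i] for i in kept])

alpha = 0.05
print("top gene, no filter:   ", round(adj_all[0], 4))
print("top gene, filtered:    ", round(adj_kept[0], 4))
print("discoveries, no filter:", sum(q <= alpha for q in adj_all))
print("discoveries, filtered: ", sum(q <= alpha for q in adj_kept))
```

With these made-up numbers the top gene's adjusted p-value drops from 0.004 to 0.0016 and the number of genes called at BH level 0.05 rises from two to three. Whether that gain is legitimate depends on the independence condition on the filter, not on the arithmetic itself.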

So this question concerns the whole point of the gene prefiltering approach.

Best wishes,

Wayne
--
> Hi Swapna
>
> On Jun/2/11 7:58 PM, Swapna Menon wrote:
>> Hi Stephanie,
>> There is another recent paper that you might consider, which also
>> cautions about filtering:
>> Van Iterson, M., Boer, J. M., & Menezes, R. X. (2010). Filtering, FDR
>> and power. BMC Bioinformatics, 11(1), 450.
>> They also recommend their own statistical test to check whether one's
>> filter biases the FDR.
>> Currently I am trying the variance filter and the feature filter from
>> the genefilter package: try ?nsFilter for help on these functions.
>> However, I don't use filtering routinely, since I have not worked out
>> how to choose the right filter and parameters or how to test the
>> effects of any bias, even after reading Bourgon et al., van Iterson et
>> al. and others who discuss this issue.
>> About your limma results: while conventional filtering may be expected
>> to increase the number of significant genes, as the papers suggest, the
>> likelihood of false positives also increases.
>
> No. Properly applied filtering does not affect the false positive rates
> (FWER or FDR). That's the whole point of it. [1]
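As I understand Bourgon et al. [1], "properly applied" means that the filter statistic is computed without the class labels and is independent of the test statistic under the null hypothesis; for normally distributed data, the overall variance across all arrays and a two-sample t-statistic are an example of such a pair. The Python sketch below keeps the genes with the largest overall variance in that spirit. The function name `variance_filter`, the `keep_fraction` parameter and the expression values are hypothetical, and this is not the genefilter implementation (`nsFilter` has its own options and defaults).

```python
from statistics import variance


def variance_filter(expr, keep_fraction=0.5):
    """Return the row indices (genes) with the largest overall variance.

    Each row holds one gene's expression values across all arrays. The
    variance is computed over all arrays together, ignoring the class
    labels, so the filter never looks at the group comparison tested later.
    """
    scores = [variance(row) for row in expr]
    n_keep = max(1, int(keep_fraction * len(expr)))
    # Rank genes by overall variance, highest first, and keep the top ones.
    ranked = sorted(range(len(expr)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])


# Hypothetical log2 expression matrix: 4 genes x 4 arrays.
# The class labels of the arrays are deliberately not passed to the filter.
expr = [
    [5.0, 5.1, 4.9, 5.0],
    [3.0, 6.0, 2.0, 7.0],
    [8.0, 8.0, 8.1, 7.9],
    [1.0, 4.0, 1.5, 4.5],
]
kept = variance_filter(expr, keep_fraction=0.5)
print("genes passed on to the test:", kept)
```

Because the labels are never consulted, the same genes would be kept under any group assignment; the multiple-testing correction is then run only on the kept genes.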
>
> If one is willing to put up with a higher rate or probability of false
> discoveries, then don't filter; just increase the p-value cutoff.
>
> [1] Bourgon et al., PNAS 2010.
>
>> In your current results,
>> do you have fold changes above 2 (log2 FC > 1)? You may want to
>> explore the biological relevance of the genes with a high fold change
>> and a significant unadjusted p-value.
>> Best,
>> Swapna
>
> Best wishes
> Wolfgang Huber
> EMBL
> http://www.embl.de/research/units/genome_biology/huber
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>