[BioC] Invalid fold-filter

Jenny Drnevich drnevich at uiuc.edu
Mon Feb 20 18:05:34 CET 2006


Hello all,

I have also pondered the issue of filtering genes to reduce the burden of 
multiple-testing correction, and what is or isn't statistically valid. 
I do routinely filter on some estimate of "presence", either Affy's P/M/A 
calls, or for spotted arrays, comparison to blanks, buffers and/or negative 
controls. However, I only remove a gene if it is not deemed "present" on any 
of the arrays; my rationale for this is that if it's a whole-genome array, 
only a subset of those genes will be expressed in any particular tissue, 
developmental stage, etc. I keep a gene if it is "present" in at least one 
sample rather than say, half the samples as I've seen in other analyses, 
because the possibility exists that a gene may be expressed in only one of 
the treatment groups.
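
To make the rule concrete, here is a minimal sketch of the "present in at 
least one sample" filter. The gene names and calls are made up for 
illustration, and counting a marginal ("M") call as not present is an 
assumption of this sketch, not something stated above.

```python
# Hypothetical detection calls: one entry per gene, one call per array.
# "P" = present, "M" = marginal, "A" = absent (Affymetrix-style calls).
calls = {
    "gene_a": ["P", "P", "P", "P"],
    "gene_b": ["A", "A", "P", "A"],  # present on a single array only
    "gene_c": ["A", "M", "A", "A"],  # marginal at best
    "gene_d": ["A", "A", "A", "A"],  # absent everywhere
}


def keep_if_present(calls, min_present=1):
    """Return the genes called "P" on at least `min_present` arrays.

    Assumption of this sketch: a marginal call does not count as present.
    """
    return [
        gene
        for gene, row in calls.items()
        if sum(call == "P" for call in row) >= min_present
    ]


# Rule described above: keep a gene if it is present on at least one array.
kept = keep_if_present(calls)

# Stricter alternative (present on at least half of the four arrays).
strict = keep_if_present(calls, min_present=2)

print(kept)
print(strict)
```

With min_present=1, a gene such as gene_b that is expressed on only one 
array survives the filter, whereas the "half the samples" rule would drop it.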

On the other hand, I've never been comfortable with filtering on even a 
non-specific measure of variation across arrays. After reading Jim's 
response, I agree that if you're mainly interested in sample 
classification, then it could be reasonable to filter out genes that do not 
vary, but it still doesn't seem right to do this if you're mainly 
interested in determining differential expression between two or more known 
classes. My reasoning is that the p-values are based on the null 
F-distribution, and that by removing genes with little variance, you are in 
effect removing the left side of the F-distribution, which would seem to 
invalidate the p-values because the area under the remaining distribution 
has changed. If you couldn't tell, my logic is not based on formal 
statistical theory but rather on my intuitive feel on the matter!
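
One way to probe this intuition empirically is a small simulation under the 
null. The sketch below is only an illustration under stated assumptions: every 
gene is pure Gaussian noise with the same variance, there are two groups of 
four arrays, the filter keeps the half of the genes with the largest overall 
variance, and the empirical 95th percentile of F over all simulated genes 
stands in for the tabulated critical value. None of these settings come from 
a real data set, and the outcome may differ for other filter statistics.

```python
import random
import statistics


def f_statistic(group1, group2):
    """One-way ANOVA F statistic for two groups (1 numerator df)."""
    n1, n2 = len(group1), len(group2)
    m1 = sum(group1) / n1
    m2 = sum(group2) / n2
    grand = (sum(group1) + sum(group2)) / (n1 + n2)
    between = n1 * (m1 - grand) ** 2 + n2 * (m2 - grand) ** 2
    within = (
        sum((x - m1) ** 2 for x in group1) + sum((x - m2) ** 2 for x in group2)
    ) / (n1 + n2 - 2)
    return between / within


def simulate(n_genes=5000, n_per_group=4, keep_fraction=0.5, seed=1):
    """Compare null false-positive rates before and after a variance filter."""
    rng = random.Random(seed)
    genes = []
    for _ in range(n_genes):
        # Null gene: no group difference, all values from the same N(0, 1).
        values = [rng.gauss(0.0, 1.0) for _ in range(2 * n_per_group)]
        overall_var = statistics.variance(values)  # ignores group labels
        f = f_statistic(values[:n_per_group], values[n_per_group:])
        genes.append((overall_var, f))

    # Empirical 5% cutoff from all null genes (stand-in for the F table).
    f_sorted = sorted(f for _, f in genes)
    cutoff = f_sorted[int(0.95 * n_genes)]

    # Non-specific filter: keep the genes with the largest overall variance.
    genes.sort(key=lambda g: g[0], reverse=True)
    kept = genes[: int(keep_fraction * n_genes)]

    rate_all = sum(f > cutoff for _, f in genes) / n_genes
    rate_kept = sum(f > cutoff for _, f in kept) / len(kept)
    return rate_all, rate_kept, len(kept)


rate_all, rate_kept, n_kept = simulate()
print("false-positive rate, all genes:  ", rate_all)
print("false-positive rate, kept genes: ", rate_kept)
```

If the filter leaves the null distribution of F intact, the two rates should 
be close; a clear gap would suggest that the p-values of the retained genes 
are no longer calibrated.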

Cheers,
Jenny



Jenny Drnevich, Ph.D.

Functional Genomics Bioinformatics Specialist
W.M. Keck Center for Comparative and Functional Genomics
Roy J. Carver Biotechnology Center
University of Illinois, Urbana-Champaign

330 ERML
1201 W. Gregory Dr.
Urbana, IL 61801
USA

ph: 217-244-7355
fax: 217-265-5066
e-mail: drnevich at uiuc.edu


