[BioC] Filtering probes without annotation prior to statistical test
swhwang10 at yahoo.com
Tue Jul 29 04:37:44 CEST 2008
I am analyzing data from the Affymetrix Human Gene 1.0 ST Array.
After inspecting its probe annotation file, I noticed that it contains many probesets without transcript annotation, as follows:
Total number of probesets: 33,298
(1) Probesets with annotation: 24,409 (73%)
(2) Control probesets: 4,201 (13%)
(3) Probesets without any annotation: 4,688 (14%)
I am thinking about filtering out probesets (2) and (3) before the statistical tests in order to reduce the number of probesets tested. Doing so would make a large difference to the multiple testing correction, compared with testing all probesets (1), (2), and (3) and then removing (2) and (3) from the resulting list of differentially expressed genes (DEGs).
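As a rough illustration of the size of the effect, here is a minimal Python sketch using the probeset counts above. Python is used only for illustration. The significance level ALPHA = 0.05 and the helper names bonferroni_threshold and bh_adjust are assumptions made for this example and do not come from any Bioconductor package. The sketch compares the Bonferroni per-test cutoff with and without filtering, and includes a plain Benjamini-Hochberg adjustment to show how the number of tests enters the adjusted p-values.

```python
# Probeset counts taken from the annotation summary above.
N_ANNOTATED = 24_409
N_CONTROL = 4_201
N_UNANNOTATED = 4_688
N_TOTAL = N_ANNOTATED + N_CONTROL + N_UNANNOTATED  # 33,298

ALPHA = 0.05  # assumed significance level; not stated in the post


def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff under a Bonferroni correction."""
    return alpha / n_tests


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


thr_all = bonferroni_threshold(ALPHA, N_TOTAL)
thr_filtered = bonferroni_threshold(ALPHA, N_ANNOTATED)

print(f"cutoff, all probesets:  {thr_all:.3e}")
print(f"cutoff, annotated only: {thr_filtered:.3e}")
print(f"ratio (filtered / all): {thr_filtered / thr_all:.2f}")
```

Because both corrections scale with the number of tests, removing groups (2) and (3) relaxes the cutoff by the factor N_TOTAL / N_ANNOTATED. The sketch only shows the arithmetic; it does not address whether the filtering itself is statistically valid.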
Is this type of filtering prior to statistical tests valid? Also, has anyone encountered a similar situation (dealing with array data with a lot of non-gene probes)?
Seungwoo Hwang, Ph.D.
Senior Research Scientist
Korean Bioinformation Center (http://www.kobic.re.kr)