[BioC] Filtering probes without annotation prior to statistical test
m.cowley at garvan.org.au
Tue Jul 29 07:26:32 CEST 2008
That type of filtering is definitely valid, and I have seen very
similar proportions of probesets with no annotation. However, the
number of probesets in group (3) changes with each new transcript.csv
file (the latest being labelled na26), implying that some probesets
without annotation in one version may gain it in another, and some
that did have annotation no longer do.
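To see how group (3) shifts between releases, one could compare the
annotation column of two transcript.csv versions. Below is a minimal
Python sketch of that comparison. The probeset IDs, gene names and the
function name annotation_changes are made up for illustration, and it
assumes a missing annotation is an empty string or "---"; check what
your own files actually use.

```python
def annotation_changes(old, new, missing=("", "---")):
    """Compare two releases given as dicts of probeset ID -> annotation.

    Returns the probeset IDs that gained or lost annotation between
    the old and the new release.
    """
    def annotated(table):
        # Keep only IDs whose annotation is not a "missing" marker.
        return {pid for pid, ann in table.items() if ann not in missing}

    old_ann, new_ann = annotated(old), annotated(new)
    return {
        "gained": sorted(new_ann - old_ann),
        "lost": sorted(old_ann - new_ann),
    }


# Hypothetical toy data, not taken from any real annotation file.
old_release = {"7892501": "---", "7892502": "GENE_A", "7892503": "GENE_B"}
new_release = {"7892501": "GENE_C", "7892502": "GENE_A", "7892503": "---"}

changes = annotation_changes(old_release, new_release)
print(changes)
```

With real files, the two dicts would be built from the probeset ID and
annotation columns of each release.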
The only caveat with removing (3) is that there may be differentially
expressed "genes/somethings" among them that, with a little effort in
the form of aligning probe sequences, could reveal some interesting
novel biology.
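To make the multiple testing point concrete, here is a small Python
sketch comparing the two orders of operation: filter then adjust,
versus adjust on everything and then filter the list. It assumes
Benjamini-Hochberg adjustment (the original question does not say
which correction is used), and the p-values are invented toy numbers.
The filter is based on annotation only and never looks at the
expression data, which is why it does not bias the test.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented toy p-values: 4 annotated probesets, 6 control/unannotated.
annotated_p = [0.001, 0.004, 0.02, 0.03]
other_p = [0.5, 0.6, 0.7, 0.8, 0.9, 0.95]

# Option A: filter first, then adjust over the 4 annotated probesets.
filtered_first = bh_adjust(annotated_p)

# Option B: adjust over all 10, then keep only the annotated ones.
adjusted_all = bh_adjust(annotated_p + other_p)[:len(annotated_p)]

hits_a = sum(p < 0.05 for p in filtered_first)
hits_b = sum(p < 0.05 for p in adjusted_all)
print(hits_a, hits_b)
```

In this toy case all four annotated probesets pass a 0.05 cutoff when
filtered first, but only two pass when the adjustment is done over all
ten. How large the gap is on real data depends on the p-value
distribution of the removed probesets.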
On 29/07/2008, at 12:37 PM, Seungwoo Hwang wrote:
> Dear all,
> I am analyzing data from Affymetrix Human Gene 1.0 ST Array.
> After inspecting its probe annotation file, it came to my attention
> that it contains a lot of probesets without transcript annotation,
> as follows:
> Total number of probesets: 33,298
> (1) Probesets with annotation: 24,409 (73%)
> (2) Control probesets: 4,201 (13%)
> (3) Probesets without any annotation: 4,688 (14%)
> I am thinking about filtering out the probesets (2) and (3) prior to
> statistical tests in order to reduce the total number of probesets
> that are subject to statistical tests. Doing so will make a big
> difference in multiple testing correction, compared to doing
> statistical tests on all probesets (1),(2), and (3) followed by
> filtering out the probesets (2) and (3) from the DEG list.
> Is this type of filtering prior to statistical tests valid? Also,
> has anyone encountered a similar situation (dealing with array data
> with a lot of non-gene probes)?
> Seungwoo Hwang, Ph.D.
> Senior Research Scientist
> Korean Bioinformation Center (http://www.kobic.re.kr)