[BioC] Invalid fold-filter

Sean Davis sdavis2 at mail.nih.gov
Wed Feb 22 00:40:05 CET 2006




On 2/21/06 12:17, "Robert Gentleman" <rgentlem at fhcrc.org> wrote:

> Hi,
> 
>   In substance I agree with Naomi, but I do want to suggest that there
> are likely to be biases (statistical sense) introduced by filtering on a
> lack of annotation and I personally would want to deal with that at the
> end of the analysis, not at the beginning.
> 
>   Not all molecular systems are equally studied, or published on, and if
> your experiment has intersected with one of these, then pre-filtering
> will hide that information from you. In some cases this is not a
> concern, but in others it may be.
>
>   Of course you can do little with the data if there is no annotation -
> but even there, you can get the sequence and do some reasonable stuff
> with that much information these days.

I would second this sentiment.  Genome annotation is a moving target.  A
probe or probeset that represents a gene one day may not do so the next, or
may represent a different one.  In addition, microarray manufacturers typically
focus on one or two sets of genomic annotation (for example, the RefSeq set
from NCBI); there are multiple other sets of genome annotation that may be
more inclusive or have different coverage of the "full" set of transcripts.
While it may be prohibitively complicated and time-consuming for many labs
to BLAST every probe or consensus sequence against all known transcripts (and
even GenBank) a priori (although that is what our lab does, in practice),
once a gene list is available that includes an "EST" or anonymous sequence,
it is often quite enlightening to look at BLAST results against various
genome annotations.  Often, these probes represent a gene family or some
highly conserved domain; while this isn't necessarily useful information
in and of itself, in a particular biologic context or gene set, having such
information may be just as hypothesis-generating as probes that "cleanly"
represent a gene.

Sean