[BioC] Annotation for Nonspecificity of Affymetrix Probes?

Jeff Sorenson jsorenson@bellsouth.net
Mon, 5 Aug 2002 11:49:45 -0500


I would like to thank all of the contributors to the Bioconductor project
for putting their tools into the public domain.  I'm embarking on a project
using Affymetrix U133A/B chips and have been setting up a MySQL database of
probe/sequence information and other annotation, and learning to use the
various R packages.  Looking over the probe
sequences and putative gene sequences that Affymetrix provides on their
website, it is clear that many of the probes are nonspecific, i.e., they
perfectly match portions of gene sequences that are different from the one
they were derived from.  In some cases, it appears that Affymetrix has
simply generated multiple probe sets for transcriptional variants of the
same gene.  In other cases, it appears that some probes are simply
nonspecific.  Affymetrix does warn us that some probe sets are less specific
than others, and this is indeed incorporated into their probe set
nomenclature, but I have found no downloadable file that lists the
specifics.  My computer should be done testing the half million probes for
perfect matches against the ~45000 sequences some time later this week.
After that, I will probably test the mismatch probes.
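
To make the perfect-match test concrete, here is a minimal sketch in Python.
It is illustrative only, not the code actually being run.  It assumes the
probes are 25-mers, that probe and target sequences are given in the same
orientation, and that both fit in memory as dictionaries; the names
find_perfect_matches, nonspecific and probe_source are made up for this
example.

```python
from collections import defaultdict

PROBE_LEN = 25  # assumption: all probes are 25-mers


def find_perfect_matches(probes, targets, k=PROBE_LEN):
    """Return {probe_id: set of target IDs containing the probe exactly}.

    probes  -- dict mapping probe ID to probe sequence
    targets -- dict mapping target (gene) ID to its sequence
    """
    # Index the probes by sequence, so each target is scanned only once.
    by_seq = defaultdict(list)
    for probe_id, seq in probes.items():
        by_seq[seq.upper()].append(probe_id)

    hits = defaultdict(set)
    for target_id, seq in targets.items():
        seq = seq.upper()
        # Slide a k-wide window along the target and look each window up.
        for i in range(len(seq) - k + 1):
            for probe_id in by_seq.get(seq[i:i + k], ()):
                hits[probe_id].add(target_id)
    return hits


def nonspecific(hits, probe_source):
    """Keep only probes that also match a target other than their own.

    probe_source -- dict mapping probe ID to the target it was derived from
    """
    result = {}
    for probe_id, matched in hits.items():
        others = matched - {probe_source[probe_id]}
        if others:
            result[probe_id] = sorted(others)
    return result
```

Indexing the probes rather than the targets keeps the lookup table to one
entry per distinct probe sequence, which matters when the targets are much
longer than the probes.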

My question to this community is this:  is there already an annotation file
or package that takes this consideration into account?  If so, can this
information be readily adapted into the R packages for probe level analysis
and gene expression estimation?

In a related question, can anyone point me to an algorithm for accurately
estimating the hybridization probability of an arbitrary probe against an
arbitrary mRNA?  Would it correlate closely to the BLAST score?  Has anyone
done theoretical studies on the nature of the mismatch probes and their
usefulness in measuring "nonspecific" binding?  It would be nice to be able
to predict how strongly a particular mRNA should bind to each of the probes
on a chip (both PM and MM).  If this is feasible, has anyone done in silico
chip hybridization experiments to see how close the estimated expression
levels are to the actual input?
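
To illustrate what is meant by scoring PM and MM probes against a
transcript, here is a rough Python sketch.  It derives the MM sequence by
complementing the middle (13th) base of a 25-mer PM probe, and counts
mismatches against the best-aligned window of a target.  The mismatch count
is only a crude stand-in, not a hybridization model: it ignores
thermodynamics, base composition and mismatch position.  The function names
are hypothetical, and probe and target are assumed to be in the same
orientation.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def mismatch_probe(pm, pos=13):
    """Return the MM sequence: the PM probe with base `pos` (1-based)
    replaced by its complement."""
    pm = pm.upper()
    i = pos - 1
    return pm[:i] + COMPLEMENT[pm[i]] + pm[i + 1:]


def min_mismatches(probe, target):
    """Smallest number of mismatching bases between the probe and any
    window of the target of the same length (ungapped).

    Returns None if the target is shorter than the probe.
    """
    probe, target = probe.upper(), target.upper()
    k = len(probe)
    if len(target) < k:
        return None
    return min(
        sum(a != b for a, b in zip(probe, target[i:i + k]))
        for i in range(len(target) - k + 1)
    )
```

A real answer to the question would replace min_mismatches with a proper
binding model; the sketch only shows the shape of the per-probe,
per-transcript calculation.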


Thanks,

Jeff Sorenson