[BioC] How does BioC map from Probe ID to Entrez Gene?

Jake jjmichael at comcast.net
Fri May 5 20:00:21 CEST 2006


Hi all,

I've finished an analysis, and in reviewing some of the
annotations for gene symbols and RefSeqs, I've found some discrepancies
that I can't explain.  The discrepancies are between the
Affy-supplied annotation (both the CSV files and NetAffx) and the BioC
annotation.

Take this probe set, for example:
1558097_at

> sessionInfo()
Version 2.3.0 (2006-04-24)
i686-pc-linux-gnu

attached base packages:
[1] "methods"   "stats"     "graphics"  "grDevices" "utils"     "datasets"
[7] "base"

other attached packages:
hgu133plus2
   "1.12.0"

> mget("1558097_at", hgu133plus2LOCUSID)
$`1558097_at`
[1] 8971

On NetAffx, the Entrez Gene ID shows 253143.

I've got about 12 other probe sets on which BioC and Affy disagree strongly
(symbols, RefSeqs, etc.).  I suspect these can all be traced back to
the Entrez ID disagreement.  Since much of BioC's downstream annotation
is based on the Entrez Gene ID, a correct mapping from the Affy probe set
ID to the Entrez Gene ID is crucial.
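For anyone wanting to reproduce this kind of comparison, here is a minimal R sketch, not an official BioC tool. The helper name find_entrez_disagreements is hypothetical. It assumes the BioC IDs come in as a named vector (for example from unlist(mget(ids, hgu133plus2LOCUSID))) and the Affy IDs as a named character vector taken from the "Entrez Gene" field of the Affy CSV. It also assumes that multiple IDs in that field are separated by "///". It lists the probe sets whose BioC Entrez ID does not appear among the Affy IDs.

```r
## Hypothetical helper (not part of any BioC package): list probe sets whose
## BioC Entrez Gene ID is absent from the Affy-supplied Entrez Gene field.
##   bioc: named vector, probe set ID -> Entrez Gene ID
##   affy: named character vector, probe set ID -> Affy "Entrez Gene" text,
##         ASSUMED to separate multiple IDs with "///"
find_entrez_disagreements <- function(bioc, affy) {
  ## integer conversion avoids "1e+05"-style strings for round IDs
  to_id <- function(x) as.character(as.integer(x))
  ## only compare probe sets present in both annotations
  probes <- intersect(names(bioc), names(affy))
  agree <- vapply(probes, function(p) {
    affy_ids <- trimws(strsplit(affy[[p]], "///", fixed = TRUE)[[1]])
    to_id(bioc[[p]]) %in% affy_ids
  }, logical(1))
  bad <- probes[!agree]
  data.frame(probe = bad,
             bioc  = to_id(bioc[bad]),
             affy  = unname(affy[bad]),
             stringsAsFactors = FALSE)
}

## Usage with the probe set discussed above
bioc <- c("1558097_at" = 8971)
affy <- c("1558097_at" = "253143")
find_entrez_disagreements(bioc, affy)
```

Probe sets where BioC's ID is one of several Affy IDs count as agreeing, so only outright conflicts are reported.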

Which brings me to my question: how exactly does BioC map Affy
probe set IDs to Entrez Gene IDs?  There seems to be thorough documentation
of how Entrez IDs are mapped to other annotations such as PubMed, GO, etc.,
but not much on how the Entrez Gene ID is obtained from the probe set ID in
the first place.  My cursory manual check, BLASTing the Affy probe
sequences, tends to side with Affy.

Any enlightenment would be much appreciated.

Thanks,

Jake
