[BioC] probe to entrezID mapping with aafLocusLink

Marc Carlson mcarlson at fhcrc.org
Tue Dec 8 01:05:25 CET 2009


Hi Merja,

The probe that you are wondering about (208286_x_at) actually maps to 4
different genes!  You can see this result if you take advantage of the
newest annotation packages and use toggleProbes as follows:

EGMap = toggleProbes(hgu133plus2ENTREZID, "all")
get(probeID1, EGMap)
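
If it helps to see the two views side by side, here is a minimal sketch
that looks up both of the probes from the original post with and without
the filter. The variable names (probes, default_ids, all_ids) are only
for illustration, and the counts you get back will depend on the version
of hgu133plus2.db you have installed, so no output is shown.

```r
library(hgu133plus2.db)  # also attaches AnnotationDbi

probes <- c("208286_x_at", "215600_x_at")

# Default view: probes that map to more than one gene are filtered out,
# so they come back as NA.
default_ids <- mget(probes, hgu133plus2ENTREZID, ifnotfound = NA)

# "all" view: every mapping is exposed, including multi-gene probes.
EGMap <- toggleProbes(hgu133plus2ENTREZID, "all")
all_ids <- mget(probes, EGMap, ifnotfound = NA)

# Number of Entrez Gene IDs per probe under each view.
sapply(default_ids, function(ids) sum(!is.na(ids)))
sapply(all_ids, function(ids) sum(!is.na(ids)))
```

toggleProbes() also accepts "single" and "multiple" if you only want one
kind of probe in the map.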

The reason you won't see this with normal usage is that the annotation
packages by default hide such probes from you (in this case, returning
no mapping instead of the 4 genes).  You have to use toggleProbes() to
uncover the mappings for probes that match more than one gene.  This was
done because 1) most people want to avoid probes that cross-hybridize
like this and 2) legacy code would probably have broken all over the
place if we had unleashed this change on everyone as the default
behavior.  The reason that you are seeing so many "_x_" probes with this
problem is that those are probes that Affy knows tend to cross-hybridize,
so the fact that they map to multiple genes is not much of a surprise.
Please let us know if you still have questions.


  Marc





James W. MacDonald wrote:
> Hi Merja,
>
> Merja Heinaniemi wrote:
>> Hi!
>>
>> I was mapping probeIDs from 133plus2 arrays to entrezIDs using
>> aafLocusLink, some months ago with an earlier version of the package,
>> and now with the current annaffy and hgu133plus2 packages. I compared
>> my results and some probes no longer got mapped with the new package
>> version, e.g. POU5F1. The gene does have probes on the array; they all
>> just happen to be x_at probes. So I thought maybe all those less
>> specific probes lack entrez mappings, but another gene with an x_at
>> probe does have a matching entrezID. So why is e.g. POU5F1 missing one? I include below
>> the R code that can be used to reproduce my problem (even the first
>> part if any hgu133Plus2 arrays are read in), sessionInfo is given at
>> the end.
>>
>> And more importantly, how do I get such probes mapped to an entrezID
>> using Bioconductor? I was assuming the hgu133plus2 package contains
>> all manufacturer annotations so I should find a match, or am I wrong?
>
> As you note, this symbol is no longer mapped to 208286_x_at in the
> current hgu133plus2.db package. I don't know why; netaffx still claims
> this mapping. Perhaps Marc Carlson can shed some light.
>
> You could map the Affy IDs to Entrez Gene using biomaRt as well, and
> that mapping still exists:
>
> > getBM("entrezgene","affy_hg_u133_plus_2", "208286_x_at", mart)
>   entrezgene
> 1       5460
> 2       5462
>
> I assume you are using aafLocusLink() because you are creating HTML or
> text tables for your output. Or perhaps you don't know that you can
> simply do:
>
> > mget(c("208286_x_at","215600_x_at"), hgu133plus2ENTREZID)
> $`208286_x_at`
> [1] NA
>
> $`215600_x_at`
> [1] "285231"
>
> to do the mapping?
>
> <self promotion>
>
> If you are trying to create tables and would like to do the mappings
> via biomaRt, you could use either limma2biomaRt() or probes2tableBM()
> in the affycoretools package, which will output HTML or text tables
> with links to various databases, like you get with annaffy (but
> without the sweet css candy that colors the expression values
> according to the expression level).
>
> </self promotion>
>
> Best,
>
> Jim
>
>
>>
>> thanks in advance!
>>
>> Merja
>>
>>
>>
>> ##R commands:
>>
>> #affybatch=read.affybatch(filenames=Filenames)
>> #eset=rma(affybatch)
>> #grep("208286_x_at",featureNames(eset))
>> #[1] 17711
>>
>> library(annaffy)
>> library(hgu133plus2.db)
>> probeID1="208286_x_at" ##this is POU5F1 entrezID 5460
>> probeID2="215600_x_at"  ##this is FBXW12 entrezID 285231
>> entrezID1=aafLocusLink(probeID1, "hgu133plus2.db")
>> entrezID1
>> #integer()
>> entrezID2=aafLocusLink(probeID2, "hgu133plus2.db")
>> entrezID2
>> #[1] 285231
>>
>> x <- hgu133plus2ENTREZID
>> ## Get the probe identifiers that are mapped to an ENTREZ Gene ID
>> mapped_probes <- mappedkeys(x)
>> ## Convert to a list
>> xx <- as.list(x[mapped_probes])
>> xx[xx=="5460"]
>> #list()
>> xx[xx=="285231"]
>> #$`1564138_at`
>> #[1] "285231"
>>
>> #$`215600_x_at`
>> #[1] "285231"
>>
>> sessionInfo()
>> #R version 2.10.0 (2009-10-26)
>> #i386-apple-darwin9.8.0
>>
>> #locale:
>> #[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
>>
>> #attached base packages:
>> #[1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> #other attached packages:
>> # [1] hgu133plus2cdf_2.5.0 hgu133plus2.db_2.3.5 org.Hs.eg.db_2.3.6
>> #     annaffy_1.18.0       KEGG.db_2.3.5        GO.db_2.3.5
>> # [7] RSQLite_0.7-3        DBI_0.2-4            AnnotationDbi_1.8.1
>> #     affy_1.24.2          Biobase_2.6.0
>>
>> #loaded via a namespace (and not attached):
>> #[1] affyio_1.14.0        preprocessCore_1.8.0 tools_2.10.0
>>
>


