[BioC] How to go from affymetrix to Ensembl transcript IDs

Steve Lianoglou mailinglist.honeypot at gmail.com
Fri Apr 10 00:01:03 CEST 2009


Hi Peter,

On Apr 9, 2009, at 5:40 PM, Peter Robinson wrote:

> Hi all,
>
> sorry if this is a dumb question, but rtfm has not helped so far.
>
> I would like to get the Ensembl transcript IDs that correspond to  
> affymetrix probeset ids using biomaRt. As a test case, I am using  
> the ALL data set from bioconductor. My code:
>
>
> library("biomaRt")
> library("ALL")
> data("ALL")  ## Note this dataset uses hgu95av2 Affymetrix chip
>
> dat <- exprs(ALL)
> affyids = rownames(dat)
>
>
> ## get mapping data from Ensembl via biomaRt
> ensembl <- useMart("ensembl")
> ensembl = useDataset("hsapiens_gene_ensembl",mart=ensembl)
>
> mapping <- getBM(attributes = c("affy_hg_u95av2",  
> "ensembl_transcript_id"), filters = "affy_hg_u95av2",
>   values = affyids, mart = ensembl)
>
>
>
> Here is where the problem is. The "mapping" seems to be a random  
> collection of transcript IDs.

Your query is right, so ... your results are not random. You can  
double check by trying the small example in the ?getBM help.

Anyway: that probe looks like a weird one. Even Affy maps it to several  
locations. See:

https://www.affymetrix.com/analysis/netaffx/fullrecord.affx?pk=HG-U95AV2%3A32337_AT#a_ensembl

You will need an Affy NetAffx account to see that. The relevant detail  
on that page is that the probe maps to 6 different Ensembl IDs.

It even aligns to two different places:

chr13:26725913-26728689(+)
chr10:122104175-122104685(-)

You'll probably find this for many probes, so you'll need some policy  
to deal with that.
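
As a rough sketch of what such a policy could look like (not the only or  
necessarily the best way): the snippet below builds a small made-up data  
frame with the same two columns your getBM() call returns, then shows two  
options. The probe and transcript IDs are placeholders for illustration  
only, not real query results, and the names n_tx, mapping_unique and  
mapping_collapsed are just ones I picked here.

```r
# Hypothetical stand-in for the data frame returned by getBM().
# All IDs below are invented placeholders, not real Ensembl/Affymetrix IDs.
mapping <- data.frame(
  affy_hg_u95av2 = c("probe_A", "probe_A", "probe_B",
                     "probe_C", "probe_C", "probe_C"),
  ensembl_transcript_id = c("ENST_x1", "ENST_x2", "ENST_y1",
                            "ENST_z1", "ENST_z2", "ENST_z3"),
  stringsAsFactors = FALSE
)

# Count how many transcripts each probe set maps to.
n_tx <- table(mapping$affy_hg_u95av2)

# Policy 1: keep only probe sets that map to exactly one transcript.
unique_probes <- names(n_tx)[n_tx == 1]
mapping_unique <- mapping[mapping$affy_hg_u95av2 %in% unique_probes, ]

# Policy 2: keep every probe set, but collapse its transcripts
# into a single semicolon-separated string (one row per probe set).
mapping_collapsed <- aggregate(
  ensembl_transcript_id ~ affy_hg_u95av2,
  data = mapping,
  FUN = function(x) paste(sort(unique(x)), collapse = ";")
)
```

Which one makes sense depends on what you do downstream: dropping  
ambiguous probe sets is the conservative choice, while collapsing keeps  
all of them but leaves you to deal with the one-to-many relationship later.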

Hope that helps,
-steve

--
Steve Lianoglou
Graduate Student: Physiology, Biophysics and Systems Biology
Weill Medical College of Cornell University

http://cbio.mskcc.org/~lianos


