[BioC] How to go from affymetrix to Ensembl transcript IDs

Peter Robinson peter.robinson at t-online.de
Thu Apr 9 23:40:39 CEST 2009


Hi all,

Sorry if this is a dumb question, but reading the manual has not helped so far.

I would like to get the Ensembl transcript IDs that correspond to 
Affymetrix probe set IDs using biomaRt. As a test case, I am using the 
ALL data set from Bioconductor. My code:


library("biomaRt")
library("ALL")
data("ALL")  ## Note this dataset uses hgu95av2 Affymetrix chip

dat <- exprs(ALL)
affyids <- rownames(dat)  ## probe set IDs are the row names of the matrix


## get mapping data from Ensembl via biomaRt
ensembl <- useMart("ensembl")
ensembl <- useDataset("hsapiens_gene_ensembl", mart = ensembl)

mapping <- getBM(attributes = c("affy_hg_u95av2", "ensembl_transcript_id"),
                 filters = "affy_hg_u95av2",
                 values = affyids, mart = ensembl)



Here is where the problem is. The "mapping" seems to be a random 
collection of transcript IDs: a single probe set such as 32337_at is 
paired with 27 different transcripts.

 > which(mapping=="32337_at")
 [1]     8    46   139   155   203   267   320   327  7385  8701 18769 20533
[13] 23728 23969 23972 24241 24242 24243 24244 25236 26157 26204 26218 26231
[25] 26240 26321 26404
 > mapping[which(mapping=="32337_at"),]
      affy_hg_u95av2 ensembl_transcript_id
8           32337_at       ENST00000404812
46          32337_at       ENST00000393574
139         32337_at       ENST00000403842
155         32337_at       ENST00000397467
203         32337_at       ENST00000407990
267         32337_at       ENST00000399007
320         32337_at       ENST00000404500
327         32337_at       ENST00000399891
7385        32337_at       ENST00000396599
8701        32337_at       ENST00000403916
18769       32337_at       ENST00000334328
20533       32337_at       ENST00000377603
23728       32337_at       ENST00000401418
23969       32337_at       ENST00000046640
23972       32337_at       ENST00000381870
24241       32337_at       ENST00000326092
24242       32337_at       ENST00000319826
24243       32337_at       ENST00000272274
24244       32337_at       ENST00000311549
25236       32337_at       ENST00000404512
26157       32337_at       ENST00000404609
26204       32337_at       ENST00000402713
26218       32337_at       ENST00000401464
26231       32337_at       ENST00000407389
26240       32337_at       ENST00000406161
26321       32337_at       ENST00000402658
26404       32337_at       ENST00000401595
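
One way to check whether this is a systematic one-to-many relationship rather than noise is to count the distinct transcripts per probe set. The following is only a minimal sketch on a small made-up data frame with the same two columns as "mapping"; the IDs in it are hypothetical and just illustrate the shape of the result.

```r
## Toy stand-in for the result of getBM(); all IDs are made up for illustration
toy_mapping <- data.frame(
  affy_hg_u95av2 = c("1000_at", "1000_at", "1001_at",
                     "1002_f_at", "1002_f_at", "1002_f_at"),
  ensembl_transcript_id = c("ENST_A", "ENST_B", "ENST_C",
                            "ENST_D", "ENST_E", "ENST_D"),
  stringsAsFactors = FALSE
)

## Drop duplicated probe/transcript pairs, then count transcripts per probe set
toy_unique <- unique(toy_mapping)
n_per_probe <- table(toy_unique$affy_hg_u95av2)
print(n_per_probe)

## Probe sets that map to more than one transcript
multi <- names(n_per_probe)[as.vector(n_per_probe) > 1]
print(multi)
```

With the real data, "toy_mapping" would simply be replaced by "mapping".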

Ultimately, I would like to write the data matrix to a CSV file for 
further analysis, with each Affymetrix ID replaced by an Ensembl 
ID.
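
A possible shape for that last step, sketched under stated assumptions: because one probe set can map to several transcripts, the transcript IDs are collapsed here into one semicolon-separated string per probe set, and probe sets without any mapping get NA. The small matrix and all IDs below are made up; with the real data, "toy_expr" would be exprs(ALL) and "toy_mapping" the result of getBM(). Collapsing is only one option; picking a single transcript or duplicating rows may suit the downstream analysis better.

```r
## Made-up expression matrix with probe set IDs as row names
toy_expr <- matrix(
  c(7.1, 5.2, 9.3, 7.4, 5.0, 9.1),
  nrow = 3,
  dimnames = list(c("1000_at", "1001_at", "1003_at"), c("s1", "s2"))
)

## Made-up probe-to-transcript mapping (same columns as the getBM() result)
toy_mapping <- data.frame(
  affy_hg_u95av2 = c("1000_at", "1000_at", "1001_at"),
  ensembl_transcript_id = c("ENST_A", "ENST_B", "ENST_C"),
  stringsAsFactors = FALSE
)

## Collapse all transcripts of a probe set into one string
collapsed <- vapply(
  split(toy_mapping$ensembl_transcript_id, toy_mapping$affy_hg_u95av2),
  function(ids) paste(sort(unique(ids)), collapse = ";"),
  character(1)
)

## Look up each row of the matrix; unmapped probe sets become NA
ensembl_ids <- unname(collapsed[rownames(toy_expr)])

out <- data.frame(
  ensembl_transcript_id = ensembl_ids,
  affy_id = rownames(toy_expr),
  toy_expr,
  row.names = NULL,
  stringsAsFactors = FALSE
)

## Write to a temporary file here; use a real path in practice
csv_file <- tempfile(fileext = ".csv")
write.csv(out, csv_file, row.names = FALSE)
```

Keeping the original Affymetrix ID as a second column makes it possible to trace each row back to its probe set later.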

Thanks, Peter


