[BioC] Unambiguous mapping of Affy IDs to gene symbols using hgu133plus2.db

Benjamin Otto b.otto at uke.uni-hamburg.de
Fri Oct 1 13:27:05 CEST 2010


Hi Christian,

that's interesting. I remember fumbling around a bit with these annotation packages because some of the IDs have multiple mappings. Maybe the way these multiple hits are treated has changed in recent versions of the db packages. Anyway, here are three points:

1) What I currently do is use a hybrid annotation table that combines the annotation delivered by the hgu133plus2.db mapping (an older version) with additional manual annotation via biomart. Biomart has the advantage that it should be more up to date than these packages ... at least to a certain degree.

2) If you decide to use biomart (solely or in combination with something else) for your annotation, save your annotation table somewhere you can find it later, so that you work with a consistent table throughout the project.

3) If you are just starting your project and are wondering how to treat non-unique IDs, or cases where several probesets represent one gene, it might be worth looking at the alternative mappings (provided as CDF files) from AffyProbeMiner or the Brainarray mappings from Michigan.
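Points 1 and 2 can be sketched in base R as below. This is a minimal illustration under stated assumptions, not the exact procedure described above: the input is a hypothetical long table of probe/symbol pairs (the three symbols for 203074_at are the NetAffx/biomart answer quoted in this thread; "probeA_at" and "GENE1" are made-up placeholders), and `strict_map()`, `collapsed_map()` and `fill_gaps()` are hypothetical helper names. The strict map reproduces the NA seen for 203074_at, the collapsed map keeps every symbol, and the "primary" vector is filled from the "secondary" one to form a hybrid table, which is then frozen with `saveRDS()`. Some AnnotationDbi releases also document a `toggleProbes()` method for exposing multiply-mapped probes in the package maps; check `?toggleProbes` in your installed version before relying on it.

```r
## Hypothetical probe-to-symbol pairs, one row per pair. The 203074_at symbols
## come from the NetAffx/biomart answer in this thread; "probeA_at" and
## "GENE1" are placeholders.
pairs <- data.frame(
  probe  = c("probeA_at", "203074_at", "203074_at", "203074_at"),
  symbol = c("GENE1", "ANXA8", "ANXA8L1", "ANXA8L2"),
  stringsAsFactors = FALSE
)

## Strict map (hypothetical helper): a probe gets a symbol only if it has
## exactly one, otherwise NA, like x[["203074_at"]] in the quoted session.
strict_map <- function(pairs) {
  by_probe <- split(pairs$symbol, pairs$probe)
  vapply(by_probe, function(v) {
    v <- unique(v)
    if (length(v) == 1L) v else NA_character_
  }, character(1))
}

## Collapsed map (hypothetical helper): keep all symbols, joined by `sep`.
collapsed_map <- function(pairs, sep = ";") {
  by_probe <- split(pairs$symbol, pairs$probe)
  vapply(by_probe, function(v) paste(sort(unique(v)), collapse = sep),
         character(1))
}

## Hybrid table (hypothetical helper): take the primary source (e.g. the .db
## package) and fill its NA entries from a secondary source (e.g. a biomart
## export). Both are named character vectors indexed by probe ID.
fill_gaps <- function(primary, secondary) {
  out <- primary
  gap <- is.na(out) & names(out) %in% names(secondary)
  out[gap] <- secondary[names(out)[gap]]
  out
}

strict    <- strict_map(pairs)
collapsed <- collapsed_map(pairs)
annot     <- fill_gaps(strict, collapsed)

## Freeze the table on disk so the whole project uses the same annotation.
annot_file <- file.path(tempdir(), "probe_annotation.rds")
saveRDS(annot, annot_file)
stopifnot(identical(readRDS(annot_file), annot))
```

In a real project the file would go to a permanent project directory rather than `tempdir()`, and the collapsed strings can be split again with `strsplit(annot, ";")` when matching against protein array identifiers.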

Hope that helps a little bit.

regards

Benjamin



On 01.10.2010 at 12:10, Christian Ruckert wrote:

> Hi,
> I am doing some mapping of affymetrix probeset IDs to gene symbols using package hgu133plus2.db.
> 
> As the following example illustrates, each of the 40686 mapped probesets maps to exactly one gene symbol.
> 
> > library("hgu133plus2.db")
> > x <- hgu133plus2SYMBOL
> > length(x)
> [1] 54675
> > count.mappedkeys(x)
> [1] 40686
> 
> > head(nhit(x))
> 1007_s_at   1053_at    117_at    121_at 1255_g_at   1294_at
>        1         1         1         1         1         1
> 
> > table(nhit(x))
> 
>    0     1
> 13989 40686
> 
> 
> Am I correct that a gene symbol annotation is only included in the package if it is unambiguous?
> 
> For example
> > x[["203074_at"]]
> [1] NA
> 
> But netaffx and biomart return:
> ANXA8, ANXA8L1, ANXA8L2
> 
> When mapping between protein and gene expression arrays based on gene symbols, can the results be improved by using biomart instead of the annotation packages?
> 
> Christian
> 
> 
> > sessionInfo()
> R version 2.11.0 (2010-04-22)
> x86_64-pc-linux-gnu
> 
> locale:
> [1] C
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> 
> other attached packages:
> [1] hgu133plus2.db_2.4.1 org.Hs.eg.db_2.4.1   RSQLite_0.9-1
> [4] DBI_0.2-5            AnnotationDbi_1.10.1 Biobase_2.8.0
> 
> loaded via a namespace (and not attached):
> [1] tools_2.11.0
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> 

___________________________________________
Benjamin Otto, PhD
University Medical Center Hamburg-Eppendorf
Institute For Clinical Chemistry / Central Laboratories
Campus Forschung N27
Martinistr. 52,
D-20246 Hamburg

Tel.: +49 40 7410 51908
Fax.: +49 40 7410 54971
___________________________________________







