[BioC] Unambiguously mapping of affy IDs to gene symbols using hgu133plus2.db

James W. MacDonald jmacdon at med.umich.edu
Fri Oct 1 15:24:56 CEST 2010


Hi Christian,

On 10/1/2010 6:10 AM, Christian Ruckert wrote:
> Hi,
> I am doing some mapping of affymetrix probeset IDs to gene symbols using
> package hgu133plus2.db.
>
> As the following example illustrates, each of the 40686 mapped probesets
> maps to exactly one gene symbol.

Yes, this was a design change made (maybe) two releases ago. By default, 
only unambiguous mappings are exposed.

This behavior can be modified using the toggleProbes() function, shown 
here with hgu95av2SYMBOL (the same applies to hgu133plus2SYMBOL):

 > table(nhit(hgu95av2SYMBOL))

     0     1
   901 11724
 > table(nhit(toggleProbes(hgu95av2SYMBOL, "all")))

     0     1     2     3     4     5     6     7
   493 11724   297    53    22     4    10     4
     8     9    10    11    12    14    20    21
     4     2     2     1     1     1     2     4
    22
     1

 > table(nhit(toggleProbes(hgu95av2SYMBOL, "multiple")))

     0     2     3     4     5     6     7     8
12217   297    53    22     4    10     4     4
     9    10    11    12    14    20    21    22
     2     2     1     1     1     2     4     1

See ?toggleProbes for more information.
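
For the probeset in your example, something along these lines should 
work. This is only a sketch that I have not run against your exact 
package version; it assumes hgu133plus2.db is installed and uses the 
probeset ID from your message. The object names x_all and syms are 
just placeholders.

```r
library(hgu133plus2.db)

# Default view: only probesets that map to a single symbol are exposed
x <- hgu133plus2SYMBOL

# Same map with the filter lifted, so multi-symbol probesets are exposed too
x_all <- toggleProbes(x, "all")

# Compare the two views for the probeset from the question
x[["203074_at"]]       # was NA in the example quoted below
x_all[["203074_at"]]   # with the filter lifted, all symbols on record

# Look up several probesets at once; the result is a named list with
# one character vector of symbols per probeset
syms <- mget(c("203074_at", "1007_s_at"), x_all, ifnotfound = NA)
```

Which symbols come back depends on the annotation release, so they may 
not match what netaffx or biomart report today.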

Best,

Jim


>
>  > library("hgu133plus2.db")
>  > x <- hgu133plus2SYMBOL
>  > length(x)
> [1] 54675
>  > count.mappedkeys(x)
> [1] 40686
>
>  > head(nhit(x))
> 1007_s_at   1053_at    117_at    121_at 1255_g_at   1294_at
>         1         1         1         1         1         1
>
>  > table(nhit(x))
>
>     0     1
> 13989 40686
>
>
> Am I correct that a gene symbol annotation is only included in the
> package if it is unambiguous?
>
> For example
>  > x[["203074_at"]]
> [1] NA
>
> But netaffx and biomart return:
> ANXA8, ANXA8L1, ANXA8L2
>
> If doing a mapping between protein and gene expression arrays based on
> gene symbols, can results be improved using biomart instead of the
> annotation packages?
>
> Christian
>
>
>  > sessionInfo()
> R version 2.11.0 (2010-04-22)
> x86_64-pc-linux-gnu
>
> locale:
> [1] C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] hgu133plus2.db_2.4.1 org.Hs.eg.db_2.4.1 RSQLite_0.9-1
> [4] DBI_0.2-5 AnnotationDbi_1.10.1 Biobase_2.8.0
>
> loaded via a namespace (and not attached):
> [1] tools_2.11.0
>

-- 
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826