[BioC] Help me understand org.Hs.eg.db

Hervé Pagès hpages at fhcrc.org
Tue Apr 7 20:32:54 CEST 2009


Hi Daren,

First note that for any Bimap object 'x':

   length(mget(mappedRkeys(x), x))

is the same as:

   count.mappedRkeys(x)

but the latter is much more efficient.

Furthermore, if 'x' is a right-to-left map, as in your case
(see 'summary(x)'), then 'count.mappedRkeys(x)' is equivalent
to 'count.mappedkeys(x)'.

But generally speaking, there is no reason to expect:

   nrow(toTable(x)) == count.mappedkeys(x)  # generally not true

unless the mapping contained in 'x' is one-to-one.

Explanation:

'toTable(x)' returns a flat representation of Bimap object 'x' e.g.

      Lkey   Rkey
   1     a      A
   2     a      B
   3     b      A
   4     d      C

All the edges (or links) of the bipartite graph are listed. Note that
right key "A" is mapped to left keys "a" and "b", so this mapping is
not one-to-one. The left (or right) keys that don't map to anything
don't appear in this table.

'count.mappedRkeys(x)' counts the number of (unique) right keys that
map to at least one left key, i.e. 3 in the small example above
("A", "B" and "C").
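
To make the counting concrete, here is a minimal base R sketch that
rebuilds the small table above as a plain data frame. It only mimics
what 'toTable(x)' returns; 'edges' and the other variable names are
made up for illustration, and no real Bimap object is involved.

```r
# Toy stand-in for toTable(x): one row per edge of the bipartite graph.
# This is a plain data.frame, not a real Bimap object.
edges <- data.frame(
  Lkey = c("a", "a", "b", "d"),
  Rkey = c("A", "B", "A", "C"),
  stringsAsFactors = FALSE
)

n_edges        <- nrow(edges)                 # analogue of nrow(toTable(x))
n_mapped_rkeys <- length(unique(edges$Rkey))  # analogue of count.mappedRkeys(x)
n_mapped_lkeys <- length(unique(edges$Lkey))  # analogue of count.mappedLkeys(x)

# Grouping the left keys by right key mimics what mget() returns for a
# right-to-left map: one list element per mapped right key.
by_rkey <- split(edges$Lkey, edges$Rkey)

print(c(edges = n_edges, rkeys = n_mapped_rkeys, lkeys = n_mapped_lkeys))
print(lengths(by_rkey))
```

The number of rows (edges) exceeds the number of distinct right keys
as soon as one right key is linked to more than one left key, which is
the situation with 'org.Hs.egSYMBOL2EG' in your output.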

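So in fact, the following is true for any Bimap object 'x':

   length(unique(toTable(x)[[2]])) == count.mappedRkeys(x)  # always TRUE

and, for a right-to-left map like yours, 'count.mappedRkeys(x)' is
the same as 'count.mappedkeys(x)'.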

Hope this helps.

Cheers,
H.


Daren Tan wrote:
> I am using two approaches to get EntrezID to genes mapping, as well as
> genes to EntrezID mappings. toTable gives same number of mappings in
> both directions, but mget doesn't. Which approach should I trust and
> why ?
> 
>> dim(toTable(org.Hs.egSYMBOL2EG))
> [1] 39824     2
>> dim(toTable(org.Hs.egSYMBOL))
> [1] 39824     2
> 
>> length(mget(mappedRkeys(org.Hs.egSYMBOL2EG), org.Hs.egSYMBOL2EG))
> [1] 39800
>> length(mget(mappedLkeys(org.Hs.egSYMBOL), org.Hs.egSYMBOL))
> [1] 39824
> 
>> sessionInfo()
> R version 2.8.1 (2008-12-22)
> i386-pc-mingw32
> 
> locale:
> LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
> States.1252;LC_MONETARY=English_United
> States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
> 
> attached base packages:
> [1] splines   tools     stats     graphics  grDevices utils
> datasets  methods   base
> 
> other attached packages:
>  [1] KEGG.db_2.2.5       GOstats_2.8.0       Category_2.8.4
> genefilter_1.22.0   survival_2.34-1     RBGL_1.18.0
> annotate_1.20.1
>  [8] xtable_1.5-4        GO.db_2.2.5         graph_1.20.0
> org.Hs.eg.db_2.2.6  RSQLite_0.7-1       DBI_0.2-4
> AnnotationDbi_1.4.3
> [15] Biobase_2.2.2
> 
> loaded via a namespace (and not attached):
> [1] cluster_1.11.12       gdata_2.4.2           gplots_2.6.0
> GSEABase_1.4.0        gtools_2.5.0-1        xlsReadWritePro_1.4.0
> [7] XML_2.1-0
> 

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


