[BioC] Question about mget vs. select for annotation package

Marc Carlson mcarlson at fhcrc.org
Tue Jul 9 13:19:12 CEST 2013


Hi Christina,

The basic problem is that the bimap interface was created in order to 
emulate an even older set of environments.  On those older platforms, 
people were mostly uninterested in keys (probe IDs) that mapped to 
multiple different things, because those keys were usually probe IDs 
from microarrays, and a probe on a microarray that maps to multiple 
targets is probably just a bad probe...  So software was written with 
that limitation in mind, time marched on, and if we changed it now, 
some of that old code might break.  Later on, when people started to 
use these bimaps for things other than microarrays, we kept the default 
of hiding multiple probes for backwards compatibility, and provided the 
toggleProbes() method that Hervé mentioned so that people could get 
that data if they cared to.

When we wrote the newer select() interface, the world had moved on: 
people were doing both microarrays and a host of other things like 
high-throughput sequencing, and annotation was mostly something done 
at the end of an analysis, usually just to decorate a data.frame 
object.  So with select() we were free to always expose all the data 
for a probe or gene and simply warn the user whenever they get back 
more rows than they asked for keys.  select() was really designed to be 
a more general annotation tool.  At this time, we hope that most people 
will use select(), which offers a simpler way to access this data.  We 
still provide the older bimap interface, mostly for the sake of 
backwards compatibility.
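
To make the difference concrete, here is a minimal sketch in base R.  It 
rebuilds by hand the data.frame that select() returned for probe 
213801_x_at in the quoted message below, so it runs without the 
annotation package.  The helper name collapse_by_probe() is made up for 
this illustration, and the NA step only mimics the bimap default; it is 
not how AnnotationDbi implements it.

```r
# Rows copied from the select() output quoted later in this thread.
# With hgu133plus2.db loaded, the same table came from:
#   select(hgu133plus2.db, keys = "213801_x_at",
#          cols = c("ENTREZID", "SYMBOL"), keytype = "PROBEID")
res <- data.frame(
  PROBEID  = rep("213801_x_at", 6),
  ENTREZID = c("3921", "388524", "574040", "6044", "653162", "730029"),
  SYMBOL   = c("RPSA", "RPSAP58", "SNORA6", "SNORA62", "RPSAP9", "RPSAP19"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: collapse a 1:many result to one row per probe,
# which is convenient when decorating a results table.
collapse_by_probe <- function(df, sep = ";") {
  aggregate(df[c("ENTREZID", "SYMBOL")],
            by  = list(PROBEID = df$PROBEID),
            FUN = function(x) paste(unique(x), collapse = sep))
}

one_row <- collapse_by_probe(res)

# Mimic the classic bimap default: probes with more than one hit
# are reported as NA instead of listing every gene.
multi <- names(which(table(res$PROBEID) > 1))
bimap_like <- one_row
bimap_like$ENTREZID[bimap_like$PROBEID %in% multi] <- NA
```

Here one_row keeps all six genes in a single row for the probe, while 
bimap_like shows NA for it, which is the same contrast as select() 
versus the default mget() in the original question.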


   Marc



On 07/02/2013 03:26 PM, Hervé Pagès wrote:
> Hi Christina,
>
> In AnnotationDbi jargon, a probe that matches multiple genes is called
> a multiple probe. When using the classic Bimap API, multiple probes are
> mapped to NA by default, unless you use toggleProbes() on the Bimap
> object to request the full mapping:
>
>   > map <- toggleProbes(hgu133plus2ENTREZID, "all")
>
>   > mget("213801_x_at", map)
>   $`213801_x_at`
>   [1] "3921"   "388524" "574040" "6044"   "653162" "730029"
>
> Personally I think that making multiple probes appear as if they're
> not mapped to any gene is not doing any good. Hopefully at some point
> this can be reconsidered.
>
> Cheers,
> H.
>
>
> On 07/02/2013 02:53 PM, Christina Chaivorapol wrote:
>> Hi,
>>
>> I seem to be getting different results depending on if I use select()
>> or mget() with the hgu133plus2.db package for a probe with a 1 probe
>> to many gene mapping. Does anyone know why there is a discrepancy?
>>
>>> select(hgu133plus2.db, keys="213801_x_at",
>>          cols=c("ENTREZID", "SYMBOL"), keytype="PROBEID")
>>        PROBEID ENTREZID  SYMBOL
>> 1 213801_x_at     3921    RPSA
>> 2 213801_x_at   388524 RPSAP58
>> 3 213801_x_at   574040  SNORA6
>> 4 213801_x_at     6044 SNORA62
>> 5 213801_x_at   653162  RPSAP9
>> 6 213801_x_at   730029 RPSAP19
>> Warning message:
>> In .generateExtraRows(tab, keys, jointype) :
>>    'select' resulted in 1:many mapping between keys and return rows
>>
>>> mget("213801_x_at", hgu133plus2ENTREZID)
>> $`213801_x_at`
>> [1] NA
>>
>>> sessionInfo()
>> R version 3.0.0 (2013-04-03)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>   [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>   [7] LC_PAPER=C                 LC_NAME=C
>>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] parallel  stats     graphics  grDevices utils     datasets methods
>> [8] base
>>
>> other attached packages:
>> [1] hgu133plus2.db_2.9.0 org.Hs.eg.db_2.9.0   RSQLite_0.11.3
>> [4] DBI_0.2-6            AnnotationDbi_1.22.3 Biobase_2.20.0
>> [7] BiocGenerics_0.6.0   limma_3.16.2
>>
>> loaded via a namespace (and not attached):
>> [1] IRanges_1.18.0 stats4_3.0.0   tools_3.0.0
>>
>> Thanks,
>> Christina
>>
>


