[BioC] revmap question

lgautier at altern.org lgautier at altern.org
Thu Oct 9 17:29:26 CEST 2008


> James W. MacDonald wrote:
>> Hi Raffaele,
>>
>> rcaloger wrote:
>>> Hi,
>>> I found the ability to reverse a mapping with revmap in the
>>> XXXX.db annotation databases very interesting.
>>>
>>> However, I have two problems:
>>> 1) If I use:
>>> egs <- c("1", "100", "1000")
>>> unlist(mget(egs, revmap(hgu133plus2ENTREZID)))
>>>
>>> I get not only the probesets associated with the three EGs:
>>>            1          1001          1002          1003         10001
>>>  "229819_at"  "1556117_at"   "204639_at" "216705_s_at"   "203440_at"
>>>        10002         10003
>>> "203441_s_at"   "237305_at"
>>
>> Well, not really. This appears to be so because you are unlisting a
>> named list. Since the names have to be unique,
>
> Well, that's where I don't follow the logic behind unlist(), and I've always
> found this "feature" pretty strange. unlist() doesn't even do a good job
> of keeping the names unique:
>    > unlist(list(AA=letters[1:3], AA2="bb"))
>     AA1  AA2  AA3  AA2
>     "a"  "b"  "c" "bb"
> So mangling the names doesn't solve anything but just adds confusion.
>
> IMO it would be better if unlist() kept the original names, even if
> that means they are not unique in the returned vector. At least then I
> can do something with the result programmatically, and easily. With the
> mangled names, it's much harder (there are a couple of serious pitfalls).
>

The problem might originate in what one could perceive as a flaw in lists
(or any named vectors, for that matter): they allow non-unique names.

Mangled names are surely a headache, as is R's behavior of silently
returning only the first element with a given name when one did not know
that several elements shared that name.


L.

> H.
>
>
>> R adds an additional
>> integer to the end of duplicate names:
>>
>>  > egs <- c("1", "100", "1000")
>>  > mget(egs, revmap(hgu133plus2ENTREZID))
>> $`1`
>> [1] "229819_at"
>>
>> $`100`
>> [1] "1556117_at"  "204639_at"   "216705_s_at"
>>
>> $`1000`
>> [1] "203440_at"   "203441_s_at" "237305_at"
>>
>>> Is there any way to avoid this problem?
>>>
>>> 2) If the egs vector contains an EG (6333) that is not present in
>>> the annotation database, I get the following error:
>>> egs <- c("1", "100", "1000", "6333")
>>> unlist(mget(egs, revmap(hgu133plus2ENTREZID)))
>>>
>>> Error in .checkKeys(value, Rkeys(x), x@ifnotfound) :
>>>  value for "6333" not found
>>>
>>> Is there any way to make a query that simply skips the
>>> unmapped keys?
>>
>> Yes. The help for mget is a bit confusing on this point, but you need to
>> use the argument ifnotfound = NA.
>>
>>  > egs <- c("1", "100", "1000", "6333")
>>  > mget(egs, revmap(hgu133plus2ENTREZID), ifnotfound = NA)
>> $`1`
>> [1] "229819_at"
>>
>> $`100`
>> [1] "1556117_at"  "204639_at"   "216705_s_at"
>>
>> $`1000`
>> [1] "203440_at"   "203441_s_at" "237305_at"
>>
>> $`6333`
>> [1] NA
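
If the goal is to skip the unmapped keys entirely rather than carry NA
placeholders along, the NA entries can be filtered out afterwards. This
is only a sketch: the list `res` is a hand-built assumption mirroring the
output above, so that it runs without the annotation package.

```r
# Stand-in for:
#   mget(egs, revmap(hgu133plus2ENTREZID), ifnotfound = NA)
res <- list("1"    = "229819_at",
            "100"  = c("1556117_at", "204639_at", "216705_s_at"),
            "1000" = c("203440_at", "203441_s_at", "237305_at"),
            "6333" = NA)

# Flag the entries that consist only of NA, i.e. the keys not found.
unmapped <- vapply(res, function(v) all(is.na(v)), logical(1))

# Keep only the keys that actually map to probesets.
found <- res[!unmapped]
print(names(found))

# The keys that were dropped, in case they need to be reported.
missing_keys <- names(res)[unmapped]
print(missing_keys)
```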
>>
>> Best,
>>
>> Jim
>>
>>
>>
>>>
>>>
>>> Many thanks
>>> Raffaele
>>>
>>
>