[BioC] revmap question

Robert Gentleman rgentlem at fhcrc.org
Thu Oct 9 18:04:59 CEST 2008



lgautier at altern.org wrote:
>> James W. MacDonald wrote:
>>> Hi Raffaele,
>>>
>>> rcaloger wrote:
>>>> Hi,
>>>> I found the possibility of reversing a mapping with revmap in the
>>>> XXXX.db annotation databases very interesting.
>>>>
>>>> However, I have two problems:
>>>> 1) if  I use:
>>>> egs <- c("1", "100", "1000")
>>>> unlist(mget(egs, revmap(hgu133plus2ENTREZID)))
>>>>
>>>> I get not only the probesets associated with the three EGs:
>>>>            1          1001          1002          1003         10001
>>>>  "229819_at"  "1556117_at"   "204639_at" "216705_s_at"   "203440_at"
>>>>        10002         10003
>>>> "203441_s_at"   "237305_at"
>>> Well, not really. This appears to be so because you are unlisting a
>>> named list. Since the names have to be unique,
>> Well, that's where I don't follow the logic behind unlist(), and I've always
>> found this "feature" pretty strange. unlist() won't even do a good job of
>> keeping the names unique:
>>    > unlist(list(AA=letters[1:3], AA2="bb"))
>>     AA1  AA2  AA3  AA2
>>     "a"  "b"  "c" "bb"
>> So mangling the names doesn't solve anything but just adds confusion.
>>
>> IMO it would be better if unlist() kept the original names, even if that
>> means that they are not unique in the returned vector. At least I can do
>> something with it programmatically, and it's easy. With the mangled names,
>> it's much harder (there are a couple of serious pitfalls).
>>
> 
> The problem might originate in what one could perceive as a flaw in lists
> (or any named vectors, for that matter): allowing non-unique names.
> 
> Mangled names are surely a headache, as is R's behavior of returning only
> the first element with a given name when it was not known that several
> elements share that name.

   I disagree. I think that requiring unique row names in R is/was a
mistake: restrictions are often expensive, as they limit what can be
done.  Yes, there are issues with non-unique row names, but those can be
dealt with by careful programming. Such methods would work in all cases
of duplicate row names, whereas with name-mangling schemes one needs to
know which scheme was used to be able to disentangle the names, and that
means every solution is different. That is not exactly the kind of
situation I would personally engineer.
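
To make the "careful programming" point concrete, here is a minimal sketch
in plain R. It assumes a toy list, probes (a hypothetical name), that copies
the mget() result quoted in this thread rather than querying a live
annotation package. It shows one way to flatten a named list while keeping
the original, non-unique names, and then to work with those names:

```r
# Toy stand-in for the mget() result quoted in this thread
# (values copied from the thread; 'probes' is a hypothetical name).
probes <- list(
  "1"    = "229819_at",
  "100"  = c("1556117_at", "204639_at", "216705_s_at"),
  "1000" = c("203440_at", "203441_s_at", "237305_at")
)

# Flatten without name mangling: drop the names unlist() would generate
# and repeat each list name once per element instead.
flat <- unlist(probes, use.names = FALSE)
names(flat) <- rep(names(probes), times = lengths(probes))

# With duplicated names, select by logical comparison;
# flat["100"] would return only the first match.
hits <- flat[names(flat) == "100"]

# The original grouping can be recovered without knowing any
# mangling scheme.
regrouped <- split(unname(flat), names(flat))
```

The same two steps apply to any named list, which is the sense in which
one solution covers all cases of duplicated names.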

  best wishes
    Robert

> 
> 
> L.
> 
>> H.
>>
>>
>>> R adds an additional
>>> integer to the end of duplicate names:
>>>
>>>  > egs <- c("1", "100", "1000")
>>>  > mget(egs, revmap(hgu133plus2ENTREZID))
>>> $`1`
>>> [1] "229819_at"
>>>
>>> $`100`
>>> [1] "1556117_at"  "204639_at"   "216705_s_at"
>>>
>>> $`1000`
>>> [1] "203440_at"   "203441_s_at" "237305_at"
>>>
>>>> Is there any way to avoid this problem?
>>>>
>>>> 2) if the egs vector contains an eg (6333) that is not present in
>>>> the annotation database, I get the following error:
>>>> egs <- c("1", "100", "1000", "6333")
>>>> unlist(mget(egs, revmap(hgu133plus2ENTREZID)))
>>>>
>>>> Error in .checkKeys(value, Rkeys(x), x@ifnotfound) :
>>>>  value for "6333" not found
>>>>
>>>> Is there any way to make a query that simply skips the
>>>> unmapped keys?
>>> Yes. The help for mget is a bit confusing on this point, but you need to
>>> use the argument ifnotfound = NA.
>>>
>>>  > egs <- c("1", "100", "1000", "6333")
>>>  > mget(egs, revmap(hgu133plus2ENTREZID), ifnotfound = NA)
>>> $`1`
>>> [1] "229819_at"
>>>
>>> $`100`
>>> [1] "1556117_at"  "204639_at"   "216705_s_at"
>>>
>>> $`1000`
>>> [1] "203440_at"   "203441_s_at" "237305_at"
>>>
>>> $`6333`
>>> [1] NA
>>>
>>> Best,
>>>
>>> Jim
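
If the NA placeholders are not wanted downstream, they can be dropped after
the lookup. A minimal sketch, assuming a plain list res (a hypothetical
name) that mimics the mget(..., ifnotfound = NA) output quoted above rather
than a live annotation package:

```r
# Toy stand-in for the mget(..., ifnotfound = NA) result quoted above
# ('res' is a hypothetical name; values copied from the thread).
res <- list(
  "1"    = "229819_at",
  "100"  = c("1556117_at", "204639_at", "216705_s_at"),
  "1000" = c("203440_at", "203441_s_at", "237305_at"),
  "6333" = NA
)

# Flag entries that consist only of NA, i.e. keys with no mapping.
unmapped <- vapply(res, function(x) all(is.na(x)), logical(1))

# Keep only the keys that mapped to at least one probe set.
mapped <- res[!unmapped]
```

Here names(res)[unmapped] lists the keys that were not found, which can be
useful for reporting them instead of silently discarding them.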
>>>
>>>
>>>
>>>>
>>>> Many thanks
>>>> Raffaele
>>>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives:
>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org


