[BioC] Bimap Subsetting

Hervé Pagès hpages at fhcrc.org
Mon Oct 22 20:00:13 CEST 2012


On 10/22/2012 09:48 AM, Hervé Pagès wrote:
> Hi Dario,
>
> org.Hs.egSYMBOL is a "direct" map, i.e. it maps from left
> to right. This means that keys() is equivalent to Lkeys()
> (the "keys" are actually the "left keys").
>
> Subsetting a Bimap by a given set of keys only reduces its
> set of "keys" ("left keys" if a direct map, "right keys"
> otherwise). In the case of a "direct" map, it means that
> the resulting map is now mapping the reduced set of "left keys"
> to the original set of "right keys". The set of "right keys"
> remains untouched but the number of right keys that are
> actually mapped to something on the left is of course smaller:
>
>    > Rlength(org.Hs.egSYMBOL)
>    [1] 43051
>    > count.mappedRkeys(org.Hs.egSYMBOL)
>    [1] 43051
>
>    > Rlength(org.Hs.egSYMBOL[mykeys])
>    [1] 43051
>    > count.mappedRkeys(org.Hs.egSYMBOL[mykeys])
>    [1] 6
>

FWIW, an analogy can be made with subsetting a factor where, by
default, the unused levels are not dropped:

   > x <- factor(letters[1:10])
   > x
    [1] a b c d e f g h i j
   Levels: a b c d e f g h i j
   > x[1:6]
   [1] a b c d e f
   Levels: a b c d e f g h i j

unless you use drop=TRUE:

   > x[1:6, drop=TRUE]
   [1] a b c d e f
   Levels: a b c d e f
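
A small base-R sketch of the same point, in case it helps: it only
compares level counts before and after dropping, and uses droplevels()
as an alternative to drop=TRUE. The variable names y and z are just
for illustration.

```r
x <- factor(letters[1:10])

# Plain subsetting keeps all 10 levels, even the unused ones (g..j).
y <- x[1:6]
stopifnot(nlevels(y) == 10)

# droplevels() removes the unused levels after the fact,
# leaving the same levels as x[1:6, drop=TRUE].
z <- droplevels(y)
stopifnot(nlevels(z) == 6)
stopifnot(identical(levels(z), levels(x[1:6, drop=TRUE])))
```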

So we could support a similar feature on Bimap objects through the
'drop' argument, which is ignored at the moment:

   > Rlength(org.Hs.egSYMBOL[mykeys, drop=FALSE])
   [1] 43051
   > Rlength(org.Hs.egSYMBOL[mykeys, drop=TRUE])
   [1] 43051

However, an easy way to drop unused (i.e. unmapped) keys is:

   Lkeys(x) <- mappedLkeys(x)  # drop unused left keys
   Rkeys(x) <- mappedRkeys(x)  # drop unused right keys

or (as mentioned earlier):

   subset(x, Lkeys=mappedLkeys(x), Rkeys=mappedRkeys(x))
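
To make this concrete, here is a rough sketch that wraps the two
assignments above in a helper and checks the result. Note that
drop_unused_keys() is a hypothetical name of mine, not something
AnnotationDbi provides, and the keys are just the first few mapped
left keys of the map, standing in for Dario's Entrez IDs. The example
part only runs if org.Hs.eg.db is installed.

```r
# Hypothetical helper (not part of AnnotationDbi): drop the unmapped
# keys on both sides of a Bimap and return the reduced map.
drop_unused_keys <- function(x) {
  Lkeys(x) <- mappedLkeys(x)  # drop unused left keys
  Rkeys(x) <- mappedRkeys(x)  # drop unused right keys
  x
}

if (requireNamespace("org.Hs.eg.db", quietly = TRUE)) {
  library(org.Hs.eg.db)  # also attaches AnnotationDbi

  # Assumption: any small set of valid left keys will do here.
  mykeys <- head(mappedLkeys(org.Hs.egSYMBOL))

  x <- org.Hs.egSYMBOL[mykeys]  # right keys are left untouched
  y <- drop_unused_keys(x)      # right keys reduced to the mapped ones

  # After dropping, every remaining right key should be mapped.
  stopifnot(Rlength(y) == count.mappedRkeys(y))
  print(c(before = Rlength(x), after = Rlength(y)))
}
```

The helper returns a modified copy, so the original map (here
org.Hs.egSYMBOL) is not changed.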

Cheers,
H.


> If you want to reduce both, the set of left keys and the set
> of right keys, consider using subset(). See ?`subset,Bimap-method`
> for the details.
>
> Hope this helps,
> H.
>
>
> On 10/21/2012 11:00 PM, Dario Strbenac wrote:
>> Hi,
>>
>> Why does Rkeys give all of the gene symbols, not just the first 6?
>>
>>> head(names(geneTranscripts)) # Entrez IDs.
>> [1] "1"         "10"        "100"       "1000"      "10000"     "100008586"
>>
>>> length(org.Hs.egSYMBOL[head(names(geneTranscripts))])
>> [1] 6
>>
>>> length(Rkeys(org.Hs.egSYMBOL[head(names(geneTranscripts))]))
>> [1] 42075
>>
>> GenomicFeatures_1.10.0    AnnotationDbi_1.20.2
>>
>> --------------------------------------
>> Dario Strbenac
>> PhD Student
>> University of Sydney
>> Camperdown NSW 2050
>> Australia
>>
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


