[BioC] distanceToNearest in GenomicRanges

James W. MacDonald jmacdon at uw.edu
Mon Feb 11 19:08:11 CET 2013


Hi Tom,

On 2/11/2013 11:35 AM, Tom Oates wrote:
> Hi
> I am very much a learner in R in general & GenomicRanges in general
> I am struggling to find documentation to help me get my head around
> distanceToNearest in GenomicRanges
> If I have a GRanges object:
>
> GRanges with 6 ranges and 4 metadata columns:
>        seqnames                 ranges strand |
>           <Rle>               <IRanges>   <Rle>  |
>    [1]       10 [ 96723746,  96723747]      - |
>    [2]        7 [ 13641170,  13641171]      + |
>    [3]       16 [ 17772801,  17772802]      - |
>    [4]        3 [ 88173502,  88173503]      - |
>    [5]       13 [106979682, 106979683]      + |
>    [6]        9 [104393139, 104393140]      + |
>
> (You will notice that all the regions are only dinucleotides & I have
> removed the metadata)
>
> I have a 2nd GRanges object which is ensembl rat transcripts as below:
> 39549 ranges and 2 metadata columns:
>        seqnames        ranges strand |     tx_id            tx_name
>           <Rle>     <IRanges>  <Rle> | <integer>        <character>
>    [1]        1 [5473, 16844]      + |         1 ENSRNOT00000044270
>    [2]        1 [5526, 16968]      + |         2 ENSRNOT00000049921
>    [3]        1 [5526, 16968]      + |         3 ENSRNOT00000051735
>    [4]        1 [5598, 13520]      + |         4 ENSRNOT00000034630
>    [5]        1 [8268, 16850]      + |         5 ENSRNOT00000044505
>    [6]        1 [8316, 17577]      + |         6 ENSRNOT00000042693
>    [7]        1 [8884, 16850]      + |         7 ENSRNOT00000044187
>    [8]        1 [8956,  9955]      + |         8 ENSRNOT00000041082
>    [9]        1 [9055, 17351]      + |         9 ENSRNOT00000050254
>
>
> If I invoke:
> xx<-distanceToNearest(diff.cpgs.gr, rat.transcripts, ignore.strand=F)
>
> xx
> DataFrame with 1133 rows and 3 columns
>       queryHits subjectHits  distance
>       <integer>    <integer>  <integer>
> 1            1        7752         0
> 2            2       32166     11946
> 3            3       14678     25377
> 4            4       24286     66747
> 5            5       10609     34242
> 6            6       37076    122683
> 7            7       35184         0
> 8            8       34180     45561
> 9            9       19351     50156
> ...        ...         ...       ...
> etc
>
> I am uncertain how I would then use the xx output to gain information (i.e.
> tx_id, tx_name) about the feature which the function has identified as
> nearest?
> I would be happy to supply any more info as required

The subjectHits column gives the row of your transcript GRanges object
that is nearest to the corresponding query row, so you can use it to
subset the transcripts. I am assuming here that the 'diff.cpgs.gr'
GRanges object is longer than 6? Anyway, here is an example using your
data and the TxDb.Mmusculus.UCSC.mm10.knownGene package:

 > x
GRanges with 6 ranges and 0 metadata columns:
       seqnames                 ranges strand
          <Rle>              <IRanges>  <Rle>
   [1]    chr10 [ 96723746,  96723747]      *
   [2]     chr7 [ 13641170,  13641171]      *
   [3]    chr16 [ 17772801,  17772802]      *
   [4]     chr3 [ 88173502,  88173503]      *
   [5]    chr13 [106979682, 106979683]      *
   [6]     chr9 [104393139, 104393140]      *
   ---
 > y <- transcripts(TxDb.Mmusculus.UCSC.mm10.knownGene)
 > xx <- distanceToNearest(x, y, ignore.strand=FALSE)
 > xx
DataFrame with 6 rows and 3 columns
  queryHits subjectHits  distance
  <integer>   <integer> <integer>
1         1        4514    100935
2         2       45653         0
3         3       19383         0
4         4       34197         0
5         5       14383         0
6         6       54212      8108


 > y[xx[,2],]
GRanges with 6 ranges and 2 metadata columns:
       seqnames                 ranges strand |     tx_id     tx_name
          <Rle>              <IRanges>  <Rle> | <integer> <character>
   [1]    chr10 [ 96617001,  96622811]      + |     33419  uc007gww.2
   [2]     chr7 [ 13623967,  13670807]      + |     21400  uc012ezp.1
   [3]    chr16 [ 17759663,  17779206]      + |     48288  uc007ylz.1
   [4]     chr3 [ 88171560,  88177785]      - |     10107  uc008puf.2
   [5]    chr13 [106963757, 107022114]      - |     43288  uc007rue.1
   [6]     chr9 [104361832, 104385031]      + |     29956  uc009rhp.1
   ---
   seqlengths:
                    chr1                 chr2 ...       chrUn_JH584304
               195471971            182113224 ...               114452
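
If you want the tx_id and tx_name next to each query row rather than a
separate GRanges, the same subsetting idea can be sketched as below. This
is only a minimal example under some assumptions: the coordinates and the
names 'query', 'subject', 'txA' etc. are made up for illustration, and it
assumes a recent GenomicRanges in which distanceToNearest() returns a Hits
object (so the accessors queryHits(), subjectHits() and mcols() are used
instead of column indexing on a DataFrame as in the output above).

```r
library(GenomicRanges)

# Toy query ranges (hypothetical coordinates, not the poster's data);
# strand is left at its default "*"
query <- GRanges(seqnames = c("chr1", "chr1", "chr2"),
                 ranges   = IRanges(start = c(100, 5000, 300), width = 2))

# Toy "transcripts" carrying tx_id / tx_name metadata columns
subject <- GRanges(seqnames = c("chr1", "chr1", "chr2"),
                   ranges   = IRanges(start = c(50, 4000, 1000),
                                      end   = c(150, 4500, 2000)),
                   strand   = c("+", "-", "+"),
                   tx_id    = 1:3,
                   tx_name  = c("txA", "txB", "txC"))

# One hit per query range that has a neighbour on the same chromosome
hits <- distanceToNearest(query, subject, ignore.strand = TRUE)

# Use the subject index to pull the annotation for each query row
res <- data.frame(
  query_row = queryHits(hits),
  tx_id     = subject$tx_id[subjectHits(hits)],
  tx_name   = subject$tx_name[subjectHits(hits)],
  distance  = mcols(hits)$distance
)
res
```

Query ranges with no feature on the same chromosome are dropped from the
result, which is why indexing through queryHits() is safer than assuming
the output has one row per query.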

Best,

Jim


> Tom

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
