[BioC] Using precede()/follow() to find two ranges

Cook, Malcolm MEC at stowers.org
Wed Aug 21 19:59:11 CEST 2013


Dolev,

Before chiming in on the problem as it is currently framed, I want to make sure of something.

The documentation for precede() and follow() reads 'Overlapping ranges are excluded.'

For instance, if one of your cg ranges overlaps one of your gene ranges, no method based on precede()/follow() will discover that association.
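
To make this concrete, here is a minimal sketch with made-up toy ranges. The object names (cg, gene) and the coordinates are purely illustrative, not your data, and it assumes the GenomicRanges package is installed:

```r
library(GenomicRanges)

# Hypothetical toy data: a single CpG position lying inside a single gene range.
cg   <- GRanges("chr6", IRanges(start = 150, width = 1))
gene <- GRanges("chr6", IRanges(start = 100, end = 200), strand = "+")

# The only candidate overlaps the query, so precede()/follow() have nothing
# to report and should return NA.
hit_precede <- precede(cg, gene)
hit_follow  <- follow(cg, gene)

# An overlap-based query, by contrast, does see the association.
n_overlap <- countOverlaps(cg, gene)

hit_precede
hit_follow
n_overlap
```

If overlapping genes matter to you, something like findOverlaps() or nearest() would probably have to be combined with precede()/follow().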

Is this really desirable for your application?

-Malcolm


 >-----Original Message-----
 >From: bioconductor-bounces at r-project.org [mailto:bioconductor-bounces at r-project.org] On Behalf Of d r
 >Sent: Wednesday, August 21, 2013 12:15 PM
 >To: Steve Lianoglou
 >Cc: bioconductor at r-project.org list
 >Subject: Re: [BioC] Using precede()/follow() to find two ranges
 >
 >Hi Steve and all
 >
 >Thanks for your suggestion. I was in fact thinking in this direction.
 >However, by completely discarding the first set of hits before querying the
 >second time, I run the risk of missing ranges that are potential second
 >hits because they were discarded after being a first hit for other ranges.
 >For example, if I have these two GRanges:
 >
 >gr1:
 >
 >GRanges with 2 ranges and 1 metadata column:
 >       seqnames                 ranges strand |      names
 >          <Rle>              <IRanges>  <Rle> |   <factor>
 >   [1]     chr6 [125284212, 125284212]      * | cg00991794
 >   [2]     chr6 [150465049, 150465049]      * | cg02250071
 >
 >gr2:
 >
 >GRanges with 4 ranges and 1 metadata column:
 >       seqnames                 ranges strand |      gene
 >          <Rle>              <IRanges>  <Rle> |  <factor>
 >   [1]     chr6 [126284212, 124284212]      + |      PGM3
 >   [2]     chr6 [160920998, 150920998]      + |   PLEKHG1
 >   [3]     chr6 [ 83903012,  83903012]      + |      PGM3
 >   [4]     chr6 [190102159, 170102159]      + |     WDR27
 >
 >The first hits will be gr2[1] and gr2[2], which will both be discarded.
 >
 >Now if I call precede() again, the hits I get will be gr2[3] and
 >gr2[4], instead of gr2[1] again for gr1[2] and gr2[2] for
 >gr1[1].
 >
 >
 >I was thinking that applying a function that does what Steve
 >suggested might do the trick, if it can run on one range of gr1 at a
 >time without modifying gr2.
 >
 >Something along the lines of:
 >
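 >two_hits_apply <- function(gr1, gr2)
 >{
 >  # nearest range in gr2 that each range in gr1 precedes (NA if none)
 >  p1 <- precede(gr1, gr2)
 >  # drop the first hits; setdiff() ignores NA, unlike gr2[-p1]
 >  keep <- setdiff(seq_along(gr2), p1)
 >  # second query, with indices mapped back to positions in gr2
 >  p2 <- keep[precede(gr1, gr2[keep])]
 >  hits <- c(p1, p2)
 >  hits
 >}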
 >
 >Now all I need is a way to apply a function over two GRanges objects.
 >If I understand correctly, I will need something like mapply(), but one
 >that can actually work on GRanges.
 >
 >Does such a function exist, or alternatively, is there a way to do this
 >with the conventional apply functions that I am missing?
 >
 >Many thanks in advance
 >Dolev
 >
 >
 >
 >
 >
 >
 >
 >On Wed, Aug 21, 2013 at 7:10 PM, Steve Lianoglou
 ><lianoglou.steve at gene.com> wrote:
 >
 >> Hi,
 >>
 >> On Wed, Aug 21, 2013 at 7:25 AM, d r <dolevrahat at gmail.com> wrote:
 >> > Hello
 >> >
 >> > I am looking for a way to find the two preceding/following ranges in one
 >> > GRanges object for each range in a second GRanges object.
 >> >
 >> > In other words, I am looking for a variation on precede() or follow() that
 >> > will return for each range in x two ranges in subject instead of one: the
 >> > nearest preceding (following) range and the second nearest preceding
 >> > (following) range.
 >> >
 >> > Is there a way to do this? (Sorry for not giving any more constructive
 >> > suggestions; I know relatively little about how GRanges works and therefore
 >> > am at a loss as to how to proceed.)
 >>
 >> A first draft way of doing that would be to simply use the results
 >> from the first precede to remove those elements from the second call?
 >>
 >> For instance, if you have two ranges you are querying, gr1 and gr2:
 >>
 >> To get the immediately preceding ranges:
 >>
 >> R> p1 <- precede(gr1, gr2)
 >>
 >> Then to get the ones immediately after that:
 >>
 >> R> gr2.less <- gr2[-p1]
 >> R> p2 <- precede(gr1, gr2.less)
 >>
 >> Then you can see who is who with `gr2[p1]` and the who-who is who with
 >> `gr2.less[p2]` ... that should get you pretty close -- will likely
 >> have to handle edge cases, for instance when there are no preceding
 >> ranges, I think you get an NA (if I recall) so think about what you
 >> want to do with those.
 >>
 >> -steve
 >>
 >> --
 >> Steve Lianoglou
 >> Computational Biologist
 >> Bioinformatics and Computational Biology
 >> Genentech
 >>
 >