[BioC] find overlap of bed files of different length

Thomas Girke thomas.girke at ucr.edu
Tue Feb 8 17:59:09 CET 2011


What about a more generic solution to this very reasonable utility
request by returning a somewhat complete overlap mapping result in a
GRanges object that contains overlap start/end positions,
percent/absolute overlap, overlap types, etc. The relative overlap
filtering by percent values can then be performed very easily in a
second step, like in this example:

## Two sample GRanges objects
library(GenomicRanges)
grq <- GRanges(seqnames = Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)), 
               ranges = IRanges(seq(1, 100, by=10), end = seq(30, 120, by=10)), 
               strand = Rle(strand(c("-", "+", "-")), c(1, 7, 2)))
grs <- shift(grq[c(2,5,6)], 5)

## Return absolute and relative overlap positions with olRanges function
source("http://faculty.ucr.edu/~tgirke/Documents/R_BioCond/My_R_Scripts/rangeoverlapper.R")
myol <- olRanges(query=grq, subject=grs, output="gr") # output="df" returns data frame
myol 

## Return overlaps that cover at least 50% of the query/subject ranges.
myol[elementMetadata(myol)[, "OLpercQ"] > 50] 
myol[elementMetadata(myol)[, "OLpercS"] > 50]


This works with both GRanges or IRanges objects as input. Wouldn't something like this 
be a nice "convenience output" argument for the findOverlap function???

Thomas

On Tue, Feb 01, 2011 at 06:41:19PM -0800, Michael Lawrence wrote:
> 2011/2/1 Herv? Pag?s <hpages at fhcrc.org>
> 
> > Hi,
> >
> >
> > On 02/01/2011 11:08 AM, Michael Lawrence wrote:
> > [...]
> >
> >  This is a reasonable request. As Kasper mentioned, it's possible with post
> >> processing.
> >>
> >> E.g.:
> >>
> >> m<- findOverlaps(query, subject)
> >> percentOverlap<- width(ranges(m, query, subject)) /
> >> width(query)[queryHits(m)]
> >> keep<- percentOverlap>  cutoff
> >>
> >> Perhaps someone up North could add this to IRanges/GenomicRanges?
> >>
> >
> > Would be my pleasure. Yes the functionality provided by this
> > ranges,RangesMatching method is very useful but maybe we should try
> > to give it a little bit more exposure. Right now it's kind of "hidden"
> > in the man page for RangesMatching (which you will get with something
> > like ?`RangesMatching-class` or ?`ranges,RangesMatching-method`) and
> > not mentioned in any vignette AFAIK. Also it works only if query and
> > subject are both Ranges objects but not with GRanges objects.
> >
> > So I propose to:
> >
> >  1. Rename this to something like overlaps(), or overlappingRanges(),
> >     or ... (fill the blank), and deprecate the ranges,RangesMatching
> >     method.
> >     overlaps() would be a new generic with the m,query,subject
> >     signature.
> >     The current name is probably part of the reason why nobody (except
> >     the original author of the method) knew about it.
> >     I personally find it unexpected that a quite non-specific name
> >     like "ranges" is used for this when it generally plays the role
> >     of an accessor to get/set the ranges of containers like IRanges,
> >     RangedData, GRanges, etc...
> >
> >  2. Add a "overlaps" method for
> >     RangesMatching,GenomicRanges,GenomicRanges.
> >
> >  3. Illustrate how to use this in the vignettes of GenomicRanges.
> >
> > Would that be OK?
> >
> >
> That would be great, but I was really referring to supporting a fractional
> minOverlaps parameter to findOverlaps, etc. Sorry that I wasn't clear.
> 
> 
> > H.
> >
> >
> >> Michael
> >>
> >> D.
> >>
> >>>
> >>>
> >>        [[alternative HTML version deleted]]
> >>
> >> _______________________________________________
> >> Bioconductor mailing list
> >> Bioconductor at r-project.org
> >> https://stat.ethz.ch/mailman/listinfo/bioconductor
> >> Search the archives:
> >> http://news.gmane.org/gmane.science.biology.informatics.conductor
> >>
> >
> >
> > --
> > Herv? Pag?s
> >
> > Program in Computational Biology
> > Division of Public Health Sciences
> >
> > Fred Hutchinson Cancer Research Center
> > 1100 Fairview Ave. N, M2-B876
> > P.O. Box 19024
> > Seattle, WA 98109-1024
> >
> > E-mail: hpages at fhcrc.org
> > Phone:  (206) 667-5791
> > Fax:    (206) 667-1319
> >
> 
> 	[[alternative HTML version deleted]]
> 

> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor



More information about the Bioconductor mailing list