[BioC] Exclude overlapping intervals

Hervé Pagès hpages at fhcrc.org
Thu Dec 13 19:02:31 CET 2012


Hi Hermann,

On 12/13/2012 09:33 AM, Hermann Norpois wrote:
> As neither method worked ...
>
> (error messages below translated from German)
>
> 1) > gr[!gr %in% excluded]
> Error in gr[!gr %in% excluded] :
>    error in evaluating the argument 'i' in selecting a method for
> function '[': Error in gr %in% excluded :
>    error in evaluating the argument 'table' in selecting a method for
> function '%in%': Error: object 'excluded' not found

My limited knowledge of Goethe's tongue tells me that the 'excluded'
object was not found. Of course you need to provide a GRanges object,
call it 'excluded' or 'gr2', that contains the ranges that you want to
remove from 'gr' (based on overlaps).
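
As a minimal sketch of that idea, using the five ranges from the original
post: the 'excluded' object below is a made-up region chosen only for
illustration, not something from the thread. The sketch uses overlapsAny()
rather than %in%, on the assumption that a recent GenomicRanges is
installed; to my knowledge %in% on GRanges objects now tests for equality
rather than overlap in current releases.

```r
library(GenomicRanges)

# The five ranges from the original post.
gr <- GRanges("chr1",
              IRanges(start = c(4L, 12L, 45L, 55L, 105L),
                      end   = c(10L, 19L, 80L, 100L, 200L)))

# Hypothetical regions to exclude (an assumption, for illustration only).
excluded <- GRanges("chr1", IRanges(start = 50L, end = 60L))

# Keep the ranges of 'gr' that overlap nothing in 'excluded'.
kept <- gr[!overlapsAny(gr, excluded)]
kept
```

With this 'excluded', the ranges [45, 80] and [55, 100] are dropped and the
other three are kept.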

Using 'gr' itself in place of 'gr2' to remove ranges in 'gr' that
overlap with other ranges in 'gr' would not produce what you want
though: every range overlaps at least with itself, so that would
exclude everything:

   > gr[!gr %in% gr]
   GRanges with 0 ranges and 2 metadata columns:
      seqnames    ranges strand |     score        GC
         <Rle> <IRanges>  <Rle> | <integer> <numeric>
     ---
     seqlengths:
      chr1 chr2 chr3
      1000 2000 1500

Try this instead:

   gr[countOverlaps(gr, gr) <= 1L]
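
Here is a small worked sketch of that one-liner on the data from the
original post. The object is rebuilt with the GRanges() constructor instead
of the dput() output, and the names 'n_hits' and 'isolated' are only
illustrative.

```r
library(GenomicRanges)

# The five ranges from the original post (all on chr1, strand "*").
gr <- GRanges("chr1",
              IRanges(start = c(4L, 12L, 45L, 55L, 105L),
                      end   = c(10L, 19L, 80L, 100L, 200L)))

# For each range, count the ranges of 'gr' it overlaps.
# Every range overlaps itself, so the count is always at least 1.
n_hits <- countOverlaps(gr, gr)

# A count of 1 means the range overlaps nothing but itself.
isolated <- gr[n_hits <= 1L]
isolated
```

The counts are 1, 1, 2, 2, 1, so [45, 80] and [55, 100] are removed and the
three other ranges remain. Note that an exact duplicate of a range also
counts as an overlap, so duplicated ranges would be removed as well.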

Cheers,
H.

>
>> setdiff (gr)
> Error in as.vector(x) :
>    no method for coercing this S4 class to a vector
>
> ... I am posting my data:
>
>> gr
> GRanges with 5 ranges and 0 metadata columns:
>        seqnames     ranges strand
>           <Rle>  <IRanges>  <Rle>
>    [1]     chr1 [  4,  10]      *
>    [2]     chr1 [ 12,  19]      *
>    [3]     chr1 [ 45,  80]      *
>    [4]     chr1 [ 55, 100]      *
>    [5]     chr1 [105, 200]      *
>    ---
>    seqlengths:
>     chr1
>       NA
>> dput (gr)
> new("GRanges"
>      , seqnames = new("Rle"
>      , values = structure(1L, .Label = "chr1", class = "factor")
>      , lengths = 5L
>      , elementMetadata = NULL
>      , metadata = list()
> )
>      , ranges = new("IRanges"
>      , start = c(4L, 12L, 45L, 55L, 105L)
>      , width = c(7L, 8L, 36L, 46L, 96L)
>      , NAMES = NULL
>      , elementType = "integer"
>      , elementMetadata = NULL
>      , metadata = list()
> )
>      , strand = new("Rle"
>      , values = structure(3L, .Label = c("+", "-", "*"), class = "factor")
>      , lengths = 5L
>      , elementMetadata = NULL
>      , metadata = list()
> )
>      , elementMetadata = new("DataFrame"
>      , rownames = NULL
>      , nrows = 5L
>      , listData = structure(list(), .Names = character(0))
>      , elementType = "ANY"
>      , elementMetadata = NULL
>      , metadata = list()
> )
>      , seqinfo = new("Seqinfo"
>      , seqnames = "chr1"
>      , seqlengths = NA_integer_
>      , is_circular = NA
>      , genome = NA_character_
> )
>      , metadata = list()
> )
>
> Thanks
> Hermann
>
> 2012/12/13 Michael Lawrence <lawrence.michael at gene.com>
>
>> Something like:
>>
>> gr[!gr %in% excluded]
>>
>> Another option is setdiff() but note that will generate a new set of
>> sorted, non-overlapping, non-adjacent ("normal") ranges.
>>
>> Michael
>>
>>
>> On Thu, Dec 13, 2012 at 9:01 AM, Hermann Norpois <hnorpois at googlemail.com> wrote:
>>
>>> Hello,
>>>
>>> I have a GRanges object and am looking for a function that excludes all
>>> overlapping intervals, so that I keep only the intervals that do not
>>> overlap. I could not find a suitable function. Does anybody have an idea?
>>> Thanks
>>> Hermann
>>>
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>
>>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


