[BioC] suppressing reduce() function when applying set operations to GRanges

Hervé Pagès hpages at fhcrc.org
Mon Feb 11 06:17:04 CET 2013


Hi Bill,

On 02/10/2013 03:00 PM, Bill Gibb wrote:
> Hello,
>
> I noticed that when applying set operations to GRanges objects, the returned range is reduced (by default), e.g.
>
> 	r1 <- GRanges(seqnames=c(1,1,1), ranges=IRanges(start=c(1,3,5), end=c(1,3,5)), strand='*')
> 	length(r1)
> 	r2 <- GRanges(seqnames=c(1,1,1), ranges=IRanges(start=c(1,3,4), end=c(1,3,4)), strand='*')
> 	length(r2)
> 	r3 <- union(r1,r2)
> 	length(r3)
> 	sum(width(ranges(r3)))
>
> Both r1 and r2 have length 3, whereas r3 has length 2, due to the implicit reduce() applied to the result of union(). I can see that reducing the ranges would normally be desired when applying set operations; however, there are occasions when one might want to keep a list of singleton ranges (e.g. when sub-sampling at the genomic coordinate level). Is there a way to suppress the reduce() operation when applying set operations to GRanges?

You could do:

   > unique(c(r1, r2))
   GRanges with 4 ranges and 0 metadata columns:
       seqnames    ranges strand
            <Rle> <IRanges>  <Rle>
     [1]        1    [1, 1]      *
     [2]        1    [3, 3]      *
     [3]        1    [5, 5]      *
     [4]        1    [4, 4]      *
     ---
     seqlengths:
       1
      NA

which is another form of union() that is certainly more in the spirit
of what base::union() does on ordinary vectors.

However, union(x, y) on GRanges objects goes one step further and
reduces the above result. Yes, we could add a 'reduce.ranges' arg to
control this, given that it's not the first time users have been
confused by this behavior.
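
In the meantime, the two behaviors can be wrapped in a small helper. This
is only a sketch: 'union_ranges' and its 'reduce.ranges' argument are
hypothetical names made up for illustration and are not part of
GenomicRanges. It assumes that sorting the unreduced result is acceptable.

```r
library(GenomicRanges)

# Hypothetical helper, NOT part of GenomicRanges: the function name and
# the 'reduce.ranges' argument are assumptions made for this sketch.
union_ranges <- function(x, y, reduce.ranges = TRUE) {
  if (reduce.ranges) {
    # Standard behavior: overlapping/adjacent ranges are merged.
    return(union(x, y))
  }
  # Keep every distinct range as its own element, in sorted order.
  sort(unique(c(x, y)))
}

r1 <- GRanges(seqnames = "1", ranges = IRanges(start = c(1, 3, 5), width = 1),
              strand = "*")
r2 <- GRanges(seqnames = "1", ranges = IRanges(start = c(1, 3, 4), width = 1),
              strand = "*")

length(union_ranges(r1, r2))                        # 2 ranges: [1, 1] and [3, 5]
start(union_ranges(r1, r2, reduce.ranges = FALSE))  # 1 3 4 5
```

With reduce.ranges=FALSE the four single-base ranges survive as separate
elements, which is what you need for sub-sampling at the coordinate level.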

Cheers,
H.


> Something like union(r1, r2, reduce.ranges=FALSE) would be nice.
>
> I also tried:
>
> 	resize(r3,width=1)
>
> however it appears to simply truncate multi-base sequences.
>
> Thank you.
>
> Bill Gibb
> Genomic Health, Inc.
> Redwood City, CA
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


