[BioC] shift function (GenomicRanges, IRanges), issue with negative start values and circular DNA

Ivanek, Robert robert.ivanek at fmi.ch
Mon Jan 23 17:13:25 CET 2012


Hi Hervé,

On 01/23/2012 01:04 AM, Hervé Pagès wrote:
> Hi Robert,
>
> Sorry that this completely went off my radar but this week someone
> kindly sent me a reminder.
>
> So after re-reading your email I agree that shift() should behave
> consistently on both ends of the chromosome but I don't like the idea
> that it automatically trims the ranges for you. I think we should
> shift without trimming, but with a warning that some ranges are going
> off limits.

This sounds reasonable. User can decide what to do with them.


> Then it would be very easy for the user to trim those ranges if
> s/he wanted, using trim() (there is currently no "trim" method for
> GRanges objects but I'll add one). Note that for ranges that are
> completely off limits (e.g. start=-20, end=-4), it's not clear
> what trim() should do: (a) fail, (b) produce a 0-width range located
> on the edge (e.g. start=1, end=0), with or (c) without warning.

I think, that the GRanges cannot have negative widths, so I would 
suggest that they would be trimmed to (start=1,end=1) or 
(start=seqlength, end=seqlength) by default.

> This is one of the reasons I think this task is better done separately.
> The other reason to not do the trimming in shift() is to maintain
> the nice property that shift(shift(x, N), -N) should always be a no-op.
>
> When the chromosome is circular, I think we should do the same, except
> that no warning should be issued when a range goes off limits.

Does the GRanges class allow the negative width on circular chromosomes?

Cheers

Robert


>
> Does that sound reasonable?
>
> Cheers,
> H.
>
>
> On 09/01/2011 08:23 AM, Ivanek, Robert wrote:
>> Hi Michael,
>>
>> is there any plan to fixed these issues?
>>
>> Best Regards
>>
>> Robert
>>
>> On 09/01/2011 03:37 PM, Michael Lawrence wrote:
>>> Nevermind, I misunderstood what you were trying to do.
>>>
>>> On Thu, Sep 1, 2011 at 6:36 AM, Michael Lawrence <michafla at gene.com
>>> <mailto:michafla at gene.com>> wrote:
>>>
>>> As a sort of side-comment, you might be better off doing this with
>>> resize(), i.e.,
>>>
>>> resize(gr, width(gr) - 50, fix = "end")
>>>
>>> Michael
>>>
>>> On Thu, Sep 1, 2011 at 5:27 AM, Ivanek, Robert <robert.ivanek at fmi.ch
>>> <mailto:robert.ivanek at fmi.ch>> wrote:
>>>
>>> Dear Bioc developers,
>>>
>>> I am analysing the ChIP-Seq data using the GenomicRanges
>>> package. I am usually shifting the reads to the middle of the
>>> fragment (i.e. by 50 bp toward 3' end of the fragment according
>>> the strand of the read), using the shift function (from
>>> IRanges), which works in most cases.
>>>
>>> shift(gr, shift=50 * as.integer(ifelse(strand(gr)==__"-", -1, 1)))
>>>
>>> However sometimes the result does not look like as I would expect:
>>>
>>> 1. If the read is on the minus strand and the start value is
>>> smaller than the shift value, it produces an error:
>>>
>>> ## Error in validObject(x) :
>>> ## invalid class “IRanges” object: 'widths(x)' cannot contain
>>> negative values
>>>
>>> If the read is on the plus strand at the end of the chromosome
>>> it returns the size of the chromosome as a new end value. Would
>>> be possible to apply similar strategy for the beginning of the
>>> chromosome (It would return 1 as a start value)?
>>>
>>>
>>>
>>> 2. The shift function does not take into account information
>>> about circularity of the chromosome. Is there any plan to
>>> include it?
>>>
>>>
>>> Please see below reproducible examples
>>>
>>> Best Regards
>>>
>>> Robert
>>>
>>> --
>>> Robert Ivanek
>>> Postdoctoral Fellow Schuebeler Group
>>> Friedrich Miescher Institute
>>> Maulbeerstrasse 66
>>> 4058 Basel / Switzerland
>>> Office phone: +41 61 697 6100 <tel:%2B41%2061%20697%206100>
>>>
>>>
>>>
>>>
>>> EXAMPLES:
>>>
>>>
>>> ##
>>> ##############################__##############################__#################
>>>
>>>
>>> ##
>>> library("GenomicRanges")
>>> ## Loading required package: IRanges
>>> ## Attaching package: ‘IRanges’
>>> ## The following object(s) are masked from ‘package:base’:
>>> ## cbind, eval, intersect, Map, mapply, order, paste, pmax,
>>> pmax.int <http://pmax.int>, pmin, pmin.int <http://pmin.int>,
>>> rbind, rep.int <http://rep.int>, setdiff, table, union
>>>
>>> ##
>>> ##
>>> ##############################__##############################__#################
>>>
>>>
>>> ##
>>> chr1.gr <http://chr1.gr> <- GRanges(seqnames=Rle("chr1", 4),
>>> ranges=IRanges(start=rep(c(1,__197195432),each=2), width=1),
>>> strand=rep(c("+","-"),2),
>>> seqlengths=c(chr1=197195432))
>>> ##
>>>
>>> isCircular(chr1.gr <http://chr1.gr>) <- FALSE
>>> ##
>>>
>>> chr1.gr <http://chr1.gr>
>>> ## GRanges with 4 ranges and 0 elementMetadata values:
>>> ## seqnames ranges strand
>>> ## <Rle> <IRanges> <Rle>
>>> ## [1] chr1 [ 1, 1] +
>>> ## [2] chr1 [ 1, 1] -
>>> ## [3] chr1 [197195432, 197195432] +
>>> ## [4] chr1 [197195432, 197195432] -
>>> ## ---
>>> ## seqlengths:
>>> ## chr1
>>> ## 197195432
>>> ##
>>>
>>> seqinfo(chr1.gr <http://chr1.gr>)
>>> ## Seqinfo of length 1
>>> ## seqnames seqlengths isCircular
>>> ## chr1 197195432 FALSE
>>> ##
>>>
>>> shift(chr1.gr <http://chr1.gr>, shift=50)
>>> ## GRanges with 4 ranges and 0 elementMetadata values:
>>> ## seqnames ranges strand
>>> ## <Rle> <IRanges> <Rle>
>>> ## [1] chr1 [ 51, 51] +
>>> ## [2] chr1 [ 51, 51] -
>>> ## [3] chr1 [197195432, 197195432] +
>>> ## [4] chr1 [197195432, 197195432] -
>>> ## ---
>>> ## seqlengths:
>>> ## chr1
>>> ## 197195432
>>> ## Warning message:
>>> ## In `end<-`(`*tmp*`, check = FALSE, value = c(51L, 51L,
>>> 197195482L, :
>>> ## trimmed end values to be <= seqlengths
>>> ##
>>>
>>> shift(chr1.gr <http://chr1.gr>, shift=50 *
>>> as.integer(ifelse(strand(chr1.__gr <http://chr1.gr>)=="-", -1, 1)))
>>> ## Error in validObject(x) :
>>> ## invalid class “IRanges” object: 'widths(x)' cannot contain
>>> negative values
>>> ## In addition: Warning messages:
>>> ## 1: In `end<-`(`*tmp*`, check = FALSE, value = c(51L, -49L,
>>> 197195482L, :
>>> ## trimmed end values to be <= seqlengths
>>> ## 2: In `start<-`(`*tmp*`, value = c(51L, -49L, 197195432L,
>>> 197195382L :
>>> ## trimmed start values to be positive
>>> ##
>>> ##
>>> ##############################__##############################__#################
>>>
>>>
>>> ##
>>> chrM.gr <- GRanges(seqnames=Rle("chrM", 4),
>>> ranges=IRanges(start=rep(c(1,__16299),each=2), width=1),
>>> strand=rep(c("+","-"),2),
>>> seqlengths=c(chrM=16299))
>>> ##
>>>
>>> isCircular(chrM.gr) <- TRUE
>>> ##
>>>
>>> chrM.gr
>>> ## GRanges with 4 ranges and 0 elementMetadata values:
>>> ## seqnames ranges strand
>>> ## <Rle> <IRanges> <Rle>
>>> ## [1] chrM [ 1, 1] +
>>> ## [2] chrM [ 1, 1] -
>>> ## [3] chrM [16299, 16299] +
>>> ## [4] chrM [16299, 16299] -
>>> ## ---
>>> ## seqlengths:
>>> ## chrM
>>> ## 16299
>>> ##
>>>
>>> seqinfo(chrM.gr)
>>> ## Seqinfo of length 1
>>> ## seqnames seqlengths isCircular
>>> ## chrM 16299 TRUE
>>> ##
>>>
>>> shift(chrM.gr, shift=50)
>>> ## GRanges with 4 ranges and 0 elementMetadata values:
>>> ## seqnames ranges strand
>>> ## <Rle> <IRanges> <Rle>
>>> ## [1] chrM [ 51, 51] +
>>> ## [2] chrM [ 51, 51] -
>>> ## [3] chrM [16299, 16299] +
>>> ## [4] chrM [16299, 16299] -
>>> ## ---
>>> ## seqlengths:
>>> ## chrM
>>> ## 16299
>>> ## Warning message:
>>> ## In `end<-`(`*tmp*`, check = FALSE, value = c(51L, 51L,
>>> 16349L, 16349L :
>>> ## trimmed end values to be <= seqlengths
>>> ##
>>>
>>> shift(chrM.gr, shift=50 *
>>> as.integer(ifelse(strand(chrM.__gr)=="-", -1, 1)))
>>> ## Error in validObject(x) :
>>> ## invalid class “IRanges” object: 'widths(x)' cannot contain
>>> negative values
>>> ## In addition: Warning messages:
>>> ## 1: In `end<-`(`*tmp*`, check = FALSE, value = c(51L, -49L,
>>> 16349L, :
>>> ## trimmed end values to be <= seqlengths
>>> ## 2: In `start<-`(`*tmp*`, value = c(51L, -49L, 16299L, 16249L)) :
>>> ## trimmed start values to be positive
>>> ##
>>> ##
>>> ##
>>> ##############################__##############################__#################
>>>
>>>
>>> ##
>>> sessionInfo()
>>> ## R Under development (unstable) (2011-08-29 r56824)
>>> ## Platform: x86_64-unknown-linux-gnu (64-bit)
>>>
>>> ## locale:
>>> ## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
>>> LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
>>> ## [6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=C LC_NAME=C LC_ADDRESS=C
>>> LC_TELEPHONE=C
>>> ## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>
>>> ## attached base packages:
>>> ## [1] stats graphics grDevices utils datasets methods base
>>>
>>> ## other attached packages:
>>> ## [1] GenomicRanges_1.5.31 IRanges_1.11.26
>>> ##
>>> ##
>>> ##############################__##############################__#################
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at r-project.org <mailto:Bioconductor at r-project.org>
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives:
>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>>
>>>
>>
>
>

-- 
Robert Ivanek
Postdoctoral Fellow Schuebeler Group
Friedrich Miescher Institute
Maulbeerstrasse 66
4058 Basel / Switzerland
Office phone: +41 61 697 6100



More information about the Bioconductor mailing list