[BioC] shift function (GenomicRanges, IRanges), issue with negative start values and circular DNA

Ivanek, Robert robert.ivanek at fmi.ch
Thu Sep 1 14:27:24 CEST 2011


Dear Bioc developers,

I am analysing the ChIP-Seq data using the GenomicRanges package. I am 
usually shifting the reads to the middle of the fragment (i.e. by 50 bp 
toward 3' end of the fragment according the strand of the read), using 
the shift function (from IRanges), which works in most cases.

shift(gr, shift=50 * as.integer(ifelse(strand(gr)=="-", -1, 1)))

However sometimes the result does not look like as I would expect:

1. If the read is on the minus strand and the start value is smaller 
than the shift value, it produces an error:

## Error in validObject(x) :
##   invalid class “IRanges” object: 'widths(x)' cannot contain negative 
values

If the read is on the plus strand at the end of the chromosome it 
returns the size of the chromosome as a new end value. Would be possible 
to apply similar strategy for the beginning of the chromosome (It would 
return 1 as a start value)?



2. The shift function does not take into account information about 
circularity of the chromosome. Is there any plan to include it?


Please see below reproducible examples

Best Regards

Robert

-- 
Robert Ivanek
Postdoctoral Fellow Schuebeler Group
Friedrich Miescher Institute
Maulbeerstrasse 66
4058 Basel / Switzerland
Office phone: +41 61 697 6100




EXAMPLES:


## 
#############################################################################
##
library("GenomicRanges")
## Loading required package: IRanges
## Attaching package: ‘IRanges’
## The following object(s) are masked from ‘package:base’:
##     cbind, eval, intersect, Map, mapply, order, paste, pmax, 
pmax.int, pmin, pmin.int, rbind, rep.int, setdiff, table, union

##
## 
#############################################################################
##
chr1.gr <- GRanges(seqnames=Rle("chr1", 4),
               ranges=IRanges(start=rep(c(1,197195432),each=2), width=1),
               strand=rep(c("+","-"),2),
               seqlengths=c(chr1=197195432))
##

isCircular(chr1.gr) <- FALSE
##

chr1.gr
## GRanges with 4 ranges and 0 elementMetadata values:
##       seqnames                 ranges strand
##          <Rle>              <IRanges>  <Rle>
##   [1]     chr1 [        1,         1]      +
##   [2]     chr1 [        1,         1]      -
##   [3]     chr1 [197195432, 197195432]      +
##   [4]     chr1 [197195432, 197195432]      -
##   ---
##   seqlengths:
##         chr1
##    197195432
##

seqinfo(chr1.gr)
## Seqinfo of length 1
## seqnames seqlengths isCircular
## chr1      197195432      FALSE
##

shift(chr1.gr, shift=50)
## GRanges with 4 ranges and 0 elementMetadata values:
##       seqnames                 ranges strand
##          <Rle>              <IRanges>  <Rle>
##   [1]     chr1 [       51,        51]      +
##   [2]     chr1 [       51,        51]      -
##   [3]     chr1 [197195432, 197195432]      +
##   [4]     chr1 [197195432, 197195432]      -
##   ---
##   seqlengths:
##         chr1
##    197195432
## Warning message:
## In `end<-`(`*tmp*`, check = FALSE, value = c(51L, 51L, 197195482L,  :
##   trimmed end values to be <= seqlengths
##

shift(chr1.gr, shift=50 * as.integer(ifelse(strand(chr1.gr)=="-", -1, 1)))
## Error in validObject(x) :
##   invalid class “IRanges” object: 'widths(x)' cannot contain negative 
values
## In addition: Warning messages:
## 1: In `end<-`(`*tmp*`, check = FALSE, value = c(51L, -49L, 197195482L,  :
##   trimmed end values to be <= seqlengths
## 2: In `start<-`(`*tmp*`, value = c(51L, -49L, 197195432L, 197195382L :
##   trimmed start values to be positive
##
## 
#############################################################################
##
chrM.gr <- GRanges(seqnames=Rle("chrM", 4),
               ranges=IRanges(start=rep(c(1,16299),each=2), width=1),
               strand=rep(c("+","-"),2),
               seqlengths=c(chrM=16299))
##

isCircular(chrM.gr) <- TRUE
##

chrM.gr
## GRanges with 4 ranges and 0 elementMetadata values:
##       seqnames         ranges strand
##          <Rle>      <IRanges>  <Rle>
##   [1]     chrM [    1,     1]      +
##   [2]     chrM [    1,     1]      -
##   [3]     chrM [16299, 16299]      +
##   [4]     chrM [16299, 16299]      -
##   ---
##   seqlengths:
##     chrM
##    16299
##

seqinfo(chrM.gr)
## Seqinfo of length 1
## seqnames seqlengths isCircular
## chrM          16299       TRUE
##

shift(chrM.gr, shift=50)
## GRanges with 4 ranges and 0 elementMetadata values:
##       seqnames         ranges strand
##          <Rle>      <IRanges>  <Rle>
##   [1]     chrM [   51,    51]      +
##   [2]     chrM [   51,    51]      -
##   [3]     chrM [16299, 16299]      +
##   [4]     chrM [16299, 16299]      -
##   ---
##   seqlengths:
##     chrM
##    16299
## Warning message:
## In `end<-`(`*tmp*`, check = FALSE, value = c(51L, 51L, 16349L, 16349L :
##   trimmed end values to be <= seqlengths
##

shift(chrM.gr, shift=50 * as.integer(ifelse(strand(chrM.gr)=="-", -1, 1)))
## Error in validObject(x) :
##   invalid class “IRanges” object: 'widths(x)' cannot contain negative 
values
## In addition: Warning messages:
## 1: In `end<-`(`*tmp*`, check = FALSE, value = c(51L, -49L, 16349L,  :
##   trimmed end values to be <= seqlengths
## 2: In `start<-`(`*tmp*`, value = c(51L, -49L, 16299L, 16249L)) :
##   trimmed start values to be positive
##
##
## 
#############################################################################
##
sessionInfo()
## R Under development (unstable) (2011-08-29 r56824)
## Platform: x86_64-unknown-linux-gnu (64-bit)

## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C 
LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8 
LC_MONETARY=en_US.UTF-8
##  [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=C                 LC_NAME=C 
                  LC_ADDRESS=C               LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base

## other attached packages:
## [1] GenomicRanges_1.5.31 IRanges_1.11.26
##
## 
#############################################################################


-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: shift.output.R
URL: <https://stat.ethz.ch/pipermail/bioconductor/attachments/20110901/764b0963/attachment.pl>


More information about the Bioconductor mailing list