[BioC] rtracklayer: small problem

Gustavo Fernández Bayón gbayon at gmail.com
Wed Jan 16 12:47:23 CET 2013


Hi everybody.

I have managed to spot some strange (at least from a newbie point of 
view) behaviour in the rtracklayer package. I have set up a small 
example for this:

library(rtracklayer)
s <- browserSession()
genome(s) <- 'hg19'
track <- 'wgEncodeBroadHistone'
table.name <- 'wgEncodeBroadHistoneGm12878CtcfStdPk'
q <- ucscTableQuery(s, track=track, table=table.name)

ex1 <- getTable(q)
ex2 <- track(q)
ex3 <- track(q, asRangedData=FALSE)

Then, I show the first element of each of the three result 
datasets (data.frame, RangedData and GRanges, respectively):

 > ex1[1,]
  bin chrom chromStart  chromEnd name score strand signalValue pValue qValue
1   3  chr1  150941733 151007265    .   297      .     2.98199     13     -1
 > ex2[1,]
UCSC track 'wgEncodeBroadHistoneGm12878CtcfStdPk'
UCSCData with 1 row and 3 value columns across 93 spaces
      space                 ranges |        name     score   strand
   <factor>              <IRanges> | <character> <numeric> <factor>
1     chr1 [150941734, 151007265] |          NA       297        *
 > ex3[1]
GRanges with 1 range and 2 metadata columns:
       seqnames                 ranges strand |        name     score
          <Rle>              <IRanges>  <Rle> | <character> <numeric>
   [1]     chr1 [150941734, 151007265]      * |        <NA>       297
   ---
   seqlengths:
                     chr1                  chr2 ... chrUn_gl000249
                249250621             243199373 ...          38502

I have noticed that the start position of each range is one base 
higher in the ranges-based objects than in the original table. I don't 
know whether this is an error inside the track() function or something I 
am missing. The offset occurs for every element, not only for the first one:

 > all(start(ex3) == ex1$chromStart + 1)
[1] TRUE
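
One possible explanation, offered as an assumption rather than something confirmed in this thread: UCSC tables store chromStart as a 0-based, half-open coordinate, while IRanges and GRanges objects use 1-based, closed coordinates. Under that assumption the start would shift by one and the end would stay the same, which matches the output above. A minimal base R sketch of the arithmetic, where zero_to_one_based is a hypothetical helper and not part of rtracklayer:

```r
# Hypothetical helper (not an rtracklayer function): convert a 0-based,
# half-open interval [chromStart, chromEnd) into a 1-based, closed one.
zero_to_one_based <- function(chromStart, chromEnd) {
  data.frame(start = chromStart + 1L, end = chromEnd)
}

# Coordinates taken from the first row of ex1 above
converted <- zero_to_one_based(150941733L, 151007265L)
converted$start  # 150941734, same as start(ex3)[1]
converted$end    # 151007265, unchanged

# Both conventions describe the same number of bases
width_zero_based <- 151007265L - 150941733L
width_one_based  <- converted$end - converted$start + 1L
stopifnot(width_zero_based == width_one_based)
```

If this is what happens, the +1 would be a deliberate change of coordinate convention and not an off-by-one bug, and both representations would cover exactly the same bases.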

My sessionInfo:

 > sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=es_ES.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=es_ES.UTF-8        LC_COLLATE=es_ES.UTF-8
 [5] LC_MONETARY=es_ES.UTF-8    LC_MESSAGES=es_ES.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] rtracklayer_1.18.2   GenomicRanges_1.10.5 IRanges_1.16.4
[4] BiocGenerics_0.4.0

loaded via a namespace (and not attached):
 [1] Biostrings_2.26.2                  bitops_1.0-5
 [3] BSgenome_1.26.1                    BSgenome.Hsapiens.UCSC.hg19_1.3.19
 [5] parallel_2.15.2                    RCurl_1.95-3
 [7] Rsamtools_1.10.2                   stats4_2.15.2
 [9] tcltk_2.15.2                       tools_2.15.2
[11] XML_3.95-0.1                       zlibbioc_1.4.0

Any hint will be much appreciated. It's not a big problem, but quite 
interesting.

Regards,
Gus


