[BioC] readGappedAlignments and param argument

Elena Grassi grassi.e at gmail.com
Mon Sep 9 12:57:12 CEST 2013


Hello,

I've stumbled across a strange behaviour when indexing bams and then
reading them with ReadGappedAlignments with a param
holding a which section.
Given this bam:
data at tungsteno:/rogue_bis/data/bioinfotree/prj/roar/dataset/0.2/roar_test_2$
samtools view test.bam
D1JP1ACXX130107:2:2201:7003:63880       16      chr1    243675625
 255     92M     *       0       0       *       *

> library(GenomicRanges)
> bam <- "/scarlet/data/bioinfotree/prj/roar/dataset/0.2/roar_test_2/test.bam"
> ga <- readGappedAlignments(bam)
> ga
GappedAlignments with 1 alignment and 0 metadata columns:
      seqnames strand       cigar    qwidth     start       end
width      ngap
         <Rle>  <Rle> <character> <integer> <integer> <integer>
<integer> <integer>
  [1]     chr1      -         92M        92 243675625 243675716
92         0
  ---
  seqlengths:
                    chr1                  chr2 ...
chrUn_gl000226 chr18_gl000207_random
               249250621             243199373 ...
15008                  4262

Everything is fine here. But if index it and load it with a param:
> gene_id <- c("A", "B")
> features <- GRanges(
+     seqnames = Rle(c("chr1", "chr1")),
+     strand = strand(c("-", "-")),
+     ranges = IRanges(
+         start=c(243675625, 243675627),
+         width=c(2, 100)),
+     DataFrame(gene_id)
+ )
> indexBam(bam)
      /tmp/RtmpVbmBLu/filead32d37188.bam
"/tmp/RtmpVbmBLu/filead32d37188.bam.bai"
> ga_s_p <- readGappedAlignments(bam, param=ScanBamParam(which=features))
> ga_s_p
GappedAlignments with 2 alignments and 0 metadata columns:
      seqnames strand       cigar    qwidth     start       end
width      ngap
         <Rle>  <Rle> <character> <integer> <integer> <integer>
<integer> <integer>
  [1]     chr1      -         92M        92 243675625 243675716
92         0
  [2]     chr1      -         92M        92 243675625 243675716
92         0
  ---
  seqlengths:
                    chr1                  chr2 ...
chrUn_gl000226 chr18_gl000207_random
               249250621             243199373 ...
15008                  4262
>

The single read is duplicated...I'm not an expert with these libraries
so maybe this is my fault (if I remove
the second entry in features everything works, but from the
ScanBamParam man about which I was not expecting
this behaviour).
Sorry for the convoluted example but I've stumbled on this problem
while comparing counts obtained
with summarizeOverlaps in a script of mine.

Here's my session info:
> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] rtracklayer_1.20.2   Rsamtools_1.12.3     Biostrings_2.28.0
[4] GenomicRanges_1.12.4 IRanges_1.18.1       BiocGenerics_0.6.0

loaded via a namespace (and not attached):
[1] bitops_1.0-5    BSgenome_1.28.0 RCurl_1.95-4.1  stats4_3.0.1
[5] tools_3.0.1     XML_3.98-1.1    zlibbioc_1.6.0


Thanks,
E.G.


-- 
$ pom



More information about the Bioconductor mailing list