[BioC] Rsamtools hangs reading SOLiD bam files

Martin Morgan mtmorgan at fhcrc.org
Tue Oct 12 18:57:33 CEST 2010


On 10/12/2010 12:27 AM, Asta Laiho wrote:
> Hi,
> 
> I'm trying to work with *.bam and *.bai files produced by Bioscope (a SOLiD-related software package, v1.2.1). I tried two examples from the Rsamtools manual: the one at the top of page 2, which queries the reads in a given range, and the one at the bottom of page 4, which calculates coverage for chunks of the file. I tried files of different sizes (35 MB, 1.8 GB), but in both examples the code just kept running without any error messages and without producing results in any reasonable time. I even left it running overnight, but it still hadn't finished. My computer runs Mac OS X 10.6.4 with 8 GB of memory. The session info is attached below. Are there any known issues with Rsamtools and bam/bai files originating from the SOLiD Bioscope software?
> 
> Many thanks in advance for any advice,

Hi Asta --

I don't know of any outstanding issues. If the query is expected to
retrieve a 'small' number of reads (millions, say), then it should be
fast (as in not enough time to check your email). If it's returning
large numbers of reads, then memory might become a problem.
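One way to gauge whether memory is the issue is to count the records a query would return before loading them, then ask scanBam for only the fields you need. The sketch below is illustrative, not taken from the Rsamtools manual: it assumes the ex1.bam example file (with its index) shipped with Rsamtools stands in for your own indexed BAM file, and the region on "seq1" is hypothetical.

```r
library(Rsamtools)      # also attaches GenomicRanges and IRanges

## Assumption: ex1.bam and its ex1.bam.bai index, shipped with Rsamtools,
## stand in for your own indexed BAM file.
fl <- system.file("extdata", "ex1.bam", package = "Rsamtools")

## Hypothetical region: the first 1000 bases of reference "seq1".
which <- GRanges("seq1", IRanges(1, 1000))

## Count overlapping records without loading the reads themselves into R.
counts <- countBam(fl, param = ScanBamParam(which = which))
counts$records

## If the count is manageable, fetch only the fields you need;
## requesting fewer 'what' fields means less memory per read.
param <- ScanBamParam(which = which, what = c("rname", "pos", "qwidth"))
res <- scanBam(fl, param = param)
str(res[[1]])
```

scanBam returns one list element per range in `which`, so `res[[1]]` holds the fields for the single region above.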

If there is a 'bug', my guess would be that it involves integer overflow
in the index -- seeking a read that is late in a very large BAM file.

So...

verify basic functionality with

  library(Rsamtools); example(scanBam)

try accessing a few reads at the beginning of the first reference
sequence returned by

  scanBamHeader(fl)[[1]][["targets"]]

where 'fl' is the name of your BAM file.
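A minimal sketch of that check, assuming a recent Rsamtools where `which` accepts a GRanges. The ex1.bam example file is used only so the snippet runs as-is; substitute your own file for `fl`. The 1000-base window is a hypothetical choice.

```r
library(Rsamtools)

## Assumption: 'fl' is your BAM file; the ex1.bam example shipped with
## Rsamtools is used here so the snippet runs as-is.
fl <- system.file("extdata", "ex1.bam", package = "Rsamtools")

## Named integer vector of reference sequence lengths from the header.
targets <- scanBamHeader(fl)[[1]][["targets"]]
first <- names(targets)[1]

## Hypothetical window: the first (up to) 1000 bases of that reference.
end <- min(1000L, targets[[1]])
param <- ScanBamParam(which = GRanges(first, IRanges(1L, end)),
                      what = c("qname", "pos"))

## Time the query; a prompt return suggests that the index works for
## reads near the start of the file.
elapsed <- system.time(res <- scanBam(fl, param = param))[["elapsed"]]
head(res[[1]]$pos)
```

If this returns quickly but a similar query near the end of the last reference hangs, that would be consistent with the index problem suspected above.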

If this doesn't provide any hint, please include a minimal script
sufficient to reproduce your problem. It would also be very helpful if
you could point to a publicly available BAM file generated by the same
tools you are using.

Martin


> Asta
> 
> sessionInfo()
> R version 2.11.1 (2010-05-31) 
> x86_64-apple-darwin9.8.0 
> 
> locale:
> [1] C/UTF-8/C/C/C/C
> 
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base     
> 
> other attached packages:
> [1] Rsamtools_1.0.8     Biostrings_2.16.9   GenomicRanges_1.0.7
> [4] IRanges_1.6.11     
> 
> loaded via a namespace (and not attached):
> [1] Biobase_2.8.0


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793
