[BioC] reading BAM files

Martin Morgan mtmorgan at fhcrc.org
Thu Nov 29 00:35:52 CET 2012


On 11/27/2012 11:35 AM, Marcus Davy wrote:
> Hi Hermann,
> to clarify further, you can only index a sorted bam file, in Rsamtools there are
> two alternatives;
>
> 1. use asBam() which sorts and creates an index file in one go
> 2. use sortBam() and indexBam() to sort a bam file and then create an index
>
> Or run samtools from the bash prompt/command line to achieve the same result.
>
> samtools view -bS -o aln.bam aln.sam
> samtools sort aln.bam aln_sorted
> samtools index aln_sorted.bam
>
> Note, I can get a *segfault *error if sortBam() is run on a 'sam' file by
> accident, the error and sessionInfo() is at the bottom of this email (this may
> already be fixed in the development version).

In Rsamtools devel version 1.11.12, sortBam and indexBam no longer segfault when 
run on non-BAM files.

Thanks for the bug report.

Martin

>
> Some pseudocode;
>
> ## unsorted bam file
> samName     <- "aln.sam"
> bamName     <- "aln.bam"
> sbamSuffix  <- "aln_sorted" ## No suffix
> sbamName    <- "aln_sorted.bam"
>
> ## 1) Sort and Index
> ## Sort into destination file 'aln_sorted.bam' using sortBam()
> sortBam(bamName, sbamSuffix)
> dir(pattern="bam")
>
> ## Index sorted bam file using indexBam()
> indexBam(sbamName)
> dir(pattern="bam")
>
> ## 2) asBam() sorted and creates index
> unlink("*bam*")
> asBam(samName, sbamSuffix)
> dir(pattern="bam")
>
> ## Working with a sorted bam file
> what  <- scanBamWhat()                             ## subset as required
> ft    <- scanBamHeader(sbamName)[[1]][["targets"]]
> which <- GRanges(names(ft), IRanges(1, ft))        ## subset as required
> param <- ScanBamParam(which=which, what=what)
> bam   <- scanBam(sbamName, param = param)
>
>
> ## Note if you try to sortBam a 'sam' file Rsamtools appears to segfault
> sessionInfo()
> sortBam(samName, sbamSuffix)
>
> Marcus
>
>> sessionInfo()
> R version 2.15.0 (2012-03-30)
> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>
> locale:
> [1] C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] BiocInstaller_1.8.3  Rsamtools_1.10.2     Biostrings_2.26.2
> [4] GenomicRanges_1.10.2 IRanges_1.16.2       BiocGenerics_0.4.0
>
> loaded via a namespace (and not attached):
> [1] bitops_1.0-4.1  parallel_2.15.0 stats4_2.15.0   tools_2.15.0
> [5] zlibbioc_1.4.0
>> sortBam(samName, sbamSuffix)
>   *** caught segfault ***
> address 0x30, cause 'memory not mapped'
>
> Traceback:
>   1: .Call(.sort_bam, file, destination, byQname, as.integer(maxMemory))
>   2: .local(file, destination, ...)
>   3: sortBam(samName, sbamSuffix)
>   4: sortBam(samName, sbamSuffix)
>
> Possible actions:
> 1: abort (with core dump, if enabled)
> 2: normal R exit
> 3: exit R without saving workspace
> 4: exit R saving workspace
>
> On Tue, Nov 27, 2012 at 6:32 AM, James W. MacDonald <jmacdon at uw.edu
> <mailto:jmacdon at uw.edu>> wrote:
>
>     Hi Hermann,
>
>
>     On 11/26/2012 12:26 PM, Hermann Norpois wrote:
>
>         Hello,
>
>         I try to read BAM files - so far without success. And I dont know why.
>
>         My Bamfile is wgEncodeOpenChromDnase8988tAln__Rep1.bam
>         (downloaded from the Encode site)
>
>         For example:
>
>         library (ShortRead)
>
>             which<- RangesList(seq1=IRanges(1000,__2000),
>
>         + seq2=IRanges(c(100,1000), c(1000,2000)))
>
>             what<- c("rname", "strand", "pos", "qwidth","seq")
>             param<- ScanBamParam(which=which, what=what)
>             bam<- scanBam ("~/Bam/__wgEncodeOpenChromDnase8988tAln__Rep1.bam",
>
>         param=param)
>         [bam_index_load] fail to load BAM index.
>         Fehler in open.BamFile(BamFile(file, index), "rb") :
>             failed to load BAM index
>             file: /home/hnorpois/Bam/__wgEncodeOpenChromDnase8988tAln__Rep1.bam
>
>
>         The mentioned syntax I copied from an introduction to Rsamtools.
>         Actually I would like to know a simple syntax that enables me to read
>         BAM-files. Can anybody give me a hint.
>
>
>     The hint is in the error message above - 'failed to load BAM index', which
>     indicates that you have not yet indexed your bam file. You need to first use
>     indexBam(), then you should be able to read in.
>
>     Best,
>
>     Jim
>
>
>
>
>         Thanks Hermann
>
>                  [[alternative HTML version deleted]]
>
>         _________________________________________________
>         Bioconductor mailing list
>         Bioconductor at r-project.org <mailto:Bioconductor at r-project.org>
>         https://stat.ethz.ch/mailman/__listinfo/bioconductor
>         <https://stat.ethz.ch/mailman/listinfo/bioconductor>
>         Search the archives:
>         http://news.gmane.org/gmane.__science.biology.informatics.__conductor
>         <http://news.gmane.org/gmane.science.biology.informatics.conductor>
>
>
>     --
>     James W. MacDonald, M.S.
>     Biostatistician
>     University of Washington
>     Environmental and Occupational Health Sciences
>     4225 Roosevelt Way NE, # 100
>     Seattle WA 98105-6099
>
>
>     _________________________________________________
>     Bioconductor mailing list
>     Bioconductor at r-project.org <mailto:Bioconductor at r-project.org>
>     https://stat.ethz.ch/mailman/__listinfo/bioconductor
>     <https://stat.ethz.ch/mailman/listinfo/bioconductor>
>     Search the archives:
>     http://news.gmane.org/gmane.__science.biology.informatics.__conductor
>     <http://news.gmane.org/gmane.science.biology.informatics.conductor>
>
>


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793



More information about the Bioconductor mailing list