[BioC] ShortRead error when reading BAM file

Martin Morgan mtmorgan at fhcrc.org
Mon Sep 19 16:00:43 CEST 2011


On 09/19/2011 05:11 AM, Alex Gutteridge wrote:
> Hi Martin,
>
> Just to update on this, I tried your first suggestion and lo and behold
> the smaller BAM file loaded fine. I'm now going through each chromosome
> of the larger BAM to see if I can narrow down where the issue is. The
> original BAM is ~2.7GB in size, I assume the error couldn't simply be a
> function of that?

Overall size could be a factor (how much memory do you have?) but the 
error message is saying that the source code isn't handling this 
situation (and probably others) appropriately, i.e., a bug. Thanks for 
continuing to investigate... Martin

>
> Alex Gutteridge
>
> On Fri, 16 Sep 2011 06:43:51 -0700, Martin Morgan wrote:
>> Hi Alex --
>>
>> Unfortunately, not informative; it looks like the call has completed
>> but corrupted memory in the process.
>>
>> Any chance of a small data set, e.g., using samtools to select a
>> portion of the file? Along the lines of
>>
>> samtools view -b A430001.1.bam chr1:1-1000000 > A430001.1.subset.bam
>>
>> Or is there a reference MOSAIK output file somewhere that I can access?
>>
>> Also, does specifying 'which' help, e.g.,
>>
>> param = ScanBamParam(simpleCigar = TRUE, reverseComplement = TRUE,
>> what = ShortRead:::.readAligned_bamWhat(),
>> which=GRanges("chr1", IRanges(1, 100000)))
>>
>> And if you're being indulgent and none of the above point to the
>> problem / are doable, create a file ~/.R/Makevars with
>>
>> CFLAGS += -g O0 -Dinline=""
>>
>> (that's the letter O and the number zero) and reinstalling Rsamtools
>> (this'll compile without any C optimizations) then
>>
>> R -d valgrind -f test.R
>>
>> looking for output that says 'invalid read' or 'invalid write'.
>> Thanks for working with me on this.
>>
>> Martin
>>
>> On 09/16/2011 03:36 AM, Alex Gutteridge wrote:
>>> On Thu, 15 Sep 2011 06:19:29 -0700, Martin Morgan wrote:
>>>> On 09/15/2011 02:35 AM, Alex Gutteridge wrote:
>>>>> I'm trying to load a BAM file generated by Mosaik using ShortRead, but
>>>>> I'm getting the following error:
>>>>>
>>>>>> aln.bam =
>>>>>> readAligned("data/ALIGNMENT/A430001.1.samtools.bam",type="BAM")
>>>>> Error: Input/Output
>>>>> 'readAligned' failed to parse files
>>>>> dirPath: 'data/ALIGNMENT/A430001.1.samtools.bam'
>>>>> pattern: ''
>>>>> type: 'BAM'
>>>>> error: INTEGER() can only be applied to a 'integer', not a 'symbol'
>>>>>> sessionInfo()
>>>>> R version 2.12.0 (2010-10-15)
>>>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>>>
>>>>> locale:
>>>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>>>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>>>>> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
>>>>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>>>>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>>>
>>>>> attached base packages:
>>>>> [1] stats graphics grDevices utils datasets methods base
>>>>>
>>>>> other attached packages:
>>>>> [1] ShortRead_1.8.1 Rsamtools_1.2.3 lattice_0.19-13
>>>>> [4] Biostrings_2.18.0 GenomicRanges_1.2.1 IRanges_1.8.2
>>>>>
>>>>> loaded via a namespace (and not attached):
>>>>> [1] Biobase_2.10.0 grid_2.12.0 hwriter_1.2 tools_2.12.0
>>>>>
>>>>> I ran samtools from the command-line over the original Mosiak BAM file
>>>>> and it completed fine:
>>>>>
>>>>> samtools view -b A430001.1.bam > A430001.1.samtools.bam
>>>>>
>>>>> but I get the above error on both the Mosaik original and samtools
>>>>> processed BAM file.
>>>>>
>>>>> I also tried the debug suggested here:
>>>>>
>>>>> https://stat.ethz.ch/pipermail/bioconductor/2010-October/035745.html but
>>>>>
>>>>> it segfaulted:
>>>>>
>>>>>> param = ScanBamParam(simpleCigar = TRUE, reverseComplement = TRUE,
>>>>> + what = ShortRead:::.readAligned_bamWhat())
>>>>>>
>>>>>> res = scanBam('data/ALIGNMENT/A430001.1.samtools.bam', param=param)
>>>>>
>>>>> *** caught segfault ***
>>>>> address (nil), cause 'unknown'
>>>>>
>>>>> Traceback:
>>>>> 1: .Call(func, file, index, "rb", NULL, flag, simpleCigar, ...)
>>>>> 2: .io_bam(.scan_bam, file, index, reverseComplement, tmpl, param =
>>>>> param)
>>>>> 3: scanBam("data/ALIGNMENT/A430001.1.samtools.bam", param = param)
>>>>> 4: scanBam("data/ALIGNMENT/A430001.1.samtools.bam", param = param)
>>>>>
>>>>> Any suggestions to debug the file would be gratefully accepted.
>>>>
>>>> Hi Alex --
>>>>
>>>> I'd stick with
>>>>
>>>> param = ScanBamParam(simpleCigar = TRUE, reverseComplement = TRUE,
>>>> what = ShortRead:::.readAligned_bamWhat())
>>>> res = scanBam('data/ALIGNMENT/A430001.1.samtools.bam', param=param)
>>>>
>>>> as the starting point for debugging.
>>>>
>>>> My first suggestion is to update R to R-2-13.1, install Rsamtools,
>>>> and try again.
>>>>
>>>> The next is more complicated, but not to bad. Start R with the 'gdb'
>>>> debugger, provoke the error, and then find where the error occurred.
>>>> It'll look some thing like
>>>>
>>>> R -d gdb
>>>> gdb> run
>>>> ...
>>>> > ## now at the R prompt, do what you need to segfault
>>>> ...
>>>> gdb> where
>>>>
>>>> you'll have to type the 'run' and 'where' commands; 'where' will
>>>> generate a backtrace, and if you could forward that to me (e.g., copy
>>>> / paste) that would be great.
>>>>
>>>> Martin
>>>
>>> Hi Martin,
>>>
>>> Slightly different traceback with latest version, but essentially the
>>> same:
>>>
>>>> library(ShortRead)
>>> Loading required package: IRanges
>>>
>>> Attaching package: 'IRanges'
>>>
>>> The following object(s) are masked from 'package:base':
>>>
>>> cbind, eval, intersect, Map, mapply, order, paste, pmax, pmax.int,
>>> pmin, pmin.int, rbind, rep.int, setdiff, table, union
>>>
>>> Loading required package: GenomicRanges
>>> Loading required package: Biostrings
>>> Loading required package: lattice
>>> Loading required package: Rsamtools
>>>> aln = readAligned("data/ALIGNMENT/A430001.1.bam",type="BAM")
>>> Error: Input/Output
>>> 'readAligned' failed to parse files
>>> dirPath: 'data/ALIGNMENT/A430001.1.bam'
>>> pattern: ''
>>> type: 'BAM'
>>> error: INTEGER() can only be applied to a 'integer', not a 'symbol'
>>> file: data/ALIGNMENT/A430001.1.bam
>>>> param = ScanBamParam(simpleCigar = TRUE, reverseComplement = TRUE,
>>> + what = ShortRead:::.readAligned_bamWhat())
>>>> res = scanBam('data/ALIGNMENT/A430001.1.samtools.bam', param=param)
>>> *** caught segfault ***
>>> address (nil), cause 'unknown'
>>>
>>> Traceback:
>>> 1: .Call(func, .extptr(file), space, flag, simpleCigar, ...)
>>> 2: doTryCatch(return(expr), name, parentenv, handler)
>>> 3: tryCatchOne(expr, names, parentenv, handlers[[1L]])
>>> 4: tryCatchList(expr, classes, parentenv, handlers)
>>> 5: tryCatch({ .Call(func, .extptr(file), space, flag, simpleCigar,
>>> ...)}, error = function(err) { stop(conditionMessage(err), "\n file: ",
>>> path(file))})
>>> 6: .io_bam(.scan_bamfile, file, param = param, path(file), index(file),
>>> "rb", reverseComplement, tmpl)
>>> 7: scanBam(bam, param = param)
>>> 8: scanBam(bam, param = param)
>>> 9: eval(expr, envir, enclos)
>>> 10: eval(call, sys.frame(sys.parent()))
>>> 11: callGeneric(bam, ..., param = param)
>>> 12: scanBam("data/ALIGNMENT/A430001.1.samtools.bam", param = param)
>>> 13: scanBam("data/ALIGNMENT/A430001.1.samtools.bam", param = param)
>>>
>>> ###############
>>>
>>>> sessionInfo()
>>> R version 2.13.1 (2011-07-08)
>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>
>>> locale:
>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>>> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
>>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] stats graphics grDevices utils datasets methods base
>>>
>>> other attached packages:
>>> [1] ShortRead_1.10.4 Rsamtools_1.4.3 lattice_0.19-30
>>> [4] Biostrings_2.20.3 GenomicRanges_1.4.8 IRanges_1.10.6
>>>
>>> loaded via a namespace (and not attached):
>>> [1] Biobase_2.12.2 grid_2.13.1 hwriter_1.3
>>>
>>> ###########
>>>
>>> Here is the result of segfaulting under gdb:
>>>
>>> #########
>>>
>>> Program received signal SIGSEGV, Segmentation fault.
>>> R_gc_internal (size_needed=0) at memory.c:1427
>>> 1427 SEXP next = NEXT_NODE(s);
>>> (gdb) where
>>> #0 R_gc_internal (size_needed=0) at memory.c:1427
>>> #1 0x000000000041e7f3 in Rf_cons (car=0x1bf38af0, cdr=0x163ef338)
>>> at memory.c:2083
>>> #2 0x0000000000555780 in Rf_evalList (el=0x1be8f6e0, rho=0x1c31a7a0,
>>> call=0x1be8f6a8, n=1) at eval.c:1836
>>> #3 0x0000000000554a17 in Rf_eval (e=0x1be8f6a8, rho=0x1c31a7a0) at
>>> eval.c:501
>>> #4 0x0000000000555580 in Rf_evalListKeepMissing (el=0x1c31a2f0,
>>> rho=0x1c31a7a0) at eval.c:1900
>>> #5 0x0000000000555ba5 in Rf_DispatchOrEval (call=0x1c31a360,
>>> op=0x1640f938,
>>> generic=0x616594 "[<-", args=0x1c31a328, rho=0x1c31a7a0,
>>> ans=0x7fff256b5858, dropmissing=0, argsevald=0) at eval.c:2381
>>> #6 0x000000000048f300 in do_subassign (call=0x0, op=0x8d1d78,
>>> args=0x5443474741544747, rho=0x1c31a7a0) at subassign.c:1313
>>> #7 0x0000000000554981 in Rf_eval (e=0x1c31a360, rho=0x1c31a7a0) at
>>> eval.c:482
>>> #8 0x0000000000557324 in do_set (call=0x1c31a408, op=0x1640e788,
>>> args=0x1c31a3d0, rho=0x1c31a7a0) at eval.c:1722
>>> #9 0x0000000000554981 in Rf_eval (e=0x1c31a408, rho=0x1c31a7a0) at
>>> eval.c:482
>>> #10 0x0000000000556c84 in applydefine (call=<value optimized out>,
>>> op=0x1640e788, args=<value optimized out>, rho=0x1c31a7a0) at
>>> eval.c:1678
>>> #11 0x0000000000554981 in Rf_eval (e=0x1be8f590, rho=0x1c31a7a0) at
>>> eval.c:482
>>> #12 0x00000000005560ee in do_begin (call=0x1be8f360, op=0x1640e590,
>>> args=0x5443474741544747, rho=0x1c31a7a0) at eval.c:1420
>>> #13 0x0000000000554981 in Rf_eval (e=0x1be8f360, rho=0x1c31a7a0) at
>>> eval.c:482
>>>
>


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793



More information about the Bioconductor mailing list