[BioC] ShortRead error when reading BAM file

Alex Gutteridge alexg at ruggedtextile.com
Mon Sep 19 14:11:59 CEST 2011


 Hi Martin,

 Just to update on this, I tried your first suggestion and lo and behold 
 the smaller BAM file loaded fine. I'm now going through each chromosome 
 of the larger BAM to see if I can narrow down where the issue is. The 
 original BAM is ~2.7GB in size, I assume the error couldn't simply be a 
 function of that?

 Alex Gutteridge

 On Fri, 16 Sep 2011 06:43:51 -0700, Martin Morgan wrote:
> Hi Alex --
>
> Unfortunately, not informative; it looks like the call has completed
> but corrupted memory in the process.
>
> Any chance of a small data set, e.g., using samtools to select a
> portion of the file? Along the lines of
>
>   samtools view -b A430001.1.bam chr1:1-1000000 > 
> A430001.1.subset.bam
>
> Or is there a reference MOSAIK output file somewhere that I can 
> access?
>
> Also, does specifying 'which' help, e.g.,
>
> param = ScanBamParam(simpleCigar = TRUE, reverseComplement = TRUE,
>   what = ShortRead:::.readAligned_bamWhat(),
>   which=GRanges("chr1", IRanges(1, 100000)))
>
> And if you're being indulgent and none of the above point to the
> problem / are doable, create a file ~/.R/Makevars with
>
>   CFLAGS += -g O0 -Dinline=""
>
> (that's the letter O and the number zero) and reinstalling Rsamtools
> (this'll compile without any C optimizations) then
>
>   R -d valgrind -f test.R
>
> looking for output that says 'invalid read' or 'invalid write'.
> Thanks for working with me on this.
>
> Martin
>
> On 09/16/2011 03:36 AM, Alex Gutteridge wrote:
>> On Thu, 15 Sep 2011 06:19:29 -0700, Martin Morgan wrote:
>>> On 09/15/2011 02:35 AM, Alex Gutteridge wrote:
>>>> I'm trying to load a BAM file generated by Mosaik using ShortRead, 
>>>> but
>>>> I'm getting the following error:
>>>>
>>>>> aln.bam =
>>>>> readAligned("data/ALIGNMENT/A430001.1.samtools.bam",type="BAM")
>>>> Error: Input/Output
>>>> 'readAligned' failed to parse files
>>>> dirPath: 'data/ALIGNMENT/A430001.1.samtools.bam'
>>>> pattern: ''
>>>> type: 'BAM'
>>>> error: INTEGER() can only be applied to a 'integer', not a 
>>>> 'symbol'
>>>>> sessionInfo()
>>>> R version 2.12.0 (2010-10-15)
>>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>>
>>>> locale:
>>>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>>>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>>>> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
>>>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>>>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>>
>>>> attached base packages:
>>>> [1] stats graphics grDevices utils datasets methods base
>>>>
>>>> other attached packages:
>>>> [1] ShortRead_1.8.1 Rsamtools_1.2.3 lattice_0.19-13
>>>> [4] Biostrings_2.18.0 GenomicRanges_1.2.1 IRanges_1.8.2
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] Biobase_2.10.0 grid_2.12.0 hwriter_1.2 tools_2.12.0
>>>>
>>>> I ran samtools from the command-line over the original Mosiak BAM 
>>>> file
>>>> and it completed fine:
>>>>
>>>> samtools view -b A430001.1.bam > A430001.1.samtools.bam
>>>>
>>>> but I get the above error on both the Mosaik original and samtools
>>>> processed BAM file.
>>>>
>>>> I also tried the debug suggested here:
>>>> 
>>>> https://stat.ethz.ch/pipermail/bioconductor/2010-October/035745.html 
>>>> but
>>>> it segfaulted:
>>>>
>>>>> param = ScanBamParam(simpleCigar = TRUE, reverseComplement = 
>>>>> TRUE,
>>>> + what = ShortRead:::.readAligned_bamWhat())
>>>>>
>>>>> res = scanBam('data/ALIGNMENT/A430001.1.samtools.bam', 
>>>>> param=param)
>>>>
>>>> *** caught segfault ***
>>>> address (nil), cause 'unknown'
>>>>
>>>> Traceback:
>>>> 1: .Call(func, file, index, "rb", NULL, flag, simpleCigar, ...)
>>>> 2: .io_bam(.scan_bam, file, index, reverseComplement, tmpl, param 
>>>> =
>>>> param)
>>>> 3: scanBam("data/ALIGNMENT/A430001.1.samtools.bam", param = param)
>>>> 4: scanBam("data/ALIGNMENT/A430001.1.samtools.bam", param = param)
>>>>
>>>> Any suggestions to debug the file would be gratefully accepted.
>>>
>>> Hi Alex --
>>>
>>> I'd stick with
>>>
>>> param = ScanBamParam(simpleCigar = TRUE, reverseComplement = TRUE,
>>> what = ShortRead:::.readAligned_bamWhat())
>>> res = scanBam('data/ALIGNMENT/A430001.1.samtools.bam', param=param)
>>>
>>> as the starting point for debugging.
>>>
>>> My first suggestion is to update R to R-2-13.1, install Rsamtools,
>>> and try again.
>>>
>>> The next is more complicated, but not to bad. Start R with the 
>>> 'gdb'
>>> debugger, provoke the error, and then find where the error 
>>> occurred.
>>> It'll look some thing like
>>>
>>> R -d gdb
>>> gdb> run
>>> ...
>>> > ## now at the R prompt, do what you need to segfault
>>> ...
>>> gdb> where
>>>
>>> you'll have to type the 'run' and 'where' commands; 'where' will
>>> generate a backtrace, and if you could forward that to me (e.g., 
>>> copy
>>> / paste) that would be great.
>>>
>>> Martin
>>
>> Hi Martin,
>>
>> Slightly different traceback with latest version, but essentially 
>> the same:
>>
>>> library(ShortRead)
>> Loading required package: IRanges
>>
>> Attaching package: 'IRanges'
>>
>> The following object(s) are masked from 'package:base':
>>
>> cbind, eval, intersect, Map, mapply, order, paste, pmax, pmax.int,
>> pmin, pmin.int, rbind, rep.int, setdiff, table, union
>>
>> Loading required package: GenomicRanges
>> Loading required package: Biostrings
>> Loading required package: lattice
>> Loading required package: Rsamtools
>>> aln = readAligned("data/ALIGNMENT/A430001.1.bam",type="BAM")
>> Error: Input/Output
>> 'readAligned' failed to parse files
>> dirPath: 'data/ALIGNMENT/A430001.1.bam'
>> pattern: ''
>> type: 'BAM'
>> error: INTEGER() can only be applied to a 'integer', not a 'symbol'
>> file: data/ALIGNMENT/A430001.1.bam
>>> param = ScanBamParam(simpleCigar = TRUE, reverseComplement = TRUE,
>> + what = ShortRead:::.readAligned_bamWhat())
>>> res = scanBam('data/ALIGNMENT/A430001.1.samtools.bam', param=param)
>> *** caught segfault ***
>> address (nil), cause 'unknown'
>>
>> Traceback:
>> 1: .Call(func, .extptr(file), space, flag, simpleCigar, ...)
>> 2: doTryCatch(return(expr), name, parentenv, handler)
>> 3: tryCatchOne(expr, names, parentenv, handlers[[1L]])
>> 4: tryCatchList(expr, classes, parentenv, handlers)
>> 5: tryCatch({ .Call(func, .extptr(file), space, flag, simpleCigar,
>> ...)}, error = function(err) { stop(conditionMessage(err), "\n file: 
>> ",
>> path(file))})
>> 6: .io_bam(.scan_bamfile, file, param = param, path(file), 
>> index(file),
>> "rb", reverseComplement, tmpl)
>> 7: scanBam(bam, param = param)
>> 8: scanBam(bam, param = param)
>> 9: eval(expr, envir, enclos)
>> 10: eval(call, sys.frame(sys.parent()))
>> 11: callGeneric(bam, ..., param = param)
>> 12: scanBam("data/ALIGNMENT/A430001.1.samtools.bam", param = param)
>> 13: scanBam("data/ALIGNMENT/A430001.1.samtools.bam", param = param)
>>
>> ###############
>>
>>> sessionInfo()
>> R version 2.13.1 (2011-07-08)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
>> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>> other attached packages:
>> [1] ShortRead_1.10.4 Rsamtools_1.4.3 lattice_0.19-30
>> [4] Biostrings_2.20.3 GenomicRanges_1.4.8 IRanges_1.10.6
>>
>> loaded via a namespace (and not attached):
>> [1] Biobase_2.12.2 grid_2.13.1 hwriter_1.3
>>
>> ###########
>>
>> Here is the result of segfaulting under gdb:
>>
>> #########
>>
>> Program received signal SIGSEGV, Segmentation fault.
>> R_gc_internal (size_needed=0) at memory.c:1427
>> 1427 SEXP next = NEXT_NODE(s);
>> (gdb) where
>> #0 R_gc_internal (size_needed=0) at memory.c:1427
>> #1 0x000000000041e7f3 in Rf_cons (car=0x1bf38af0, cdr=0x163ef338)
>> at memory.c:2083
>> #2 0x0000000000555780 in Rf_evalList (el=0x1be8f6e0, rho=0x1c31a7a0,
>> call=0x1be8f6a8, n=1) at eval.c:1836
>> #3 0x0000000000554a17 in Rf_eval (e=0x1be8f6a8, rho=0x1c31a7a0) at
>> eval.c:501
>> #4 0x0000000000555580 in Rf_evalListKeepMissing (el=0x1c31a2f0,
>> rho=0x1c31a7a0) at eval.c:1900
>> #5 0x0000000000555ba5 in Rf_DispatchOrEval (call=0x1c31a360, 
>> op=0x1640f938,
>> generic=0x616594 "[<-", args=0x1c31a328, rho=0x1c31a7a0,
>> ans=0x7fff256b5858, dropmissing=0, argsevald=0) at eval.c:2381
>> #6 0x000000000048f300 in do_subassign (call=0x0, op=0x8d1d78,
>> args=0x5443474741544747, rho=0x1c31a7a0) at subassign.c:1313
>> #7 0x0000000000554981 in Rf_eval (e=0x1c31a360, rho=0x1c31a7a0) at
>> eval.c:482
>> #8 0x0000000000557324 in do_set (call=0x1c31a408, op=0x1640e788,
>> args=0x1c31a3d0, rho=0x1c31a7a0) at eval.c:1722
>> #9 0x0000000000554981 in Rf_eval (e=0x1c31a408, rho=0x1c31a7a0) at
>> eval.c:482
>> #10 0x0000000000556c84 in applydefine (call=<value optimized out>,
>> op=0x1640e788, args=<value optimized out>, rho=0x1c31a7a0) at 
>> eval.c:1678
>> #11 0x0000000000554981 in Rf_eval (e=0x1be8f590, rho=0x1c31a7a0) at
>> eval.c:482
>> #12 0x00000000005560ee in do_begin (call=0x1be8f360, op=0x1640e590,
>> args=0x5443474741544747, rho=0x1c31a7a0) at eval.c:1420
>> #13 0x0000000000554981 in Rf_eval (e=0x1be8f360, rho=0x1c31a7a0) at
>> eval.c:482
>>

-- 
 Alex Gutteridge



More information about the Bioconductor mailing list