[BioC] Rsamtools: Realloc integer overflow?

Martin Morgan mtmorgan at fhcrc.org
Tue Jun 4 04:48:09 CEST 2013


On 06/03/2013 07:33 PM, Hervé Pagès wrote:
> Hi Martin,
>
> On 06/03/2013 06:26 PM, Martin Morgan wrote:
>> On 06/03/2013 05:27 PM, Michael Lawrence wrote:
>>> Hey guys,
>>>
>>> Whenever I try to calculate the coverage for a BAM file with more than,
>>> say, 500 million reads, I get this error:
>>>
>>> Error in coverage(readBamGappedAlignments(x, param = param), shift =
>>> shift,  : \n  error in evaluating the argument 'x' in selecting a method
>>> for function 'coverage': Error in value[[3L]](cond) (from #2) : \n
>>> 'Realloc' could not re-allocate memory (18446744065128005632 bytes)\n
>>>
>>> This looks like integer overflow, possibly within _grow_SCAN_BAM_DATA().
>>> Could we just use long there?
>>
>> I wonder if it would be more sensible, if less convenient, to do this
>> (under Bioc-devel)
>>
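>>    # read the file in chunks of 100 million records
>>    bf <- open(BamFile(fl, yieldSize=100000000))
>>    # coverage of the first chunk
>>    cvg <- coverage(readGAlignmentsFromBam(bf))
>>    # add the coverage of each remaining chunk until none are left
>>    while (length(aln <- readGAlignmentsFromBam(bf)))
>>        cvg <- cvg + coverage(aln)
>>    close(bf)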
>>
>> ? It opens the door for better memory management and parallel evaluation.
>>
>> I'm concerned that using size_t (Realloc casts to this) or ptrdiff_t
>> (the length type of R long vectors) would only get us through the C
>> code; representing the result in R would require R long vectors, and
>> Rsamtools does not (yet?) support that.
>
> Sorry if I'm missing something obvious but why would the representation
> of 500 million reads (either as a GappedAlignments object or as a plain
> list as returned by scanBam()) require R long vectors?

Not that 500 million would, but going for 'more' eventually will (say, when
Michael gets 5 times more ambitious than he is now and passes the 2^31 - 1
element limit of ordinary R vectors).

At the very least, the software should go up to the limit of R vectors
gracefully; as you point out, it shouldn't be having problems with 500 million
reads.
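
For what it's worth, the byte count in the error message is exactly what a
64-bit platform produces when the int value -2145386496 is converted to an
unsigned 64-bit size and multiplied by 4. That is consistent with (though not
proof of) an int element count wrapping negative before it reaches Realloc.
Below is a minimal sketch of a growth step that stops at the limit instead.
The helper names request_bytes and grow_count are hypothetical and are not
taken from the Rsamtools source.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Bytes that would be requested if a (possibly negative) int element count
 * is converted to a 64-bit unsigned size before multiplying. A negative
 * count wraps to a value just under 2^64. */
uint64_t request_bytes(int n_elt, uint64_t elt_size)
{
    return (uint64_t) n_elt * elt_size;
}

/* Hypothetical growth helper: scale 'current' by 'factor' in double
 * arithmetic, clamp at INT_MAX (the length limit of a non-long R vector),
 * and return -1 when no further growth is possible, so the caller can
 * raise a proper error instead of passing a wrapped size to Realloc. */
int grow_count(int current, double factor)
{
    assert(factor > 1.0);
    if (current < 0 || current >= INT_MAX)
        return -1;
    double want = (double) current * factor;
    if (want <= (double) current)
        want = (double) current + 1.0;   /* always make progress */
    if (want > (double) INT_MAX)
        return INT_MAX;                  /* clamp at the R vector limit */
    return (int) want;
}
```

A caller would treat a return value of -1 as "already at the R vector limit"
and signal an ordinary R error with a readable message, rather than handing a
wrapped size to Realloc.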

Martin

>
> Thanks,
> H.
>
>
>>
>> Martin
>>
>>>
>>> Michael
>>>


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


