[Rd] bug in sum() on integer vector

Wed Dec 14 22:17:29 CET 2011

On 14.12.2011 22:16, John C Nash wrote:
> I agree that where the overflow occurs is not critical (one can go back to cumsum and find
> out). I am assuming that Uwe still wants to know that there has been an overflow at some
> point, i.e., a warning.

Yes, sure.

Uwe

> This could become more "interesting" as parallel computation causes
> different summation orderings on sums of large numbers of items.
>
> JN
>
>
> On 12/14/2011 03:58 PM, Uwe Ligges wrote:
>>
>>
>> On 14.12.2011 17:19, peter dalgaard wrote:
>>>
>>> On Dec 14, 2011, at 16:19 , John C Nash wrote:
>>>
>>>>
>>>> Following this thread, I wondered why nobody tried cumsum to see where the integer
>>>> overflow occurs. On the shorter xx vector in the little script below I get a message:
>>>>
>>>> Warning message:
>>>> Integer overflow in 'cumsum'; use 'cumsum(as.numeric(.))'
>>>>>
>>>>
>>>> But sum() does not give such a warning, which I believe is the point of contention. Since
>>>> cumsum() does manage to give such a warning, and show where the overflow occurs, should
>>>> sum() not be able to do so? For the record, I don't class the non-zero answer as an error
>>>> in itself. I regard the failure to warn as the issue.
>>>
>>> It (sum) does warn if you take the two "halves" separately. The issue is that the
>>> overflow is detected at the end of the summation, when the result is to be saved to an
>>> integer (which of course happens for every intermediate sum in cumsum).
>>>
>>>> x <- c(rep(1800000003L, 10000000), -rep(1200000002L, 15000000))
>>>> sum(x[1:10000000])
>>> [1] NA
>>> Warning message:
>>> In sum(x[1:1e+07]) : Integer overflow - use sum(as.numeric(.))
>>>> sum(x[10000001:25000000])
>>> [1] NA
>>> Warning message:
>>> In sum(x[10000001:1.5e+07]) : Integer overflow - use sum(as.numeric(.))
>>>> sum(x)
>>> [1] 4996000
>>>
>>> There's a pretty easy fix, essentially to move
>>>
>>>       if(s > INT_MAX || s < R_INT_MIN){
>>>           warningcall(call, _("Integer overflow - use sum(as.numeric(.))"));
>>>           *value = NA_INTEGER;
>>>       }
>>>
>>> inside the summation loop. Obviously, there's a speed penalty from two FP comparisons
>>> per element, but I wouldn't know whether it matters in practice for anyone.
>>>
>>
>>
>> I don't think I am interested in where the overflow happens if I call sum()...
>>
>> Uwe