[Rd] bug in sum() on integer vector

Hervé Pagès hpages at fhcrc.org
Fri Dec 9 22:41:59 CET 2011


Hi Duncan,

On 11-12-09 11:39 AM, Duncan Murdoch wrote:
> On 09/12/2011 1:40 PM, Hervé Pagès wrote:
>> Hi,
>>
>> x<- c(rep(1800000003L, 10000000), -rep(1200000002L, 15000000))
>>
>> This is correct:
>>
>> > sum(as.double(x))
>> [1] 0
>>
>> This is not:
>>
>> > sum(x)
>> [1] 4996000
>>
>> Returning NA (with a warning) would also be acceptable for the latter.
>> That would make it consistent with cumsum(x):
>>
>> > cumsum(x)[length(x)]
>> [1] NA
>> Warning message:
>> Integer overflow in 'cumsum'; use 'cumsum(as.numeric(.))'
>
> This is a 64 bit problem; in 32 bits things work out properly. I'd guess
> in 64 bit arithmetic we or the run-time are doing something to simulate
> 32 bit arithmetic (since integers are 32 bits), but it looks as though
> we're not quite getting it right.

It doesn't work properly for me on Leopard (32-bit mode) either:

   > x <- c(rep(1800000003L, 10000000), -rep(1200000002L, 15000000))
   > sum(as.double(x))
   [1] 0
   > sum(x)
   [1] 4996000
   > sessionInfo()
   R version 2.14.0 RC (2011-10-27 r57452)
   Platform: i386-apple-darwin9.8.0/i386 (32-bit)

   locale:
   [1] C

   attached base packages:
   [1] stats     graphics  grDevices utils     datasets  methods   base

It looks like the problem is that isum() (in src/main/summary.c)
uses a 'double' internally to do the sum, whereas rsum() and csum()
use a 'long double'.
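
To illustrate why the accumulator type matters, here is a minimal C
sketch (not the code in summary.c; the helper name fits_in_double() is
made up for this example). An IEEE 'double' has a 53-bit significand, so
it cannot represent every integer above 2^53 (about 9.007e15), whereas
the running total over the first 10000000 elements of x climbs to
1800000003 * 10000000, about 1.8e16.

```c
#include <stdint.h>

/* Hypothetical helper, for illustration only: returns 1 if the integer v
   survives a round trip through 'double' unchanged, 0 otherwise.
   Intended for |v| well below 2^63, so the cast back stays in range. */
static int fits_in_double(int64_t v)
{
    /* 'volatile' forces the value to be stored as a 64-bit double, so
       extended-precision registers cannot hide the rounding. */
    volatile double d = (double) v;
    return (int64_t) d == v;
}
```

With this helper, fits_in_double(9007199254740992) is true (2^53 itself
is representable) while fits_in_double(9007199254740993) is false, so a
'double' running total that passes 2^53 can silently drop low-order bits.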

Note that isum() seems to be assuming that NA_INTEGER and NA_LOGICAL
will always be the same (probably fine) and that TRUE values in the
input vector are always represented as a 1 (not so sure about this one).

A more fundamental question: is switching back and forth between
'int' and 'double' (or 'long double') the right way to do
"safe" arithmetic on integers?
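
One possible alternative, sketched here under stated assumptions rather
than taken from R's sources, is to stay in integer arithmetic: accumulate
in a 64-bit integer and range-check once at the end. The name
sum_int_checked() is hypothetical, and NA handling is left out to keep
the sketch short.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not R's isum(): sums n ints in a 64-bit
   accumulator. Returns 1 and stores the total in *out when it fits in
   an int; returns 0 when it does not (a caller could then return NA
   with a warning). NA values in the input are not handled here. */
static int sum_int_checked(const int *x, size_t n, int *out)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += x[i];    /* exact while n stays below about 2^32 */
    if (acc > INT_MAX || acc < INT_MIN)
        return 0;       /* total is not representable as an int */
    *out = (int) acc;
    return 1;
}
```

For the vector x above, the running total peaks near 1.8e16, well
inside the int64_t range, and the final sum is exactly 0, so a routine
along these lines would report success with a result of 0.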

Thanks!
H.


>
> Duncan Murdoch
>
>> Thanks!
>> H.
>>
>> > sessionInfo()
>> R version 2.14.0 (2011-10-31)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>> [1] LC_CTYPE=en_CA.UTF-8 LC_NUMERIC=C
>> [3] LC_TIME=en_CA.UTF-8 LC_COLLATE=en_CA.UTF-8
>> [5] LC_MONETARY=en_CA.UTF-8 LC_MESSAGES=en_CA.UTF-8
>> [7] LC_PAPER=C LC_NAME=C
>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>


-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


