[Rd] sum() returns NA on a long *logical* vector when nb of TRUE values exceeds 2^31
maechler at stat.math.ethz.ch
Tue Jun 6 09:45:44 CEST 2017
>>>>> Hervé Pagès <hpages at fredhutch.org>
>>>>> on Fri, 2 Jun 2017 04:05:15 -0700 writes:
> Hi, I have a long numeric vector 'xx' and I want to use
> sum() to count the number of elements that satisfy some
> criteria like non-zero values or values lower than a
> certain threshold etc...
> The problem is: sum() returns an NA (with a warning) if
> the count is greater than 2^31. For example:
>   > xx <- runif(3e9)
>   > sum(xx < 0.9)
>   [1] NA
>   Warning message:
>   In sum(xx < 0.9) : integer overflow - use sum(as.numeric(.))
> This already takes a long time and doing
> sum(as.numeric(.)) would take even longer and require
> allocation of 24 GB of memory just to store an intermediate
> numeric vector made of 0s and 1s. Plus, having to do
> sum(as.numeric(.)) every time I need to count things is
> not convenient and is easy to forget.
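For illustration only (this is not from the original report): until sum() itself changes, one way to avoid both the overflow and the full-length intermediate double vector is to count in chunks and accumulate into a double. The sketch below assumes a hypothetical helper name, count_true_chunked(), and an arbitrary default chunk size; the extra memory it needs is proportional to the chunk size rather than to length(x).

```r
# Hypothetical helper (not part of R): count the elements of x for which
# pred() is TRUE, one chunk at a time. The running total is a double, so it
# is not limited to the integer range (doubles are exact up to 2^53).
count_true_chunked <- function(x, pred, chunk_size = 1e6) {
  n <- length(x)
  total <- 0          # double accumulator
  start <- 1
  while (start <= n) {
    end <- min(start + chunk_size - 1, n)
    # Each per-chunk sum() is at most chunk_size, far below the integer limit.
    total <- total + sum(pred(x[start:end]))
    start <- end + 1
  }
  total
}

# Small-scale usage; the result should agree with a plain sum().
set.seed(1)
xx_small <- runif(1e5)
n_below <- count_true_chunked(xx_small, function(v) v < 0.9, chunk_size = 1e4)
```

The predicate is passed as a function so that the logical vector only ever exists for one chunk at a time.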
> It seems that sum() on a logical vector could be modified
> to return the count as a double when it cannot be
> represented as an integer. Note that length() already
> does this so that wouldn't create a precedent. Also,
> FWIW, prod() avoids the problem by always returning a
> double, whatever the type of the input is (except on a
> complex vector).
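A small sketch of the type behaviour described above, using tiny inputs so that it runs anywhere. The variable names are illustrative only. The long-vector case of length() is not shown, because it would need a vector with more than 2^31 - 1 elements.

```r
# sum() of a logical vector returns an integer ...
sum_type <- typeof(sum(c(TRUE, FALSE, TRUE)))

# ... whereas prod() returns a double even for integer input.
prod_type <- typeof(prod(1:3))

# The largest representable integer is 2^31 - 1.
int_max <- .Machine$integer.max

# Integer arithmetic beyond that limit yields NA (with a warning,
# suppressed here) ...
overflowed <- suppressWarnings(int_max + 1L)

# ... while the same value is represented exactly as a double.
as_double <- int_max + 1
```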
> I can provide a patch if this change sounds reasonable.
This sounds very reasonable, thank you Hervé, for the report,
and even more for a (small) patch.
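To make the proposed return rule concrete, here is a rough R-level sketch. It is an assumption about the intended behaviour, not the patch itself (the real change would live in R's C code), and as_count() is a hypothetical name: keep the count as an integer when it fits, and fall back to a double otherwise.

```r
# Hypothetical illustration of the proposed rule: given a count accumulated
# as a double, return an integer when it is representable as one, and a
# double when it is not.
as_count <- function(total) {
  stopifnot(is.numeric(total), length(total) == 1L)
  if (is.na(total)) {
    NA_integer_
  } else if (abs(total) <= .Machine$integer.max) {
    as.integer(total)
  } else {
    as.double(total)
  }
}

small_count <- as_count(5)      # fits in an integer
large_count <- as_count(2^31)   # does not fit, stays a double
```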
> Cheers, H.
> Hervé Pagès
> Program in Computational Biology
> Division of Public Health Sciences
> Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N, M1-B514
> P.O. Box 19024
> Seattle, WA
> E-mail: hpages at fredhutch.org
> Phone: (206) 667-5791
> Fax: (206) 667-1319