[R] Bug or feature? sum(c(a, b, c)) != (a + b + c)

Thomas Lumley tlumley at uw.edu
Wed Aug 24 01:48:15 CEST 2011


On Wed, Aug 24, 2011 at 11:06 AM, Ted Harding <ted.harding at wlandres.net> wrote:
> On 23-Aug-11 22:20:48, Barry Rowlingson wrote:
>> On Tue, Aug 23, 2011 at 8:17 PM, Daniel Lai <danlai at bccrc.ca> wrote:
>>> Greetings all,
>>>
>>> I'm porting an algorithm from MATLAB to R, and noticed some minor
>>> discrepancies in small decimal values using rowSums and colSums which
>>> are
>>> exacerbated after heavy iteration and log space transformation. This
>>> was
>>> rather perplexing as both programs claimed and appeared to use the
>>> IEEE 754
>>> standard for floating point arithmetic (confirmed with manual basic
>>> operations). _After some tracing and testing, I've managed to isolated
>>> a
>>> minimal working example as follows:
>>>
>>> a = 0.812672
>>> b = 0.916541
>>> c = 0.797810
>>> sum(c(a, b, c)) == (a + b + c)
>>> [1] FALSE
>>
>>  Its probably to do with the order of summations. With your a,b,c you
>> get:
>>
>>  > (a+b+c) == (c+b+a)
>>  [1] TRUE
>>  > (a+b+c) == (c+a+b)
>>  [1] FALSE
>>
>> shock horror, addition is not associative[1]. Lets investigate:
>>
>>  > sum(c(a,b,c)) == c+a+b
>>  [1] TRUE
>>  > sum(c(a,b,c)) == a+c+b
>>  [1] TRUE
>>
>>  'sum' seems to get the same answer as adding the first and the third,
>> then adding the second - explicitly:
>>
>>  > sum(c(a,b,c)) == (a+c)+b
>>  [1] TRUE
>>
>> I'm not sure what it would do for four values in the sum. Have fun
>> finding out. Does matlab similarly have a+b+c != c+b+a?
>>
>> Barry
>>
>> [1] or commutative or distributive or one of those -ives you learn one
>> day in school. Too lazy to wikipedia it right now...
>
>   'sum' seems to get the same answer as adding the first and the third,
>   then adding the second
>
> It seems to be a bit more subtle than that!
>
>  sum(c(a,b,c))==sum(c(a,b,c))
>  # [1] TRUE
>  sum(c(a,b,c))==sum(c(a,c,b))
>  # [1] TRUE
>  sum(c(a,b,c))==sum(c(b,a,c))
>  # [1] TRUE
>  sum(c(a,b,c))==sum(c(b,c,a))
>  # [1] TRUE
>  sum(c(a,b,c))==sum(c(c,a,b))
>  # [1] TRUE
>  sum(c(a,b,c))==sum(c(c,b,a))
>  # [1] TRUE
>
> so that's TRUE for all 6 permutations. However (as before,
> but all 6 this time):
>
>  a+b+c == a+b+c
>  # [1] TRUE
>  a+b+c == a+c+b
>  # [1] FALSE
>  a+b+c == b+c+a
>  # [1] TRUE
>  a+b+c == b+a+c
>  # [1] TRUE
>  a+b+c == c+a+b
>  # [1] FALSE
>  a+b+c == c+b+a
>  # [1] TRUE
>
> So it looks as though sum(c(...)) is somwhow independent of
> the order of its arguments, which implies that the summation
> it does is not quite the addition corresponding to "+".
>
> I now feel out of my depth, so hope someone else knows/can
> find the truth about this!

sum() uses a long double accumulator.

>From src/main/summary.c

static Rboolean rsum(double *x, int n, double *value, Rboolean narm)
{
    long double s = 0.0;
    int i;
    Rboolean updated = FALSE;

    for (i = 0; i < n; i++) {
	if (!narm || !ISNAN(x[i])) {
	    if(!updated) updated = TRUE;
	    s += x[i];
	}
    }
    *value = s;

    return(updated);
}


-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland



More information about the R-help mailing list