[Rd] There is pmin and pmax each taking na.rm, how about psum?

Thu Nov 1 16:48:54 CET 2012

Justin Talbot <jtalbot <at> stanford.edu> writes:
>
> > Because that's inconsistent with pmin and pmax when two NAs are summed.
> >
> > x = c(1,3,NA,NA,5)
> > y = c(2,NA,4,NA,1)
> > colSums(rbind(x, y), na.rm = TRUE)
> > [1] 3 3 4 0 6    # actual
> > [1] 3 3 4 NA 6   # desired
>
> But your desired result would be inconsistent with sum:
> sum(NA,NA,na.rm=TRUE)
> [1] 0
>
> From a language definition perspective I think having psum return 0
> here is the right choice.

Ok, you've sold me. psum(NA,NA,na.rm=TRUE) returning 0 sounds good. And
pprod(NA,NA,na.rm=TRUE) returning 1, consistent with prod then.

Then the case for psum is more about convenience and speed -vs-
colSums(rbind(x,y), na.rm=TRUE), since rbind will copy x and y into a new
matrix. The case for pprod is similar, plus colProds doesn't exist.
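
A minimal sketch of the semantics agreed above, written in plain R. The
definitions of psum and pprod here are hypothetical (neither exists in base
R), and this version only illustrates the results; it makes its own copies,
so it says nothing about the speed a C-level implementation might have.

```r
# Hypothetical pure-R sketch: psum/pprod are NOT base R functions.
# With na.rm = TRUE, NAs are replaced by the identity element
# (0 for sum, 1 for product), mirroring sum() and prod().
psum <- function(..., na.rm = FALSE) {
  args <- list(...)
  if (na.rm) args <- lapply(args, function(v) replace(v, is.na(v), 0))
  Reduce(`+`, args)
}

pprod <- function(..., na.rm = FALSE) {
  args <- list(...)
  if (na.rm) args <- lapply(args, function(v) replace(v, is.na(v), 1))
  Reduce(`*`, args)
}

x <- c(1, 3, NA, NA, 5)
y <- c(2, NA, 4, NA, 1)

psum(x, y, na.rm = TRUE)     # 3 3 4 0 6, same as the colSums result
pprod(x, y, na.rm = TRUE)    # 2 3 4 1 5
psum(NA, NA, na.rm = TRUE)   # 0, consistent with sum(NA, NA, na.rm = TRUE)
pprod(NA, NA, na.rm = TRUE)  # 1, consistent with prod(NA, NA, na.rm = TRUE)
```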

> Thus, + should have the signature: `+`(..., na.rm=FALSE), which would
> allow you to do things like:
>
> `+`(c(1,2),c(1,2),c(1,2),NA, na.rm=TRUE) = c(3,6)
>
> If you don't like typing `+`, you could always alias psum to `+`.

But there would be a cost, wouldn't there? `+` is a dyadic .Primitive.
Changing it to take `...` and `na.rm` could slow it down (if I understand
correctly), and any change to the existing language is risky. For example:
    `+`(1,2,3)
is currently an error. Making that call do something might have
implications for some of the 4,000 packages (some might rely on it being
an error), with a possible speed cost too.
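
To make the current behaviour concrete, here is a small check. It assumes
nothing beyond base R; the variable names are only for illustration. The
three-argument call to `+` is caught as an error, while the same n-ary sum
can already be written with Reduce without touching the primitive.

```r
# Calling `+` with three arguments is an error today; capture the
# condition message instead of stopping the script.
res <- tryCatch(`+`(1, 2, 3), error = function(e) conditionMessage(e))
is_error <- is.character(res)   # TRUE: the handler ran
print(res)

# An n-ary elementwise sum without changing `+` itself.
nary <- Reduce(`+`, list(c(1, 2), c(1, 2), c(1, 2)))
print(nary)                     # 3 6
```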

In contrast, adding two functions that didn't exist before, psum and pprod,
seems to be a safer and simpler proposition.

Matthew