[Rd] There is pmin and pmax each taking na.rm, how about psum?

Justin Talbot jtalbot at stanford.edu
Wed Oct 31 16:38:33 CET 2012


> Because that's inconsistent with pmin and pmax when two NAs are summed.
>
> x = c(1,3,NA,NA,5)
> y = c(2,NA,4,NA,1)
> colSums(rbind(x, y), na.rm = TRUE)
> [1] 3 3 4 0 6    # actual
> [1] 3 3 4 NA 6   # desired

But your desired result would be inconsistent with sum:
sum(NA,NA,na.rm=TRUE)
[1] 0

From a language definition perspective I think having psum return 0
here is the right choice. R consistently distinguishes between operators
that have a sensible identity (+:0, *:1, &:TRUE, |:FALSE), which return
the identity if removing NAs leaves no items, and those that kind
of don't (pmin, pmax), which return NA. Let's not break that.

(I would argue that pmin and pmax should return their actual
identities too: Inf and -Inf respectively, but I can understand the
current behavior.)
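
For reference, a quick check of the behavior described above, using only
base R. The comments show what I would expect at the prompt; the inputs are
just illustrative values.

```r
# Reductions with a sensible identity return it when na.rm leaves nothing.
sum(NA, NA, na.rm = TRUE)     # 0
prod(NA, NA, na.rm = TRUE)    # 1
all(NA, NA, na.rm = TRUE)     # TRUE
any(NA, NA, na.rm = TRUE)     # FALSE

# pmin and pmax return NA where every parallel element is NA.
pmin(c(1, NA), c(2, NA), na.rm = TRUE)   # 1 NA
pmax(c(1, NA), c(2, NA), na.rm = TRUE)   # 2 NA

# min and max on empty input already return Inf and -Inf (with a warning).
suppressWarnings(min(numeric(0)))   # Inf
suppressWarnings(max(numeric(0)))   # -Inf
```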


My 2 cents on psum:

R has a natural set of associative & commutative operators: +, *, &,
|, pmin, pmax.

These correspond directly to the reduction functions: sum, prod, all,
any, min, max
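
One way to see the correspondence is to fold each binary operator over a
vector with Reduce and compare against the matching reduction function.
This is only a sketch on small example vectors, not a proof:

```r
x <- c(3, 1, 2)
b <- c(TRUE, FALSE, TRUE)

# Folding the binary operator over the elements gives the reduction.
stopifnot(
  Reduce(`+`, x)  == sum(x),
  Reduce(`*`, x)  == prod(x),
  Reduce(`&`, b)  == all(b),
  Reduce(`|`, b)  == any(b),
  Reduce(pmin, x) == min(x),
  Reduce(pmax, x) == max(x)
)
```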

The current problem is that pmin and pmax are more powerful than +, *,
&, and |. The right fix is to extend the rest of the associative &
commutative operators to have the same power as pmin and pmax.

Thus, + should have the signature: `+`(..., na.rm=FALSE), which would
allow you to do things like:

`+`(c(1,2),c(1,2),c(1,2),NA, na.rm=TRUE) = c(3,6)

If you don't like typing `+`, you could always alias psum to `+`.
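
Until something like that exists, a rough user-level stand-in can be written
with cbind and rowSums. To be clear, psum here is a hypothetical helper, not
a base R function, and this sketch assumes numeric (or logical) arguments
whose lengths recycle cleanly:

```r
# Hypothetical psum: parallel sum with an na.rm argument.
psum <- function(..., na.rm = FALSE) {
  args <- list(...)
  if (length(args) == 0L) return(numeric(0))
  # cbind recycles shorter arguments; each row holds one set of
  # parallel elements, so rowSums gives the element-wise sum.
  rowSums(do.call(cbind, args), na.rm = na.rm)
}

psum(c(1, 2), c(1, 2), c(1, 2), NA, na.rm = TRUE)   # 3 6

x <- c(1, 3, NA, NA, 5)
y <- c(2, NA, 4, NA, 1)
psum(x, y, na.rm = TRUE)   # 3 3 4 0 6, consistent with sum()
psum(x, y)                 # 3 NA NA NA 6
```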

Additionally, R currently has two simple reduction functions that
don't have corresponding operators: range and length. Having a prange
operator and a plength operator would nicely round out the language.
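
As a sketch of what I mean, and nothing more: prange and plength below are
hypothetical names, and the semantics (plength counts the parallel elements,
skipping NAs when na.rm=TRUE; prange pairs up pmin and pmax) are my
assumption about the natural definitions. At least one argument is assumed.

```r
# Hypothetical plength: number of parallel elements at each position.
plength <- function(..., na.rm = FALSE) {
  m <- do.call(cbind, list(...))   # one row per position, recycled
  if (na.rm) as.integer(rowSums(!is.na(m))) else rep(ncol(m), nrow(m))
}

# Hypothetical prange: parallel minimum and maximum side by side.
prange <- function(..., na.rm = FALSE) {
  cbind(min = pmin(..., na.rm = na.rm), max = pmax(..., na.rm = na.rm))
}

x <- c(1, 3, NA, NA, 5)
y <- c(2, NA, 4, NA, 1)
plength(x, y, na.rm = TRUE)   # 2 1 1 0 2
prange(x, y, na.rm = TRUE)    # min: 1 3 4 NA 1, max: 2 3 4 NA 5
```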

Justin


