[Rd] sum() vs cumsum() implicit type coercion

Hugh Parsonage hugh@p@r@on@ge @end|ng |rom gm@||@com
Tue Aug 25 13:33:05 CEST 2020


(If I may be so bold: although I think it's unlikely that a majority
would be in favour of this change, and I doubt anyone is actually
proposing it, I think considerably more than "a majority" should be
required before a change like this is allowed.

Considering that cumsum's coercion to numeric is documented, that
consistency of type coercion between sum and cumsum has never been
advertised, and that a custom version of cumsum addressing the
inconsistency would be very easy for users to write themselves, I
struggle to see how the change could ever have merit. Even public
unanimity would probably not be enough.)
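
To illustrate how little code such a user-level workaround needs, here is a
minimal sketch. The names strict_cumsum() and lenient_sum() are hypothetical
helpers invented for this example, not part of base R, and the error text in
strict_cumsum() merely imitates the message sum() gives in the report quoted
below. Which direction of "consistency" one wants is a matter of taste, so
both are shown.

```r
# Hypothetical helper (not in base R): refuse character input, as sum() does,
# instead of silently coercing it to numeric the way cumsum() does.
strict_cumsum <- function(x) {
  if (is.character(x)) {
    stop("invalid 'type' (character) of argument")
  }
  cumsum(x)
}

# Hypothetical helper going the other way: coerce character input to numeric
# before summing, mirroring the coercion cumsum() performs.
# Strings that do not parse as numbers become NA (with a warning).
lenient_sum <- function(x, ...) {
  if (is.character(x)) {
    x <- as.numeric(x)
  }
  sum(x, ...)
}

x <- c("0.0022", "-0.0002", "-0.0149")

# Succeeds, because the strings are converted first.
lenient_sum(x)

# Signals an error; tryCatch() captures the message instead of stopping.
tryCatch(strict_cumsum(x), error = function(e) conditionMessage(e))
```

Code that wants neither wrapper can simply call as.numeric() explicitly
before either function, which avoids relying on the implicit coercion at all.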

On Tue, 25 Aug 2020 at 20:25, Martin Maechler
<maechler using stat.math.ethz.ch> wrote:
>
> >>>>> Tomas Kalibera
> >>>>>     on Tue, 25 Aug 2020 09:29:05 +0200 writes:
>
>     > On 8/23/20 5:02 PM, Rory Winston wrote:
>     >> Hi
>     >>
>     >> I noticed a small inconsistency when using sum() vs cumsum()
>     >>
>     >> I have a char-based series
>     >>
>     >> > tryjpy$long
>     >>
>     >> [1] "0.0022"  "-0.0002" "-0.0149" "-0.0023" "-0.0342" "-0.0245" "-0.0022"
>     >>
>     >> [8] "0.0003"  "-0.0001" "-0.0004" "-0.0036" "-0.001"  "-0.0011" "-0.0012"
>     >>
>     >> [15] "-0.0006" "0.0016"  "0.0006"
>     >>
>     >> When I run sum() vs cumsum() , sum fails but cumsum converts the
>     >> series to numeric before summing:
>     >>
>     >> > sum(tryjpy$long)
>     >> Error in sum(tryjpy$long) : invalid 'type' (character) of argument
>     >>
>     >> > cumsum(tryjpy$long)
>     >> [1]  0.0022  0.0020 -0.0129 -0.0152 -0.0494 -0.0739 -0.0761 -0.0758 -0.0759
>     >> [10] -0.0763 -0.0799 -0.0809 -0.0820 -0.0832 -0.0838 -0.0822 -0.0816
>     >>
>     >> Which I guess is due to the following line in do_cum():
>     >>
>     >> PROTECT(t = coerceVector(CAR(args), REALSXP));
>     >>
>     >> This might be fine and there may be very good reasons why there is no
>     >> coercion in sum - just seems a little inconsistent in usage
>
>     > Yes. I don't know the reason for this design, but please note it is
>     > documented in ?sum and in ?cumsum, which would also make it harder to
>     > change. One can always use a consistent subset (not rely on the coercion
>     > e.g. from characters).
>
>     > Best
>     > Tomas
>
> Indeed.
> Further note that most arithmetic/math *fails* on
> character vectors, so if a change had to be made, it
> should rather be such that cumsum() also rejects character
> input.
>
> We would have consistency then, but potentially break user code,
> even package code which has hitherto assumed cumsum() to coerce
> to numeric first.
>
> If a majority of commentators and R core think we should make
> such a change, I'd agree to consider it.
>
> Otherwise, we save (ourselves and others) a bit of time.
> Martin
>


