[Rd] 1.4.0: mean/sum of logicals

Prof Brian D Ripley ripley@stats.ox.ac.uk
Sat, 6 Oct 2001 09:32:50 +0100 (BST)


On 5 Oct 2001, Peter Dalgaard BSA wrote:

> Torsten Hothorn <Torsten.Hothorn@rzmail.uni-erlangen.de> writes:
>
> > the NEWS file in 1.4.0-devel states:
> >
> > o   mean() has `data frame' method applying mean column-by-column.
> >     When applied to non-numeric data mean() now returns NA rather
> >     than a confusing error message (for compatibility with S4).
> >
> >
> > which means:
> >
> > R> mean(c(TRUE, FALSE))
> > [1] NA
> > Warning message:
> > argument is not numeric: returning NA in: mean.default(c(TRUE, FALSE))
> >
> > but:
> >
> > R> sum(c(TRUE, FALSE))
> > [1] 1
> >
> > ?sum states:
> >
> >      sum(..., na.rm=FALSE)
> >
> > Arguments:
> >
> >      ...: numeric vectors.
> >
> > and clearly
> >
> > R> is.numeric(c(TRUE, FALSE))
> > [1] FALSE
> >
> >
> > this is confusing, isn't it? I think that `sum' and `mean' should take the
> > same arguments (and one probably will not allow to sum up logicals) or am
> > I missing something?
> >
> > Torsten
>
> Hmm. That slipped in without me noticing. Summing logicals is a fairly
> common practice, as in
>
> sem <- sd(x,na.rm=TRUE)/sqrt(sum(!is.na(x)))
>
> Taking means of logicals is somewhat more rare, but it does work in
> Splus 6.0 and it is a general rule to coerce logicals to 0/1, so I
> suspect that this is just an oversight and we want mean.default to
> start with
>
>     if (!is.numeric(x) && !is.complex(x) && !is.logical(x)) {
>         warning("argument is not numeric: returning NA")
>         return(as.numeric(NA))
>     }
>
> If Brian really meant otherwise, he'll explain why when he gets back
> from Switzerland...

Back-compatibility.  In data frames logicals used to get coerced to
two-level factors, not to numerics, and then mean would fail for them.
If a logical is an experimental factor, sums make sense but means do not.
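
To make the distinction concrete, here is a minimal sketch using made-up
data (the vector below is illustrative and not taken from this thread).
It only shows that counting works under either representation, while
neither a logical vector nor a factor passes is.numeric():

```r
x <- c(TRUE, FALSE, TRUE, TRUE)   # illustrative data, an assumption

# As a logical vector, TRUE is treated as 1 and FALSE as 0 when summing.
n_true <- sum(x)                  # number of TRUE values
prop_true <- sum(x) / length(x)   # proportion of TRUE values

# As a two-level factor the levels are labels, not numbers;
# per-level counts are still meaningful.
f <- factor(x)
counts <- table(f)

# Neither representation is "numeric" in the sense of is.numeric().
c(logical = is.numeric(x), factor = is.numeric(f))
```

Whether the proportion is a sensible summary depends on whether the
logical is an indicator or a label for an experimental factor.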

Here is what I find confusing (1.3.1).

> x <- c(TRUE, FALSE)
> mean(x)
[1] 0.5
> DF <- data.frame(x = x)
> mean(DF$x)
Error in Summary.factor(..., na.rm = na.rm) :
        "sum" not meaningful for factors

In that case 1.4.0 is consistent.

Change it if you like, but do think through all the implications of the
inconsistent treatment of logicals (for which I have not had time as yet).
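
For anyone wanting to experiment, the check Peter quotes above could be
tried in a standalone function along the following lines. This is only a
sketch: the name mean_with_logicals, the na.rm handling and the
sum()/length() computation are assumptions made for illustration, and it
is not the code of mean.default in any R release.

```r
# Hypothetical helper for experimenting with the proposed type check.
# Not R's mean.default; the body after the check is an assumption.
mean_with_logicals <- function(x, na.rm = FALSE) {
  # The check quoted above: accept numeric, complex and logical input.
  if (!is.numeric(x) && !is.complex(x) && !is.logical(x)) {
    warning("argument is not numeric: returning NA")
    return(as.numeric(NA))
  }
  # Optionally drop missing values before averaging.
  if (na.rm) x <- x[!is.na(x)]
  # Logicals are coerced to 0/1 by sum().
  sum(x) / length(x)
}

mean_with_logicals(c(TRUE, FALSE))          # logical input is accepted
mean_with_logicals(c(TRUE, NA), na.rm = TRUE)
```

With this check a factor (for example a logical that has been coerced to
a two-level factor) still gets NA and a warning, so the two cases are
treated differently, which is the implication to think through.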

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
