[Rd] Inconsistent handling of data frames in min(), max(), and mean()

Gavin Simpson ucfagls at gmail.com
Thu Aug 21 20:32:31 CEST 2014


This inconsistency recently came to my attention:

> df <- data.frame(A = 1:10, B = rnorm(10))
> min(df)
[1] -1.768958
> max(df)
[1] 10
> mean(df)
[1] NA
Warning message:
In mean.default(df) : argument is not numeric or logical: returning NA

I recall the time when `mean(df)` returned `colMeans(df)`, and that
behaviour was deemed inconsistent. It seems, though, that the change
removed one inconsistency and replaced it with another.

Am I missing good reasons why there couldn't be a `mean.data.frame()`
method that works like `max()` etc. when given a data frame? Namely, one
that returns the requested statistic *only* when presented with a data
frame of all numeric variables? E.g.

> df <- data.frame(A = 1:10, B = rnorm(10), C = factor(rep(c("A","B"), each = 5)))
> max(df)
Error in FUN(X[[1L]], ...) :
  only defined on a data frame with all numeric variables

I would expect `mean(df)` to fail with the same error as `max(df)` for
the new example `df`, but to return the same as `mean(as.matrix(df))`
or `mean(colMeans(df))` if given an entirely numeric data frame:

> mean(colMeans(df[, 1:2]))
[1] 2.78366
> mean(as.matrix(df[, 1:2]))
[1] 2.78366
> mean(df[,1:2])
[1] 2.78366
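
To make the proposal concrete, here is a minimal sketch of the kind of
method I have in mind. This definition is hypothetical and not part of
base R; the column check and the reuse of the `max()` error message are
my assumptions about how such a method might look, not a tested patch.

```r
# Hypothetical mean.data.frame(): NOT in base R, illustration only.
# It refuses data frames with non-numeric columns, echoing the error
# that max() and min() give, and otherwise pools all values.
mean.data.frame <- function(x, ...) {
    # TRUE for each column that is numeric or logical
    ok <- vapply(x, function(col) is.numeric(col) || is.logical(col),
                 logical(1))
    if (!all(ok))
        stop("only defined on a data frame with all numeric variables")
    # Same result as mean(as.matrix(x)); extra arguments such as
    # na.rm are passed through to mean.default()
    mean(as.matrix(x), ...)
}

df <- data.frame(A = 1:10, B = rnorm(10))
# Should agree with the matrix-based calculation
all.equal(mean(df), mean(as.matrix(df)))
```

With this definition in place, a data frame containing a factor column
would stop with the same message as `max(df)`, rather than returning NA
with a warning.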

I just can't see the sense in having `mean` work the way it does now.

Thanks,

Gavin

-- 

Gavin Simpson, PhD
