[R] Inconsistency among mean, median, max, var

ggrothendieck at yifan.net
Sun Mar 31 00:14:32 CET 2002


Don't get me wrong.  I think the R package is great and, in fact, am personally
investing time to learn it.  I particularly like its object oriented nature,
data frames (which nicely organize datasets) and the large and increasing set 
of packages and interfaces available for it.

I only mention my problems with it in the hope that it will lead to better,
more consistent software.  My comments are not meant as criticism.  They are
(hopefully) helpful feedback.

Regarding specifically your query on what is wrong: it's too complex and the
concepts are not orthogonal.  Realistically, it's necessary to keep going back
to the documentation or to test things out to figure out what these functions
do if you don't want to make a mistake.

You need a decision matrix like this one just to figure out what you are going 
to get.  

        ----- argument type ------
        matrix           dataframe

sum     single value     single value
max     single value     single value
median  single value     fails

mean    single value     columnwise
sd      columnwise       columnwise
var     varcov mat       varcov mat

My best try at summarizing this is to split it into two sets of rows 
as shown above with the following description:

- mean produces a single value on a matrix and acts columnwise on dataframes
- sd works columnwise on both matrices and dataframes
- var produces a variance-covariance matrix for both
- others produce a single value except for median which fails on dataframes
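
The table above can be rebuilt by probing each function directly. This is only
a minimal sketch: describe() is a hypothetical helper written for this
illustration, the data are made up, and the labels it prints depend on the
version of R it is run under, so it may not reproduce the table exactly.

```r
# Small numeric matrix and the same data as a data frame (illustrative data).
m <- matrix(c(1, 2, 3, 4, 5, 6), ncol = 2)
d <- as.data.frame(m)

# describe() is a hypothetical helper, not part of R:
# it calls f(x) and labels the shape of whatever comes back.
describe <- function(f, x) {
  res <- tryCatch(suppressWarnings(f(x)), error = function(e) NULL)
  if (is.null(res)) return("fails")
  if (is.matrix(res)) return("matrix")
  if (length(res) == 1) return("single value")
  if (length(res) == ncol(x)) return("columnwise")
  "other"
}

funs <- list(sum = sum, max = max, median = median,
             mean = mean, sd = sd, var = var)

# One row per function, one column per argument type.
result <- data.frame(
  matrix    = sapply(funs, describe, x = m),
  dataframe = sapply(funs, describe, x = d)
)
print(result)
```

Adding another function to funs is enough to see where it falls in the table.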

It might be worth trying out more functions to see how they fit into this
scheme.

I use another statistical package in which the 12 corresponding functions have
a consistent result (work columnwise).
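
For comparison, uniform columnwise behaviour can be approximated in R by
always going through apply or sapply. This is a rough sketch only: colwise()
is a hypothetical wrapper made up for this illustration (it is not part of R),
and it assumes all columns are numeric.

```r
# colwise() is a hypothetical helper, not part of base R:
# it applies f to every column of a matrix or data frame.
colwise <- function(f, x, ...) {
  if (is.data.frame(x)) {
    sapply(x, f, ...)               # data frame: iterate over its columns
  } else {
    apply(as.matrix(x), 2, f, ...)  # matrix: iterate over the column margin
  }
}

# Illustrative data: the same values as a matrix and as a data frame.
m <- matrix(c(1, 2, 3, 4, 5, 6), ncol = 2)
d <- as.data.frame(m)

colwise(median, m)   # one median per column of the matrix
colwise(median, d)   # one median per column of the data frame
colwise(var, d)      # one variance per column, not a covariance matrix
```

With a wrapper like this the result has one entry per column regardless of
which summary function or argument type is used.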


On 30 Mar 2002 at 20:25, ripley at stats.ox.ac.uk wrote:

> On Sat, 30 Mar 2002 ggrothendieck at yifan.net wrote:
> 
> > I found a strange inconsistency:
> 
> Well, these do work as documented, and I don't find it even ordinarily
> inconsistent.
> 
> > If m is a matrix and d is a data frame then
> >
> > - mean(m), median(m), max(m) and max(d) all return a single value
> >
> > but
> >
> > - mean(d) returns the column means
> > - median(d) fails
> > - both var(m) and var(d)  return the variance covariance matrix
> >
> > You pretty much have to experiment to figure this out since much of this
> > behavior is not readily obvious from the help files.
> 
> I don't think that is even 1% fair:
> 
> ?mean clearly says what it does for a data frame.
> ?median clearly says it only works for numeric vectors.
> ?var clearly says that it works for `a numeric vector, matrix or data
> frame'
> 
> Whatever is the problem with that?
> 
> -- 
> Brian D. Ripley,                  ripley at stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272860 (secr)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
> 
> 




-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._


