[R] variance/mean

(Ted Harding) Ted.Harding at manchester.ac.uk
Sun Mar 22 10:01:44 CET 2009


On 22-Mar-09 08:17:29, rkevinburton at charter.net wrote:
> At the risk of appearing ignorant, why is the following true?
> 
> o <- cbind(rep(1,3),rep(2,3),rep(3,3))
> var(o)
>      [,1] [,2] [,3]
> [1,]    0    0    0
> [2,]    0    0    0
> [3,]    0    0    0
> 
> and
> 
> mean(o)
> [1] 2
> 
> How do I get mean to return an array similar to var? I would expect in
> the above example a vector of length 3 {1,2,3}.
> 
> Thank you for your help.
> Kevin

This is a consequence of (understandable) confusion about how var()
and mean() operate! "?var" does not state explicitly that applying
var() to a matrix, as in your "var(o)", gives the covariance matrix
between the columns of 'o' -- except where it says (almost as an
aside) that "'var' is just another interface to 'cov'". Hence in
your example "var(o)" is equivalent to "cov(o)". Looked at in this
way, the result is what you should expect: each column of 'o' is
constant, so every variance and covariance is zero, giving a 3x3
matrix of zeros.

This is, of course, different from what you would expect if you apply
var() to a vector, namely the variance of that series of numbers
(a single value).
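
To make the matrix/vector distinction concrete, here is a small
sketch. The matrix 'm' is a made-up example (not from the original
question) with non-constant columns, so that the covariances are
visibly non-zero:

```r
o <- cbind(rep(1, 3), rep(2, 3), rep(3, 3))

# On a matrix, var() gives the same result as cov():
# the covariance matrix between the columns.
same_as_cov <- isTRUE(all.equal(var(o), cov(o)))
same_as_cov

# On a plain vector, var() gives a single number.
v_single <- var(c(1, 2, 3))
v_single
# [1] 1

# Hypothetical matrix with non-constant columns.
m <- cbind(c(1, 2, 3), c(2, 4, 6))
var(m)
#      [,1] [,2]
# [1,]    1    2
# [2,]    2    4
```

The diagonal of var(m) holds the column variances; the off-diagonal
entries are the covariances between columns.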

On the other hand, mean() works differently. According to "?mean":
  Arguments:
     x: An R object.  Currently there are methods for numeric
        data frames, numeric vectors and dates.
  [...]
  Value:
     For a data frame, a named vector with the appropriate method
     being applied column by column.

which may have been what you expected. But a matrix is not a data
frame. Instead, it is an array, which (in effect) is a vector with
an attached "dimensions" attribute which tells R how to chop it up
into columns etc. -- whereas a data frame has its "by-column"
structure built into it.
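
The structural difference can be inspected directly. A minimal
sketch (the name 'df' is just an illustrative variable):

```r
o  <- cbind(rep(1, 3), rep(2, 3), rep(3, 3))
df <- as.data.frame(o)

# A matrix is a vector of 9 values plus a "dim" attribute.
attributes(o)
# $dim
# [1] 3 3
length(o)
# [1] 9

# A data frame is a list with one component per column.
is.list(df)
# [1] TRUE
length(df)
# [1] 3
names(df)
# [1] "V1" "V2" "V3"
```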

Now: "?mean" says nothing about matrices. Nothing whatever.
So you have to find out the hard way that mean(o) treats the array
'o' as a vector, ignoring its "dimensions" attribute. Hence you
get a single number, which is the mean of all the values in the
matrix.
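
A quick check of this behaviour, as a sketch: stripping the
dimensions with as.vector() and taking the mean gives the same
single number as mean(o).

```r
o <- cbind(rep(1, 3), rep(2, 3), rep(3, 3))

# The underlying values, stored column by column.
v <- as.vector(o)
v
# [1] 1 1 1 2 2 2 3 3 3

# mean() sees only these 9 values, not the 3x3 layout.
mean(o)
# [1] 2
mean(v)
# [1] 2
```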

In order to get what you are apparently looking for (the means of
the columns of 'o'), you could:

a) (the smooth way) use the apply() function, causing mean() to be
   applied to the second dimension (columns) of 'o':

   apply(o,2,mean)
   # [1] 1 2 3

b) (the heavy way) take a hint from "?mean" and feed it a data frame:

   mean(as.data.frame(o))
   # V1 V2 V3
   #  1  2  3 
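
For completeness, a sketch of some alternatives that do not depend
on mean() having a data-frame method. As far as I know, that method
was dropped in later R releases (3.0.0 onwards), where (b) may
return NA with a warning instead; colMeans() and sapply() are
standard base R functions and should behave the same in any version.

```r
o <- cbind(rep(1, 3), rep(2, 3), rep(3, 3))

# Second argument of apply(): 2 = over columns, 1 = over rows.
by_col <- apply(o, 2, mean)
by_col
# [1] 1 2 3
by_row <- apply(o, 1, mean)
by_row
# [1] 2 2 2

# colMeans() is a dedicated function for column means of a matrix.
cm <- colMeans(o)
cm
# [1] 1 2 3

# Applying mean() to each column of a data frame explicitly
# gives a named vector, like the output shown in (b).
sm <- sapply(as.data.frame(o), mean)
sm
# V1 V2 V3
#  1  2  3
```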

Hoping this helps to clarify things!
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 22-Mar-09                                       Time: 09:01:40
------------------------------ XFMail ------------------------------



