[R] compute variance of every column in a matrix without a loop

David Brahm brahm at alum.mit.edu
Mon Mar 18 16:28:51 CET 2002

Francisco J Molina <FJMolina at lbl.gov> wrote:
> Is it possible to compute the variance of every column in a matrix
> without a loop?

1) apply(m, 2, var)   (as several people said - simple, but not optimally fast)

2) colVars(m)    from package "colSums", available at:

3) In 1.5.0, I believe colSums and colMeans will be available (as fast C
   routines).  I don't know if Brian D. Ripley plans to include colVars too,
   but if not the definition is simple enough:
colVars <- function(x, na.rm=FALSE, dims=1, unbiased=TRUE, SumSquares=FALSE,
                    twopass=FALSE) {
  if (SumSquares) return(colSums(x^2, na.rm, dims))
  N <- colSums(!is.na(x), FALSE, dims)
  Nm1 <- if (unbiased) N-1 else N
  if (twopass) {x <- if (dims==length(dim(x))) x - mean(x, na.rm=na.rm) else
                     sweep(x, (dims+1):length(dim(x)), colMeans(x,na.rm,dims))}
  (colSums(x^2, na.rm, dims) - colSums(x, na.rm, dims)^2/N) / Nm1

With twopass=TRUE, this should act very much like the S-Plus function by the
same name.  (twopass=TRUE is needed for numerical accuracy if mu >> sigma.)

For a matrix (as opposed to a >2-dimensional array), with unbiased=FALSE, you'd
get the same result as Thomas Lumley's suggestion:
  R> colMeans(m*m)-colMeans(m)^2
However, this is *not* the same as apply(m, 2, var) because (sample) variances
are normally calculated with unbiased=TRUE (i.e. with N-1 in the denominator).
IF you have no NA's, then I'd modify Thomas's suggestion slightly:
  R> N <- nrow(m)
  R> N/(N-1) * (colMeans(m*m)-colMeans(m)^2)
                              -- David Brahm (brahm at alum.mit.edu)
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch

More information about the R-help mailing list