[Rd] speeding up perception

ivo welch ivo.welch at gmail.com
Sat Jul 2 20:23:01 CEST 2011


Dear R developers: R is supposed to be slow for iterative
calculations.  actually, it isn't.  matrix operations are fast.  it is
data frame operations that are slow.

R <- 1000
C <- 1000

example <- function(m) {
  cat("rows: "); cat(system.time( for (r in 1:R) m[r,20] <-
sqrt(abs(m[r,20])) + rnorm(1) ), "\n")
  cat("columns: "); cat(system.time(for (c in 1:C) m[20,c] <-
sqrt(abs(m[20,c])) + rnorm(1)), "\n")
  if (is.data.frame(m)) { cat("df: columns as names: ");
cat(system.time(for (c in 1:C) m[[c]][20] <- sqrt(abs(m[[c]][20])) +
rnorm(1)), "\n") }
}

cat("\n**** Now as matrix\n")
example( matrix( rnorm(C*R), nrow=R ) )

cat("\n**** Now as data frame\n")
example( as.data.frame( matrix( rnorm(C*R), nrow=R ) ) )


When m is a data frame, the operation is about 300 times slower than
when m is a matrix.    The program is basically accessing 1000
numbers.  When m is a data frame, the speed of R is about 20 accesses
per seconds on a Mac Pro.  This is pretty pathetic.

I do not know the R internals, so the following is pure speculation.
I understand that an index calculation is faster than a vector lookup
for arbitrary size objects, but it seems as if R relies on search to
find its element.  maybe there isn't even a basic vector lookup table.
 a vector lookup table should be possible at least along the dimension
of consecutive storage.  another possible improvement would be to add
an operation that adds an attribute to the data frame that contains a
full index table to the object for quick lookup.  (if the index table
is there, it could be used.  otherwise, R could simply use the
existing internal mechanism.)

I think faster data frame access would significantly improve the
impression that R makes on novices.  just my 5 cents.

/iaw
----
Ivo Welch (ivo.welch at gmail.com)
http://www.ivo-welch.info/
J. Fred Weston Professor of Finance
Anderson School at UCLA, C519



More information about the R-devel mailing list