[Rd] extracting rows from a data frame by looping over the row names: performance issues

Herve Pages hpages at fhcrc.org
Sat Mar 3 03:03:57 CET 2007


Hi Greg,

Greg Snow wrote:
> Your 2 examples have 2 differences and they are therefore confounded in
> their effects.
> 
> What are your results for:
> 
> system.time(for (i in 1:100) {row <-  dat[i, ] })
> 
> 
> 

Right. What you suggest is even faster (and simpler):

  > mat <- matrix(rep(paste(letters, collapse=""), 5*300000), ncol=5)
  > dat <- as.data.frame(mat)

  > system.time(for (key in row.names(dat)[1:100]) { row <- dat[key, ] })
     user  system elapsed
   13.241   0.460  13.702

  > system.time(for (i in 1:100) { row <- sapply(dat, function(col) col[i]) })
     user  system elapsed
    0.280   0.372   0.650

  > system.time(for (i in 1:100) {row <-  dat[i, ] })
     user  system elapsed
    0.044   0.088   0.130

So apparently here extracting with dat[i, ] is about 100 times faster in
elapsed time (about 300 times in user time) than extracting with dat[key, ]!

  > system.time(for (i in 1:100) dat["1", ])
     user  system elapsed
   12.680   0.396  13.075

  > system.time(for (i in 1:100) dat[1, ])
     user  system elapsed
    0.060   0.076   0.137
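
Given these timings, one possible workaround when the loop has to start from
row names is to translate the keys to integer positions once, up front, and
then index by position. The following is only a minimal sketch, not something
I have timed: it assumes the row names are unique, it uses a smaller stand-in
data frame (1000 rows) so that it runs quickly, and the names 'keys' and 'idx'
are introduced here purely for illustration.

```r
# Smaller stand-in for the data frame used above (assumption: 1000 rows
# instead of 300000, only so the sketch runs quickly).
mat <- matrix(rep(paste(letters, collapse = ""), 5 * 1000), ncol = 5)
dat <- as.data.frame(mat)

# The keys we want to loop over (hypothetical name).
keys <- row.names(dat)[1:100]

# Translate all keys to integer row positions in one vectorized call.
# match() returns the position of the first match, so this assumes
# the row names are unique.
idx <- match(keys, row.names(dat))

# Loop over integer positions instead of character keys.
for (i in idx) {
  row <- dat[i, ]
}
```

If the rows are not needed one at a time, dat[idx, ] extracts them all in a
single call.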

Good to know!

Thanks a lot,
H.


