[R] Quickest way to access data.frame?

Erik Iverson iverson at biostat.wisc.edu
Wed Apr 2 20:33:36 CEST 2008

Hello -

I am having a hard time working out what you want here.  Could 
you be more specific?  Given your 'data' data.frame, what should 
the output be?

yoooooo wrote:
> Hi, I have tried searching this forum for the best way to access a
> data.frame. I got the feeling that "no partial match" is the way to make it
> fast, so I converted everything to factors, but I'm still not 100% sure if
> the following code will do it. Is this the fastest way to do something
> for each (ID, ID2) pair? Thanks!

I doubt it.  Have a look at ?tapply, ?by, ?aggregate, and ?outer for 
possible approaches to what you might be trying to do.
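
As a rough illustration of the tapply route: if the goal is, for each ID, the 
mean over ID2 of the last DATA1 value in each (ID, ID2) cell (that is only my 
reading of the quoted code, and may not be what is intended), a minimal sketch 
could look like the following.  The seed and the name last_value are my own 
additions and are not in the original post.

```r
# Simulated data with the same shape as the quoted example
# (set.seed is an addition, only so that the run is reproducible)
set.seed(1)
n <- 40000
data <- data.frame(ID    = factor(floor(runif(n, min = 0, max = 20))),
                   ID2   = factor(floor(runif(n, min = 0, max = 1000))),
                   DATA1 = rnorm(n))

# Last DATA1 value in each (ID, ID2) cell, as an ID x ID2 matrix;
# cells with no rows come back as NA
last_value <- tapply(data$DATA1, list(data$ID, data$ID2),
                     function(v) v[length(v)])

# Mean over ID2 within each ID, ignoring the empty cells
res <- rowMeans(last_value, na.rm = TRUE)
```

Here res is a named numeric vector with one entry per level of ID, rather than 
the unnamed list that the nested lapply calls return, and the data.frame is 
never subset row by row.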

Erik Iverson

> data <- data.frame(ID = floor(runif(n = 40000, min=0, max=20)),
>            ID2 = floor(runif(n = 40000, min=0, max=1000)), 
>            DATA1 = rnorm(n = 40000))
> data$ID <- as.factor(data$ID)
> data$ID2 <- as.factor(data$ID2)
> res <- lapply(attr(data$ID, "levels"), function(x, data){
>   data <- data[which(data$ID == x), ]
>   res <- structure(unlist(lapply(attr(data$ID2, "levels"), function(x, data){
>      data <- data[which(data$ID2 == x), ]
>      if (nrow(data) == 0){
>         return(NA)
>      }
>      data[nrow(data), "DATA1"]
>   }, data = data)), names=as.character(attr(data$ID2, "levels")))
>   res <- mean(res, na.rm=TRUE)
> }, data = data)
