[R] Lookups in R

Michael Frumin michael at frumin.net
Thu Jul 5 11:56:20 CEST 2007


The problem I have is that user IDs are not just sequential from
1:n_users.  If they were, of course I'd have made a big matrix that was
n_users x n_fields and that would be that.  But I think what I can do is
just use the hash to store the index into the result matrix, nothing
more. Then the rest of it will be easy.
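
A minimal sketch of that idea, assuming made-up user IDs, field names, and transactions (user_ids, fields, trans_user, and trans_amt are all hypothetical, not from my real data): the hashed environment holds only the row index for each ID, and all the accumulation happens in a plain numeric matrix.

```r
# Hypothetical, non-sequential user IDs (made up for illustration).
user_ids <- c("1007", "42", "9913", "305")
fields <- c("amt", "n")

# Result matrix: one row per user, one column per field.
res <- matrix(0, nrow = length(user_ids), ncol = length(fields),
              dimnames = list(NULL, fields))

# Hashed environment mapping each user ID to its row index in `res`.
idx <- new.env(hash = TRUE, parent = emptyenv(), size = length(user_ids))
for (i in seq_along(user_ids)) idx[[user_ids[i]]] <- i

# Hypothetical transactions: a user ID and an amount for each one.
trans_user <- c("42", "1007", "42", "305")
trans_amt <- c(1.5, 2.0, 0.5, 4.0)

# Look up the row once per transaction, then update the matrix in place.
for (k in seq_along(trans_user)) {
  r <- idx[[trans_user[k]]]
  res[r, "amt"] <- res[r, "amt"] + trans_amt[k]
  res[r, "n"] <- res[r, "n"] + 1
}
```

Since the environment only ever stores one integer per ID, it is filled once up front and never rewritten inside the loop; a user's row is then res[idx[["42"]], ].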

But please tell me more about eliminating loops.  In many cases in R I
have used lapply and its relatives to avoid loops, but in this case they
seem to give me extra overhead simply from generating their result
lists:

> system.time(lapply(1:10^4, mean))
   user  system elapsed 
   1.31    0.00    1.31 
> system.time(for(i in 1:10^4) mean(i))
   user  system elapsed 
   0.33    0.00    0.32 


thanks,
mike


> I don't think that's a fair comparison--- much of the overhead comes
> from the use of data frames and the creation of the indexing vector. I
> get
> 
> > n_accts <- 10^3
> > n_trans <- 10^4
> > t <- list()
> > t$amt <- runif(n_trans)
> > t$acct <- as.character(round(runif(n_trans, 1, n_accts)))
> > uhash <- new.env(hash=TRUE, parent=emptyenv(), size=n_accts)
> > for (acct in as.character(1:n_accts)) uhash[[acct]] <- list(amt=0, n=0)
> > system.time(for (i in seq_along(t$amt)) {
> +     acct <- t$acct[i]
> +     x <- uhash[[acct]]
> +     uhash[[acct]] <- list(amt=x$amt + t$amt[i], n=x$n + 1)
> + }, gcFirst = TRUE)
>    user  system elapsed
>   0.508   0.008   0.517
> > udf <- matrix(0, nrow = n_accts, ncol = 2)
> > rownames(udf) <- as.character(1:n_accts)
> > colnames(udf) <- c("amt", "n")
> > system.time(for (i in seq_along(t$amt)) {
> +     idx <- t$acct[i]
> +     udf[idx, ] <- udf[idx, ] + c(t$amt[i], 1)
> + }, gcFirst = TRUE)
>    user  system elapsed
>   1.872   0.008   1.883
> 
> The loop is still going to be the problem for realistic examples.
> 
> -Deepayan
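
For reference, one way the explicit loop in the quoted example could be dropped altogether is to aggregate by account in a single vectorized call. This is only a sketch under the same simulated setup as above, not a measured comparison: the seed is arbitrary, and the names amt, acct, sums, counts, and totals are introduced here for illustration. It uses rowsum() for the per-account sums and table() for the per-account counts.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
n_accts <- 10^3
n_trans <- 10^4
amt <- runif(n_trans)
acct <- as.character(round(runif(n_trans, 1, n_accts)))

# Sum the amounts per account in one call; the result is a one-column
# matrix whose row names are the account IDs that actually occur.
sums <- rowsum(amt, group = acct)

# Count the transactions per account.
counts <- table(acct)

# Combine into one matrix, aligning the counts to the row order of `sums`.
totals <- cbind(amt = sums[, 1], n = as.vector(counts[rownames(sums)]))
```

Note that accounts with no transactions do not appear in totals at all, whereas the loop versions keep a zero row for every account.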


