[R] how to do away for loop using functionals?

Wed Oct 14 23:11:58 CEST 2015

Thank you Mike for looking into the problem and your helpful advice, really appreciate that.  Also, thank you Bill for pointing out the bad data.frame code.I modified the codes per your suggestions and run some time tests on n=2000 (increase # obs and groups as I actually have a much bigger dataset and more complicated getResult function with 100 lines of code)
original code:> proc.time() - ptm   user  system elapsed   72.37    0.06   72.51 
modified code:> proc.time() - ptm   user  system elapsed   73.21    0.20   81.26 
Surprisingly the lapply doesn't appear to save time, perhaps I should use dplyr function to extract groups but I tried that before and it didn't save time either.  I read that data.table is faster (if applicable) and perhaps I should go to that direction? Any thought of speeding the ops is much appreciated.Thank you,Anne

--------------------------------------------- CODE below #sample data setupn=2000; set.seed(1)g=rep(1:500,each=4)df=data.frame(s=sort(rnorm(mean=15,sd=10, n)), w=runif(n), h=rbinom(n, 1, 0.4) , g ); df      getResult(df)#i0=c(1,2,4,5,5)i0=rep(c(1,2,4,5,5),100)
ng= length(unique(g))
#initiation of result matrixA=B=matrix(Inf, ng,  ng); A## my code (Anne)ptm = proc.time()for(i in 1:ng) { #cat("i:",i," ") for(j in i0[i]:ng) { ok = !is.na(match(g,i:j)); #cat("j:",j,"\n");      A[i,j]=getResult(d=df[ok,]) } #end for (j)} #end for (i)proc.time() - ptm
## Mike's codeptm = proc.time()invisible(lapply(1:ng, function(i) {                      lapply(i0[i]:ng, function(j) {                                 ok <- !is.na(match(g, i:j))                                 B[i, j] <<- getResult(df[ok, ])                             })                 }))proc.time() - ptm

     On Wednesday, October 14, 2015 11:35 AM, William Dunlap <wdunlap at tibco.com> wrote:

 > df=as.data.frame(cbind( sort(rnorm(mean=15,sd=10, n)),runif(n), rbinom(n, 1, 0.4) , g ))

This is a lousy way to make a data.frame - the cbind forces all columns to be the sametype and forces them into one vector then as.data.frame splits them up into separate columnsagain.  You also get weird names for your columns.  If you want to make a data.frame, use   df <- data.frame(ColA = sort(rnorm(mean=15,sd=10, n)), ColB = runif(n), ColC = rbinom(n, 1, 0.4) , g = g)
However, since the columns you are passing to getResult are both numeric a matrix (madewith cbind) would work just as well and selecting rows from it will probably be faster. Youwill have to have a large number of groups before you notice the difference.

Bill Dunlap
TIBCO Software
wdunlap tibco.com
On Wed, Oct 14, 2015 at 2:02 AM, Michael Hannon <jmhannon.ucdavis at gmail.com> wrote:

I've done a simple-minded transliteration of your code into code using nested
lapply's.  I doubt that it buys you much in terms of performance (or even
clarity, which is really one of the main advantages of the `apply` family).

> A
        [,1]      [,2]     [,3]     [,4]     [,5]
[1,] 3.06097  6.507521 10.99610 12.05556 15.10388
[2,]     Inf 11.818495 15.85044 16.69465 19.70425
[3,]     Inf       Inf      Inf 19.14779 22.30343
[4,]     Inf       Inf      Inf      Inf 26.11170
[5,]     Inf       Inf      Inf      Inf 28.29882

> B
        [,1]      [,2]     [,3]     [,4]     [,5]
[1,] 3.06097  6.507521 10.99610 12.05556 15.10388
[2,]     Inf 11.818495 15.85044 16.69465 19.70425
[3,]     Inf       Inf      Inf 19.14779 22.30343
[4,]     Inf       Inf      Inf      Inf 26.11170
[5,]     Inf       Inf      Inf      Inf 28.29882
> all.equal(A, B)
[1] TRUE

If I happen to think of a more-elegant approach, I'll let you know.

-- Mike

Appendix: code
==============

###### Anne's code

getResult <- function(d) {

      #examplefunction

     weighted.mean(x=d[,1], w=d[,2])

}

#example data setup

n=20;

set.seed(1)

g=rep(1:5,each=4)

df=as.data.frame(cbind( sort(rnorm(mean=15,sd=10, n)),runif(n), rbinom(n, 1,
0.4) , g )); df

getResult(df)

i0=c(1,2,4,5,5)

ng= length(unique(g))

#initiation of result matrix

A=matrix(Inf, ng, ng); A

for(i in 1:ng)

{              cat("i:",i,"")

                for(j in i0[i]:ng) {

                                ok= !is.na(match(g,i:j)); cat("j:",j,"\n");

                                A[i,j]=getResult(d=df[ok,])

                } #endfor (j)

} #end for (i)
A

###### Mike's code

n <- 20;
set.seed(1)
g <- rep(1:5,each=4)
df <- as.data.frame(cbind(sort(rnorm(mean=15,sd=10, n)),
                          runif(n),
                          rbinom(n, 1, 0.4),
                          g )); df
getResult(df)
i0 <- c(1,2,4,5,5)
ng <- length(unique(g))

B <- matrix(Inf, ng, ng);

invisible(lapply(1:ng, function(i) {
                     lapply(i0[i]:ng, function(j) {
                                ok <- !is.na(match(g, i:j))
                                B[i, j] <<- getResult(df[ok, ])
                            })
                 }))

B
all.equal(A, B)

On Mon, Oct 12, 2015 at 5:55 PM, Annie Hawk via R-help
<r-help at r-project.org> wrote:
> HI R-experts,
>
>
> I am trying to speed up my calculation of the A results below and replace the for loop withsome functionals like lapply.  After manyreadings, trial and error, I still have no success.  Would anyone please give me some hints onthat?
>
> Thank you in advance.
>
> Anne
>
>
> The program is this, I have a complicated function and itneeds to operate on some subsets of a dataset many times, depending on thevalues of group.  I simplify the functionand dataset for this example run.
>
> getResult <- function(d) {
>
>       #examplefunction
>
>      weighted.mean(x=d[,1], w=d[,2])
>
> }
>
>
>
> #example data setup
>
> n=20;
>
> set.seed(1)
>
> g=rep(1:5,each=4)
>
> df=as.data.frame(cbind( sort(rnorm(mean=15,sd=10, n)),runif(n), rbinom(n, 1, 0.4) , g )); df
>
> getResult(df)
>
> i0=c(1,2,4,5,5)
>
> ng= length(unique(g))
>
>
>
> #initiation of result matrix
>
> A=matrix(Inf, ng, ng); A
>
> for(i in 1:ng)
>
> {              cat("i:",i,"")
>
>                 for(jin i0[i]:ng) {
>
>                                 ok= !is.na(match(g,i:j)); cat("j:",j,"\n");
>
>                                 A[i,j]=getResult(d=df[ok,])
>
>                 } #endfor (j)
>
> } #end for (i)
>
> Is there an elegant way to remove the for loop here?  I try to make it flat for faster run but Icannot figure out how to subset the observations faster without error to apply the functiongetResult.  Any hint is appreciated.
>
>
>
>
>
> on another note, is there a more elegant way to initiate the list as follows?
>
> mylist=list(); w=rep(4,5)
>
> for (i in 1:5) mylist[[i]]=w[i:5]
>
>
>
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

	[[alternative HTML version deleted]]