[R] how to do away for loop using functionals?

William Dunlap wdunlap at tibco.com
Wed Oct 14 17:34:50 CEST 2015


> df=as.data.frame(cbind( sort(rnorm(mean=15,sd=10, n)),runif(n), rbinom(n,
1, 0.4) , g ))

This is a lousy way to make a data.frame - the cbind forces all columns to
be the same
type and forces them into one vector then as.data.frame splits them up into
separate columns
again.  You also get weird names for your columns.  If you want to make a
data.frame, use
   df <- data.frame(ColA = sort(rnorm(mean=15,sd=10, n)), ColB = runif(n),
ColC = rbinom(n, 1, 0.4) , g = g)

However, since the columns you are passing to getResult are both numeric a
matrix (made
with cbind) would work just as well and selecting rows from it will
probably be faster. You
will have to have a large number of groups before you notice the difference.


Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Wed, Oct 14, 2015 at 2:02 AM, Michael Hannon <jmhannon.ucdavis at gmail.com>
wrote:

> I've done a simple-minded transliteration of your code into code using
> nested
> lapply's.  I doubt that it buys you much in terms of performance (or even
> clarity, which is really one of the main advantages of the `apply` family).
>
>
> > A
>         [,1]      [,2]     [,3]     [,4]     [,5]
> [1,] 3.06097  6.507521 10.99610 12.05556 15.10388
> [2,]     Inf 11.818495 15.85044 16.69465 19.70425
> [3,]     Inf       Inf      Inf 19.14779 22.30343
> [4,]     Inf       Inf      Inf      Inf 26.11170
> [5,]     Inf       Inf      Inf      Inf 28.29882
>
> > B
>         [,1]      [,2]     [,3]     [,4]     [,5]
> [1,] 3.06097  6.507521 10.99610 12.05556 15.10388
> [2,]     Inf 11.818495 15.85044 16.69465 19.70425
> [3,]     Inf       Inf      Inf 19.14779 22.30343
> [4,]     Inf       Inf      Inf      Inf 26.11170
> [5,]     Inf       Inf      Inf      Inf 28.29882
> > all.equal(A, B)
> [1] TRUE
>
> If I happen to think of a more-elegant approach, I'll let you know.
>
> -- Mike
>
> Appendix: code
> ==============
>
> ###### Anne's code
>
> getResult <- function(d) {
>
>       #examplefunction
>
>      weighted.mean(x=d[,1], w=d[,2])
>
> }
>
> #example data setup
>
> n=20;
>
> set.seed(1)
>
> g=rep(1:5,each=4)
>
> df=as.data.frame(cbind( sort(rnorm(mean=15,sd=10, n)),runif(n), rbinom(n,
> 1,
> 0.4) , g )); df
>
> getResult(df)
>
> i0=c(1,2,4,5,5)
>
> ng= length(unique(g))
>
>
>
> #initiation of result matrix
>
> A=matrix(Inf, ng, ng); A
>
> for(i in 1:ng)
>
> {              cat("i:",i,"")
>
>                 for(j in i0[i]:ng) {
>
>                                 ok= !is.na(match(g,i:j));
> cat("j:",j,"\n");
>
>                                 A[i,j]=getResult(d=df[ok,])
>
>                 } #endfor (j)
>
> } #end for (i)
> A
>
> ###### Mike's code
>
> n <- 20;
> set.seed(1)
> g <- rep(1:5,each=4)
> df <- as.data.frame(cbind(sort(rnorm(mean=15,sd=10, n)),
>                           runif(n),
>                           rbinom(n, 1, 0.4),
>                           g )); df
> getResult(df)
> i0 <- c(1,2,4,5,5)
> ng <- length(unique(g))
>
> B <- matrix(Inf, ng, ng);
>
> invisible(lapply(1:ng, function(i) {
>                      lapply(i0[i]:ng, function(j) {
>                                 ok <- !is.na(match(g, i:j))
>                                 B[i, j] <<- getResult(df[ok, ])
>                             })
>                  }))
>
> B
> all.equal(A, B)
>
>
> On Mon, Oct 12, 2015 at 5:55 PM, Annie Hawk via R-help
> <r-help at r-project.org> wrote:
> > HI R-experts,
> >
> >
> > I am trying to speed up my calculation of the A results below and
> replace the for loop withsome functionals like lapply.  After manyreadings,
> trial and error, I still have no success.  Would anyone please give me some
> hints onthat?
> >
> > Thank you in advance.
> >
> > Anne
> >
> >
> > The program is this, I have a complicated function and itneeds to
> operate on some subsets of a dataset many times, depending on thevalues of
> group.  I simplify the functionand dataset for this example run.
> >
> > getResult <- function(d) {
> >
> >       #examplefunction
> >
> >      weighted.mean(x=d[,1], w=d[,2])
> >
> > }
> >
> >
> >
> > #example data setup
> >
> > n=20;
> >
> > set.seed(1)
> >
> > g=rep(1:5,each=4)
> >
> > df=as.data.frame(cbind( sort(rnorm(mean=15,sd=10, n)),runif(n),
> rbinom(n, 1, 0.4) , g )); df
> >
> > getResult(df)
> >
> > i0=c(1,2,4,5,5)
> >
> > ng= length(unique(g))
> >
> >
> >
> > #initiation of result matrix
> >
> > A=matrix(Inf, ng, ng); A
> >
> > for(i in 1:ng)
> >
> > {              cat("i:",i,"")
> >
> >                 for(jin i0[i]:ng) {
> >
> >                                 ok= !is.na(match(g,i:j));
> cat("j:",j,"\n");
> >
> >                                 A[i,j]=getResult(d=df[ok,])
> >
> >                 } #endfor (j)
> >
> > } #end for (i)
> >
> > Is there an elegant way to remove the for loop here?  I try to make it
> flat for faster run but Icannot figure out how to subset the observations
> faster without error to apply the functiongetResult.  Any hint is
> appreciated.
> >
> >
> >
> >
> >
> > on another note, is there a more elegant way to initiate the list as
> follows?
> >
> > mylist=list(); w=rep(4,5)
> >
> > for (i in 1:5) mylist[[i]]=w[i:5]
> >
> >
> >
> >
> >         [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

	[[alternative HTML version deleted]]



More information about the R-help mailing list