[R] Has the for loop been improved in R

Thierry Onkelinx thierry.onkelinx at inbo.be
Mon Aug 7 16:57:26 CEST 2017


Dear Jesus,

The difference is marginal when both code chunks do the same thing. Your
for loop does not yield the same output as the lapply: it subsets rand.data
without the row comma and does not square the errors. Here is the cleaned
version of your code.

n<-10000
set.seed(123)
x<-rnorm(n)
y<-x+rnorm(n)
rand.data<-data.frame(x,y)
k<-100
samples <- split(sample(n), rep(seq_len(k), length.out = n))

library(microbenchmark)
microbenchmark(
  "for" = {
    res <- vector("list", length(samples))
    for(index in seq_along(samples)) {
      fit <- lm(y~x, data = rand.data[-samples[[index]],])
      pred <- predict(fit, newdata = rand.data[samples[[index]],])
      res[[index]] <- (pred - rand.data$y[samples[[index]]])^2
    }
  },
  lapply = {
    cv.fold.fun <- function(index){
      fit <- lm(y~x, data = rand.data[-samples[[index]],])
      pred <- predict(fit, newdata = rand.data[samples[[index]],])
      return((pred - rand.data$y[samples[[index]]])^2)
    }
    lapply(seq_along(samples), cv.fold.fun)
  }
)

Unit: milliseconds
   expr      min       lq     mean   median       uq      max neval cld
    for 866.4196 897.3137 949.8155 926.1918 946.8390 1767.463   100   a
 lapply 837.7804 889.6620 947.2401 909.9946 939.6379 2476.415   100   a
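
A small sketch of why your original for loop was not equivalent, using a
made-up five-row data frame (d and idx are illustrative names, not from your
code). Without the comma, `[` selects columns rather than rows, and negative
column indices beyond the number of columns are silently ignored, so no rows
are held out.

```r
d <- data.frame(x = 1:5, y = 6:10)  # toy data, hypothetical
idx <- c(4, 5)                      # rows meant to be held out

no_comma <- d[-idx]      # column indexing: nothing is dropped here
with_comma <- d[-idx, ]  # row indexing: rows 4 and 5 are dropped

nrow(no_comma)    # 5
nrow(with_comma)  # 3
```

So in the version without the comma, lm() is fitted on all rows in every
iteration, which is a different job from the one the lapply version does.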

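On Jeff's point (quoted below) that lapply is not vectorization, here is a
minimal sketch with a toy vector v (illustrative only). The for loop and the
apply-family call both invoke a function once per element from R; only the
last form hands the whole vector to compiled code in a single call.

```r
v <- c(1, 4, 9, 16)  # toy input, hypothetical

# for loop writing into a preallocated result
res_for <- numeric(length(v))
for (i in seq_along(v)) res_for[i] <- sqrt(v[i])

# vapply: still one function call per element
res_apply <- vapply(v, sqrt, numeric(1))

# vectorized: one call on the whole vector
res_vec <- sqrt(v)

stopifnot(identical(res_for, res_apply), identical(res_for, res_vec))
```

In the cross-validation above almost all the time goes into lm() and
predict(), so the choice of looping construct hardly matters.
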
Best regards,


ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey

2017-08-07 16:48 GMT+02:00 Jeff Newmiller <jdnewmil at dcn.davis.ca.us>:

> The lapply loop and the for loop have very similar speed characteristics.
> Differences seen are almost always due to how you use memory in the body of
> the loop. This fact is not new. You may be under the incorrect assumption
> that using lapply is somehow equivalent to "vectorization", which it is not.
>
> On August 7, 2017 7:29:58 AM PDT, "Jesús Para Fernández" <
> j.para.fernandez at hotmail.com> wrote:
> >Hi!
> >
> >I am comparing lapply and for, and I find that for is faster
> >than lapply.
> >
> >
> >What I have done:
> >
> >
> >
> >n<-100000
> >set.seed(123)
> >x<-rnorm(n)
> >y<-x+rnorm(n)
> >rand.data<-data.frame(x,y)
> >k<-100
> >samples<-split(sample(1:n),rep(1:k,length=n))
> >
> >res<-list()
> >t<-Sys.time()
> >for(i in 1:100){
> >  modelo<-lm(y~x,rand.data[-samples[[i]]])
> >  prediccion<-predict(modelo,rand.data[samples[[i]],])
> >  res[[i]] <- (prediccion - rand.data$y[samples[[i]]])
> >
> >}
> >print(Sys.time()-t)
> >
> >Which takes 8.042 seconds
> >
> >and using lapply
> >
> >cv.fold.fun <- function(index){
> >   fit <- lm(y~x, data = rand.data[-samples[[index]],])
> >   pred <- predict(fit, newdata = rand.data[samples[[index]],])
> >   return((pred - rand.data$y[samples[[index]]])^2)
> >  }
> >
> >
> >t<-Sys.time()
> >
> >nuevo<-lapply(seq(along = samples),cv.fold.fun)
> >print(Sys.time()-t)
> >
> >
> >Which takes 9.56 seconds.
> >
> >So... has the for loop been improved in R?
> >
> >Thanks!
> >
> >
> >______________________________________________
> >R-help op r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >https://stat.ethz.ch/mailman/listinfo/r-help
> >PLEASE do read the posting guide
> >http://www.R-project.org/posting-guide.html
> >and provide commented, minimal, self-contained, reproducible code.
