[R] applying data generating function

Gabor Grothendieck ggrothendieck at myway.com
Mon Mar 8 07:46:21 CET 2004



It's possible that there was a garbage collection at the
beginning or maybe this suggestion does not apply, given
the precautions you took.  As far as I know, all you can
do is try it and see if it gives more consistent results.
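
A minimal sketch of that check, assuming a stand-in workload: run_once
and time_with_gc are hypothetical names, not part of the script in this
thread. It calls gc() before each timed run so the elapsed times can be
compared across repetitions.

```r
# Hypothetical stand-in for the code being timed: the append-based
# loop discussed in this thread, at a small N so it runs quickly.
run_once <- function(n = 2000) {
  v <- c()
  x <- 0.1
  for (i in seq_len(n)) {
    x <- 3.8 * x * (1 - x) + rnorm(1, 0, 0.001)
    v <- append(v, x)
  }
  v
}

# Time several repetitions, forcing a garbage collection before each one.
time_with_gc <- function(reps = 5, n = 2000) {
  sapply(seq_len(reps), function(r) {
    gc()  # collect before the timer starts
    unname(system.time(run_once(n))["elapsed"])
  })
}

times <- time_with_gc()
print(times)
```

In current versions of R, system.time() has a gcFirst argument that
defaults to TRUE, so the explicit gc() mainly matters when timing with
proc.time() differences, as the script quoted below does.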

---
Date:   Sun, 07 Mar 2004 20:15:41 -0800 
From:   Spencer Graves <spencer.graves at pdf.com>
To:   <ggrothendieck at myway.com> 
Cc:   <p.dalgaard at biostat.ku.dk>, <phddas at yahoo.com>, <r-help at stat.math.ethz.ch> 
Subject:   Re: [R] applying data generating function 

 
Hi, Gabor: 

Thanks for the "garbage collection" suggestion. In this case, I 
can't imagine how it would change the results: I developed the script 
in an S-Plus script window, then copied it into an R session that had 
just been started. Moreover, the times generally declined upon 
replication. Do you think the time might INCREASE after "gc"? 

Best Wishes,
spencer graves

Gabor Grothendieck wrote:

>Regarding your comment on speed varying when replicating the
>runs, try running gc() first.
>
>---
>Date: Sun, 07 Mar 2004 17:56:46 -0800 
>From: Spencer Graves <spencer.graves at pdf.com>
>To: Peter Dalgaard <p.dalgaard at biostat.ku.dk> 
>Cc: Fred J. <phddas at yahoo.com>,r-help <r-help at stat.math.ethz.ch> 
>Subject: Re: [R] applying data generating function 
>
> 
>Peter's enumeration of alternatives inspired me to compare compute 
>times for N = 10^(1:5), with the following results: 
>
>*** R 1.8.1 under Windows 2000, IBM Thinkpad T30: 
>                             10  100 1000 10000  1e+05
>for loop                      0 0.01 0.09  1.27 192.05
>gen e + for loop              0 0.00 0.03  0.22   2.58
>create storage + for loop     0 0.01 0.05  0.34   3.45
>sapply                        0 0.00 0.04  0.28   3.82
>replicate                     0 0.01 0.05  0.29   4.02
>
>I repeated this with the "for loop" both first and last. The 
>times tended to decline on replication: the "for loop" times for 
>N = 1e5 were 182.02, 126.04 (with the "for loop" last), 130.30 ("for loop" 
>last), and 118.64 ("for loop" first again). 
>
>Conclusions: 
>
>(1) Apparently, in some cases, R picks up speed upon replication.
>
>(2) The first 3 times for the "for loop" with N = 1e5 made me 
>wonder if there was an order effect, with the "for loop" being longer in 
>the first position. However, the last run with the "for loop" again 
>first had the shortest time of 118.64, contradicting that hypothesis. 
>
>By comparison, I also tried this under S-Plus 6.2: 
>
>*** S-Plus 6.2, Windows 2000, IBM Thinkpad T30 ("for loop" first): 
>                               10  100  1000 10000  100000
>for loop                     0.01 0.05 0.331 3.976 273.073
>gen e + for loop             0.00 0.04 0.320 3.154  29.112
>create storage + for loop    0.01 0.03 0.231 2.113  22.242
>sapply                       0.00 0.04 0.380 4.757  23.003
>
>The script I used appears below. As Peter said, "the only really 
>crucial [issue] is to avoid the inefficient append by preallocating" the 
>vectors to be generated. Moreover, this is only an issue for long loops, 
>with a threshold between 1e4 and 1e5 in this example. For shorter 
>loops, the programmers' time is far more valuable. 
>
>Enjoy. spencer graves
>####################
>
>
>N.gen <- c(10, 100, 1000, 10000, 1e5)
>mtds <- c("for loop", "gen e + for loop", "create storage + for loop",
>          "sapply", "replicate")
>m <- length(N.gen)
># rows = N, columns = method; entries are elapsed times from proc.time()
>ellapsed.time <- array(NA, dim=c(m, length(mtds)))
>dimnames(ellapsed.time) <- list(N.gen, mtds)
>
>for(iN in 1:m){
>  cat("\n", iN, "")
>  N <- N.gen[iN]
>
>  # for loop: grow v with append() on every iteration
>  set.seed(123)
>  start.time <- proc.time()
>  f <- function (x.) { 3.8*x.*(1-x.) + rnorm(1,0,.001) }
>  v <- c()
>  x <- .1 # starting point
>  for (i in 1:N) { x <- f(x); v <- append(v,x) }
>  ellapsed.time[iN, "for loop"] <- (proc.time()-start.time)[3]
>  cat(mtds[1], "")
>
>  # gen e + for loop: draw all the noise at once, fill a preallocated X
>  set.seed(123)
>  start.time <- proc.time()
>  e <- 0.001*rnorm(N)
>  X <- rep(0.1, N+1)
>  for(i in 2:(N+1))
>    X[i] <- (3.8*X[i-1]*(1-X[i-1])+e[i-1])
>  ellapsed.time[iN, "gen e + for loop"] <- (proc.time()-start.time)[3]
>  cat(mtds[2], "")
>
>  # create storage + for loop: preallocate V, call f() once per step
>  set.seed(123)
>  start.time <- proc.time()
>  V <- numeric(N)
>  xv <- .1 ; for (i in 1:N) { xv <- f(xv); V[i] <- xv }
>  ellapsed.time[iN, "create storage + for loop"] <-
>    (proc.time()-start.time)[3]
>  cat(mtds[3], "")
>
>  # sapply: implicit loop; xa is updated with <<-
>  set.seed(123)
>  start.time <- proc.time()
>  xa <- .1 ; va <- sapply(1:N, function(i) xa <<- f(xa))
>  ellapsed.time[iN, "sapply"] <- (proc.time()-start.time)[3]
>  cat(mtds[4], "")
>
>  # replicate: the test is meant to skip this block under S-Plus
>  if(!is.null(version$language)){
>    set.seed(123)
>    start.time <- proc.time()
>    z <- .1 ; vr <- replicate(N, z <<- f(z))
>    ellapsed.time[iN, "replicate"] <- (proc.time()-start.time)[3]
>    cat(mtds[5], "")
>  }
>
>}
>
># methods in rows, N in columns, as in the tables above
>t(ellapsed.time)
>#############################
>Peter Dalgaard wrote:
>
>>Christophe Pallier <pallier at lscp.ehess.fr> writes:
>>
>>>Fred J. wrote:
>>>
>>>>I need to generate a data set based on this equation
>>>>X(t) = 3.8x(t-1) (1-x(t-1)) + e(t), where e(t) is a
>>>>N(0, 0.001) random variable
>>>>I need say 100 values.
>>>>
>>>>How do I do this?
>>>>
>>>I assume X(t) and x(t) are the same (?).
>>>
>>>f<-function (x) { 3.8*x*(1-x) + rnorm(1,0,.001) }
>>>v=c()
>>>x=.1 # starting point
>>>for (i in 1:100) { x=f(x); v=append(v,x) }
>>>
>>>There may be smarter ways...
>>>
>>Yes, but the only really crucial one is to avoid the inefficient append by
>>preallocating the v: 
>>
>>v <- numeric(100)
>>x <- .1 ; for (i in 1:100) { x <- f(x); v[i] <- x }
>>
>>apart from that you can use implicit loops:
>>
>>x <- .1 ; v <- sapply(1:100, function(i) x <<- f(x))
>>
>>or
>>
>>z <- .1 ; v <- replicate(100, z <<- f(z))
>>
>>(You cannot use x there because of a variable capture issue which is a
>>bit of a bug. I intend to fix it for 1.9.0.)
>>
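
As a minimal sketch of Peter's point about preallocation, the two loops
can be wrapped in functions and checked against each other. The names
gen_append and gen_prealloc are illustrative and do not appear in the
thread. With the same seed both versions make the same rnorm() calls in
the same order, so they should return the same series and differ only
in run time.

```r
# Update step from the thread: noisy logistic map.
f <- function(x) 3.8 * x * (1 - x) + rnorm(1, 0, 0.001)

# Grow the result with append() on every iteration (the slow pattern).
gen_append <- function(n, x0 = 0.1) {
  v <- c()
  x <- x0
  for (i in seq_len(n)) {
    x <- f(x)
    v <- append(v, x)
  }
  v
}

# Allocate the full result vector once, then fill it in place.
gen_prealloc <- function(n, x0 = 0.1) {
  v <- numeric(n)
  x <- x0
  for (i in seq_len(n)) {
    x <- f(x)
    v[i] <- x
  }
  v
}

set.seed(123); a <- gen_append(1000)
set.seed(123); b <- gen_prealloc(1000)
print(identical(a, b))
```

To compare run times, wrap each call in system.time() and increase n;
in the timings quoted above the gap only became large somewhere between
1e4 and 1e5.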