[R] speed issue in simulating a stochastic process

Thu Nov 6 23:22:15 CET 2014

Loops are not slow, but your code did a lot of unneeded operations in each
loop.
E.g, you computed
    D$id==i & D$t==t
for each row of D.  That involves 2*nrow(D) equality tests for each of the
nrow(D)
rows, i.e., it is quadratic in N*T.

Then you did a data.frame replacement operation
    D[k,]$y <- newValue
where k is D$id==1&D$t==t.  This extracts the k'th row of D, then extracts
the 1-row 'y' column
from it, replaces it with the new value, then puts that row back into D.
If you must use
a data.frame, the equivalent
   D$y[k] <- newValue
is probably much faster (data.frames are lists of columns, so replacing a
column is fast).

Using a matrix to organize things is less flexible, but faster because you
don't have to search
when you want to find the element for a given id and time - you just do a
little arithmetic to
get the offset from the start of the matrix.

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Thu, Nov 6, 2014 at 2:05 PM, Matteo Richiardi <matteo.richiardi at gmail.com
> wrote:

> Hi William,
> that's super. Thanks a lot. I knew that R is slow with loops, but did not
> imagine so slow! B.t.w., what's the reason?
> Final question: in your code you have mean(M[t-1L,]): what is the 'L'
> for? I removed it at apparently the code produces the same output...
>
> Thanks again,
> Matteo
>
> On 6 November 2014 18:46, William Dunlap <wdunlap at tibco.com> wrote:
>
>> I find that representing the simulated data as a T row by N column matrix
>> allows for a clearer and faster simulation function.  E.g., compare the
>> output of the following two functions, the first of which uses your code
>> and the second a matrix representation (which I convert to a data.frame at
>> the end so I can compare outputs easily).  I timed both of them for T=10^3
>> times and N=50 individuals; both gave the same results and f1 was 10000
>> times faster than f0:
>>   > set.seed(1); t0 <- system.time(s0 <- f0(N=50,T=1000))
>>   > set.seed(1); t1 <- system.time(s1 <- f1(N=50,T=1000))
>>   > rbind(t0, t1)
>>      user.self sys.self elapsed user.child sys.child
>>   t0    436.87     0.11  438.48         NA        NA
>>   t1      0.04     0.00    0.04         NA        NA
>>   > all.equal(s0, s1)
>>   [1] TRUE
>>
>> The functions are:
>>
>> f0 <- function(N = 20, T = 100, lambda = 0.1, m_e = 0, sd_e = 1, y0 = 0)
>> {
>>   # construct the data frame and initialize y
>>   D <- data.frame(
>>     id = rep(1:N,T),
>>     t = rep(1:T, each = N),
>>     y = rep(y0,N*T)
>>   )
>>
>>   # update y
>>   for(t in 2:T){
>>     ybar.L1 = mean(D[D$t==t-1,"y"])
>>     for(i in 1:N){
>>       epsilon = rnorm(1,mean=m_e,sd=sd_e)
>>       D[D$id==i & D$t==t,]$y = lambda*y0+(1-lambda)*ybar.L1+epsilon
>>     }
>>   }
>>   D
>> }
>>
>> f1 <- function(N = 20, T = 100, lambda = 0.1, m_e = 0, sd_e = 1, y0 = 0)
>> {
>>   # same process simulated using a matrix representation
>>   #   The T rows are times, the N columns are individuals
>>   M <- matrix(y0, nrow=T, ncol=N)
>>   if (T > 1) for(t in 2:T) {
>>     ybar.L1 <- mean(M[t-1L,])
>>     epsilon <- rnorm(N, mean=m_e, sd=sd_e)
>>     M[t,] <- lambda * y0 + (1-lambda)*ybar.L1 + epsilon
>>   }
>>   # convert to the data.frame representation that f0 uses
>>   tM <- t(M)
>>   data.frame(id = as.vector(row(tM)), t = as.vector(col(tM)), y =
>> as.vector(tM))
>> }
>>
>>
>>
>> Bill Dunlap
>> TIBCO Software
>> wdunlap tibco.com
>>
>> On Thu, Nov 6, 2014 at 6:47 AM, Matteo Richiardi <
>> matteo.richiardi at gmail.com> wrote:
>>
>>> I wish to simulate the following stochastic process, for i = 1...N
>>> individuals and t=1...T periods:
>>>
>>> y_{i,t} = y_0 + lambda Ey_{t-1} + epsilon_{i,t}
>>>
>>> where Ey_{t-1} is the average of y over the N individuals computed at
>>> time
>>> t-1.
>>>
>>> My solution (below) works but is incredibly slow. Is there a faster but
>>> still clear and readable alternative?
>>>
>>> Thanks a lot. Matteo
>>>
>>> rm(list=ls())
>>> library(plyr)
>>> y0 = 0
>>> lambda = 0.1
>>> N = 20
>>> T = 100
>>> m_e = 0
>>> sd_e = 1
>>>
>>> # construct the data frame and initialize y
>>> D = data.frame(
>>>   id = rep(1:N,T),
>>>   t = rep(1:T, each = N),
>>>   y = rep(y0,N*T)
>>> )
>>>
>>> # update y
>>> for(t in 2:T){
>>>   ybar.L1 = mean(D[D$t==t-1,"y"])
>>>   for(i in 1:N){
>>>     epsilon = rnorm(1,mean=m_e,sd=sd_e)
>>>     D[D$id==i & D$t==t,]$y = lambda*y0+(1-lambda)*ybar.L1+epsilon
>>>   }
>>> }
>>>
>>> ybar <- ddply(D,~t,summarise,mean=mean(y))
>>>
>>> plot(ybar, col = "blue", type = "l")
>>>
>>>         [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>

	[[alternative HTML version deleted]]