[R] Is R the right choice for simulating first passage times of random walks?

Mon Aug 1 02:31:47 CEST 2011

Hi:

See if this works for you:

f4 <- function()
  {
     x <- sample(c(-1L,1L), 1)

      if (x >= 0 ) {return(1)} else {
           csum <- x
           len <- 1
               while(csum < 0) {
                   csum <- csum + sample(c(-1, 1), 1)
                   len <- len + 1
                                          }     }
      len
  }

# In one batch of repetitions of this function,
system.time(out <- replicate(1000, f4()))
   user  system elapsed
   0.51    0.00    0.52
> range(out)
[1]     1 17372

but in another (untimed), this took a significantly longer amount of
time to run [for obvious reasons]:
> range(out)
[1]      1 987752

For 100000 repetitions, I'd guess this could run anywhere from one to
several minutes, depending on the lengths of the sojourns encountered.
This looks like a reasonable way to visualize the output for 1000
replications:

hist(log(out), nclass = 20)

Notice that the function takes no arguments, returns the length of the
random walk while its cumulative sum is negative [or 1 if it starts
out positive], and then uses the replicate() function to iterate the
function f4() N times.

HTH,
Dennis

On Sun, Jul 31, 2011 at 4:36 PM, Paul Menzel
<paulepanter at users.sourceforge.net> wrote:
> Am Mittwoch, den 27.07.2011, 19:59 -0400 schrieb R. Michael Weylandt :
>> Some more skilled folks can help with the curve fitting, but the general
>> answer is yes -- R will handle this quite ably.
>
> Great to read that.
>
>> Consider the following code:
>>
>> <<-------------------------------------->>
>> n = 1e5
>> length = 1e5
>>
>> R = matrix(sample(c(-1,1),length*n,replace=T),nrow=n)
>> R = apply(R,1,cumsum) ## this applies cumsum `row-wise' to R and will make
>> your life INFINITELY better
>> R = cbind(rep(0,n),R) ## Now each row is a random walk as you desired.
>>
>> <<---------------------------------------->>
>>
>> There are actually even faster ways to do what you are asking for, but this
>> introduces you to some useful R architecture, above all the apply function.
>
> Thank you very much. I realized the the 0 column is not need when
> summing this up. Additionally I posted the wrong example code and I
> actually am only interested how long it stays negative from the
> beginning.
>
>> To see how long the longest stretch of negatives in each row is, the
>> following is a little sneaky but works pretty well:
>>
>> countNegative = apply(R,1,function(x){which.max(table(cumsum(x>=0))})
>>
>> then you can study these random numbers to do whatever with them.
>>
>> The gist of this code is that it counts how many positive number have been
>> seen up to each point: for any stretch this doesn't increase, you must be
>> negative, so this identifies the longest such stretch on each row and
>> records the length. (It may be off by one so check it on a smaller R matrix.
>
> That is a great example. It took me a while what `table()` does here but
> thanks to your explanation I finally understood it.
>
> […]
>
>> So all together:
>>
>> <<-------------------------------------->>
>> n = 1e3
>> length = 1e3
>>
>> R = matrix(sample(c(-1,1),length*n,replace=T),nrow=n)
>> R = apply(R,1,cumsum) ## this applies cumsum `row-wise' to R and will make
>> your life INFINITELY better
>> R = cbind(rep(0,n),R) ## Now each row is a random walk as you desired.
>> fTemp <- function(x) {
>>     return(max(table(cumsum(x>=0))))
>> }
>> countNegative = apply(R,1,fTemp)
>> mu = mean(countNegative)
>> sig = sd(countNegative)/sqrt(length(countNegative))
>>
>> <<---------------------------------------->>
>>
>> This runs pretty fast on my laptop, but you'll need to look into the
>> memory.limit() function if you want to increase the simulation parameters.
>> There are much faster ways to handle the simulation as well, but this should
>> get you off to a nice start with R.
>>
>> Hope this helps,
>
> It did. Thank you again for the kind and elaborate introduction.
>
> Trying to run your example right away froze my system using `n = 1000`
> and `length = 1e5` [1]. So I really need to be careful how big such a
> matrix can get. One thing is to use integers as suggested in [2].
>
> My current code looks like the following.
>
> -------- 8< -------- code -------- >8 --------
> f4 <- function(n = 100000, # number of simulations
>               length = 100000) # length of iterated sum
> {
>        R = matrix(sample(c(-1L,1L),length*n,replace=T),nrow=n)
>        R = apply(R,1,cumsum) ## this applies cumsum `row-wise' to R and will make your life INFINITELY better
>        fTemp <- function(x) {
>                if (x[1] >= 0 ) {
>                        return(1)
>                }
>
>                for (i in 1:length-1) {
>                        if (x[i] < 0 && x[i + 1] >= 0) {
>                                return(as.integer(i/2 + 2)) # simple random walks only hit 0 on even “times”
>                        }
>                }
>        }
>        countNegative = apply(R,2,fTemp)
>        tabulate(as.vector(countNegative), length)
> }
> -------- 8< -------- code -------- >8 --------
>
> 1.I could actually avoid `cumsum()` half the time, when the first entry
> is already positive. So I am still looking for a way to speed that up in
> comparison to a simple two loops scenario.
> 2. The counting of the length how long the walk stayed negative is
> probably also inefficient and I should find a better way on how to
> return the values.
>
> I am still thinking about both cases, but to come up with
> vectoriazations of the problem is quite hard.
>
> So I welcome any suggestions. ;-)
>
>
> Thanks,
>
> Paul
>
>
> [1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=635832
> [2] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=635832#10
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>