[R] Is R the right choice for simulating first passage times of random walks?

Tue Aug 2 02:28:49 CEST 2011

Am Montag, den 01.08.2011, 12:43 -0400 schrieb R. Michael Weylandt :
> I've only got a 20 minute layover, but three quick remarks:
> 
> 1) Do a sanity check on your data size: if you want a million walks of
> a thousand steps, that already gets you to a billion integers to
> store--even at a very low bound of one byte each, thats already 1GB
> for the data and you still have to process it all and run the OS. If
> you bump this to walks of length 10k, you are in big trouble. 
> 
> Considered like that, it shouldn't surprise you that you are getting
> near memory limits. 
> 
> If you really do need such a large simulation and are willing to make
> the time/space tradeoff, it may be worth doing simulations in smaller
> batches (say 50-100) and aggregating the needed stats for analysis.

I already did that, saved the result and ran it again. I also found [1]
and will look to do these things in parallel, since the simulations do
not depend on each other. I hope I can avoid the matrix then and use
just `replicate()`.

> Also, consider direct use of the rm() function for memory management.

I was looking for such a feature the last days and read to set the
variables to `NULL` to delete them somewhere. Now I found the correct
command. Thank you!

> 2) If you know that which.max()==1 can't happen for your data, might
> this trick be easier than forcing it through some tricky logic inside
> the which.max()
> 
> X=which.max(...)
> if(X[1]==1) X=Inf # or whatever value

Noted for when I need that again.

> 3) I dont have any texts at hand to confirm this but isn't the
> expected value of the first hit time of a RW infinite?

That is indeed correct. The generating function of the first hitting
time of zero T₀ is (g_T₀)(s) ≔ 1/s (1 - \sqrt(1 - s). Therefore

(g_T₀)ʹ(s) ≔ 1/s² (1 - \sqrt(1 - s) + 1/s (2(1 - s))^(-½) → ∞ for s → 1.

> I think a  handwaving proof can be squeezed out of the optional
> stopping theorem with T=min(T_a,T_b) for a<0<b and let a -> -Inf.

Apparently there are several ways to prove that.

> If I remember right, this suggests you are trying to calculate a CI
> for a distribution with no finite moments, a difficult task to say the
> least.

I do not know. It scares me. ;-) For the symmetric simple random walk
S_n ≔ ∑_{i=0}^n X_i I want to verify (1).

(1)	n^(-½) ~ p_n ≔ P(max_{1 ≤ k ≤ n} S_n < 0)

a(x) ~ b(x) means that the quotient converges to 1 for x → ∞.

[…]

> PS - what's an iterated RW? This is all outside my field (hence my
> spitball on #2 above)

I am sorry, I meant *integrated* random walk [3][4]. Basically that is
the integral (“area” – can be negative).

	A_n ≔ ∑_{i=0}^n S_i = ∑_{i=0}^n (n - i + 1) X_i

Being 0, S₀ and X₀ can always be omitted. So I basically just need to do
one `cumsum()` more over the walks.

> PS2 - sorry about the row/column mix-up: I usually think of sample
> paths as rows...

No problem at all. I already was confused that it looked differently
(transposed) after the first `apply()`. But it made sense.

Thanks,

Paul

[1] http://www.bioconductor.org/help/course-materials/2010/BioC2010/EfficientRProgramming.pdf
[2] http://www.steinsaltz.me.uk/probA/ProbALN13.pdf
[3] http://www-stat.stanford.edu/~amir/preprints/irw.ps
[4] http://arxiv.org/abs/0911.5456
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 198 bytes
Desc: This is a digitally signed message part
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20110802/9ef937bf/attachment.bin>