[R] Alternate to for-loop

Wacek Kusnierczyk Waclaw.Marcin.Kusnierczyk at idi.ntnu.no
Tue Feb 17 09:44:30 CET 2009


Stefan Evert wrote:
> A couple of remarks on vQ's naive benchmark:
>
>> f.rep = function(n, m) replicate(n, rnorm(m))
>
> I suppose you meant
>
>     f.rep = function(n, m) replicate(n, mean(rnorm(m)))
>
> which doesn't make a substantial speed difference, though.

indeed, thanks;  i've already posted a correction, and as you say, it
doesn't make much difference for these particular benchmark values.
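
for the record, a minimal sketch of what the correction changes (f.rep.old is a name made up here for the version as first posted): without mean(), replicate() returns the raw draws as a matrix with one column per replicate, rather than n means.

```r
f.rep.old = function(n, m) replicate(n, rnorm(m))        # as first posted
f.rep     = function(n, m) replicate(n, mean(rnorm(m)))  # corrected

x = f.rep.old(5, 3)   # raw draws, one column per replicate
y = f.rep(5, 3)       # one mean per replicate

dim(x)      # m rows, n columns
length(y)   # n
```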


>
>
>> f.pat = function(n, m) colMeans(array(rnorm(n*m), c(n, m)))
>>
>> system.time(f.pat(1000, 1000))
>> system.time(f.rep(1000, 1000))
>>
>> makes me believe that there is no significant difference in efficiency
>> between the 'professionally-looking' replicate-based solution and the
>> 'as fast as possible' pat's solution.
>>
>
> True, I get the same timing results on my machine.  But then you
> should also point out that the original for-loop:
>
>     f.for = function(n, m) {
>         res <- numeric(n)
>         for (i in 1:n) res[i] <- mean(rnorm(m))
>         res
>     }
>
> is exactly as fast as replicate().  So apart from "looking more
> professional", there isn't any difference between an explicit loop and
> replicate().

indeed, and pat seems correct in blaming for-loops for the inefficiency
of replicate when n exceeds m by more than two orders of magnitude, i.e.,
when log10(n/m) > 2.

>
> Perhaps loops in R aren't always as slow (compared to matrix
> operations) as one seemed to think.

depends on how and where you use them.  in the problem discussed here,
they slow the code down for one class of inputs (n much larger than m)
and do not speed it up for the others, compared to pat's array version.
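
a side note on comparability, as a sketch: with the dimensions as quoted, c(n, m), colMeans gives m means of n draws each, while the loop gives n means of m draws each.  that is invisible at n = m = 1000 but matters once n and m differ.  f.pat2 below is a hypothetical variant (not from the thread) with the dimensions swapped, so that both compute the same quantity; the total number of draws, n*m, is the same either way.

```r
f.for  = function(n, m) { res = numeric(n); for (i in seq_len(n)) res[i] = mean(rnorm(m)); res }
f.pat  = function(n, m) colMeans(array(rnorm(n*m), c(n, m)))   # as quoted
f.pat2 = function(n, m) colMeans(array(rnorm(n*m), c(m, n)))   # hypothetical variant, dims swapped

length(f.for(20, 5))    # n values
length(f.pat(20, 5))    # m values
length(f.pat2(20, 5))   # n values

# compare loop and swapped variant under a common seed (default RNG settings assumed)
set.seed(1); a = f.for(20, 5)
set.seed(1); b = f.pat2(20, 5)
all.equal(a, b)
```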

>   I ran into a similar issue with a simple benchmark the other day,
> where a plain loop in Lua was faster than vectorised code in R ...

hmm, would you be saying that r's vectorised performance is overhyped? 
or is it just that non-vectorised code in r is slow?


>
>
> I have to say, though, that like Patrick I assumed the goal was to
> obtain a large number of replicates for relatively small sets of
> random numbers, in which case the matrix solution is indeed faster
> (though not as much as I would have thought):
>
> > system.time(f.for(100000, 100))
>    user  system elapsed
>   4.212   0.025   4.273
> > system.time(f.rep(100000, 100))
>    user  system elapsed
>   4.109   0.028   4.172
> > system.time(f.pat(100000, 100))
>    user  system elapsed
>   1.580   0.134   1.739
>
>

sure;  the gap is even more pronounced when n = 10^6 and m = 10.
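
to check this on your own machine, something like the following should do.  bench is a helper made up here; the three functions are the ones from the thread (with the loop written over seq_len(n)).  no timings are claimed, since they depend on the machine and the r version.

```r
f.for = function(n, m) { res = numeric(n); for (i in seq_len(n)) res[i] = mean(rnorm(m)); res }
f.rep = function(n, m) replicate(n, mean(rnorm(m)))
f.pat = function(n, m) colMeans(array(rnorm(n*m), c(n, m)))

# hypothetical helper: elapsed seconds of each function for one (n, m) shape
bench = function(n, m, funs = list(f.for = f.for, f.rep = f.rep, f.pat = f.pat))
    sapply(funs, function(f) system.time(f(n, m))[["elapsed"]])

bench(1000, 1000)     # log10(n/m) = 0
bench(100000, 100)    # log10(n/m) = 3
# bench(10^6, 10)     # log10(n/m) = 5; uncomment if you can wait
```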

vQ



