[R] Alternate to for-loop

Mon Feb 16 21:47:09 CET 2009

Wacek Kusnierczyk wrote:
> Patrick Burns wrote:
>   
>> If the goal is to "look" professional, then
>> 'replicate' probably suits.  If the goal is to
>> compute as fast as possible, then that isn't
>> the case because 'replicate' is really a 'for'
>> loop in disguise and there are other ways.
>>
>> Here's one other way:
>>
>> function (size, replicates, distfun, ...)
>> {
>>
>>        colMeans(array(distfun(size * replicates, ...), c(size,
>> replicates)))
>> }
>>     
>
> a naive benchmark:
>
> f.rep = function(n, m) replicate(n, rnorm(m))
> f.pat = function(n, m) colMeans(array(rnorm(n*m), c(n, m)))
>
> system.time(f.pat(1000, 1000))
> system.time(f.rep(1000, 1000))
>
> makes me believe that there is no significant difference in efficiency
> between the 'professionally-looking' replicate-based solution and the
> 'as fast as possible' pat's solution.
>   

I think Wacek is largely correct.  First off, a correction:
the dimensions of the array in 'f.pat' should be c(m, n)
rather than c(n, m).
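
A minimal sketch of the comparison with that correction applied.
The names 'f.rep2' and 'f.pat2' are illustrative and not from the
thread.  I have also assumed that the replicate version should
take the mean of each sample, since 'f.rep' as quoted returns the
raw draws rather than their means:

```r
# f.rep2 and f.pat2 are illustrative names, not from the original thread.
# Both return a vector of n means, each taken over m standard normal draws.
f.rep2 <- function(n, m) replicate(n, mean(rnorm(m)))
f.pat2 <- function(n, m) colMeans(array(rnorm(n * m), c(m, n)))

# With the same seed both versions consume the random numbers in the
# same order (array() fills column by column), so the results should
# agree up to floating-point rounding.
set.seed(1); a <- f.rep2(5, 3)
set.seed(1); b <- f.pat2(5, 3)
all.equal(a, b)
```

With c(m, n), each column holds one sample of size m, so colMeans
gives one mean per replicate.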

What I'm seeing on my machine is that the array trick is
always a bit faster, but only substantially faster if 'm'
(that is, the number of values averaged per replicate) is smallish.

That makes sense: loops are "slow" because of the overhead
of each function call.  When each call takes a lot of time,
the overhead becomes insignificant.
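
One way to see this is to hold the total number of draws fixed
and vary 'm', so that the loop version makes many cheap calls or
a few expensive ones.  This is only a rough sketch: 'f.rep2' and
'f.pat2' are illustrative names, the sizes are arbitrary, and
timings will differ from machine to machine:

```r
# Illustrative timing sketch; function names and sizes are assumptions.
f.rep2 <- function(n, m) replicate(n, mean(rnorm(m)))
f.pat2 <- function(n, m) colMeans(array(rnorm(n * m), c(m, n)))

total <- 1e6                       # total number of random draws, held fixed
for (m in c(10, 1000, 100000)) {
  n <- total / m                   # number of replicates (calls in the loop)
  t.rep <- system.time(f.rep2(n, m))["elapsed"]
  t.pat <- system.time(f.pat2(n, m))["elapsed"]
  cat("m =", m, " n =", n, " replicate:", t.rep, " array:", t.pat, "\n")
}
```

If the reasoning above holds, the gap between the two timings
should be widest for the smallest 'm' and shrink as 'm' grows.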

Patrick Burns
patrick at burns-stat.com
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of "The R Inferno" and "A Guide for the Unwilling S User")
> vQ