[R] Alternate to for-loop
pburns at pburns.seanet.com
Mon Feb 16 21:47:09 CET 2009
Wacek Kusnierczyk wrote:
> Patrick Burns wrote:
>> If the goal is to "look" professional, then
>> 'replicate' probably suits. If the goal is to
>> compute as fast as possible, then that isn't
>> the case because 'replicate' is really a 'for'
>> loop in disguise and there are other ways.
>> Here's one other way:
>> function (size, replicates, distfun, ...)
>> colMeans(array(distfun(size * replicates, ...), c(size,
> a naive benchmark:
> f.rep = function(n, m) replicate(n, rnorm(m))
> f.pat = function(n, m) colMeans(array(rnorm(n*m), c(n, m)))
> system.time(f.pat(1000, 1000))
> system.time(f.rep(1000, 1000))
> makes me believe that there is no significant difference in efficiency
> between the 'professionally-looking' replicate-based solution and the
> 'as fast as possible' pat's solution.
I think Wacek is largely correct. First off, a correction:
the dimensions of the array in 'f.pat' should be c(m, n)
rather than c(n, m), so that each of the 'n' columns holds
one sample of size 'm'.
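A minimal sketch of the corrected function, for anyone who wants to check it. It assumes the replicate-based version is meant to take the mean of each sample, which the quoted 'f.rep' does not show. The names 'a' and 'b' are only for illustration.

```r
# Corrected array version: each column is one sample of size m,
# so colMeans returns n means.
f.pat <- function(n, m) colMeans(array(rnorm(n * m), c(m, n)))

# Replicate-based version, assumed here to take the mean of each sample.
f.rep <- function(n, m) replicate(n, mean(rnorm(m)))

# With the same seed both versions consume the random numbers in the
# same order, so the results should agree up to floating-point rounding.
set.seed(1)
a <- f.pat(5, 100)
set.seed(1)
b <- f.rep(5, 100)

length(a)          # one mean per replicate
all.equal(a, b)
```

In the quoted benchmark 'n' and 'm' are both 1000, so the swapped dimensions would not have changed the timings.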
What I'm seeing on my machine is that the array trick seems
always to be a bit faster, but only substantially faster if 'm'
(that is, the number being summed) is smallish.
That makes sense: loops are "slow" because of the overhead
of doing the calling. When each call takes a lot of time,
the overhead becomes insignificant.
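One way to see this for yourself is to hold the total number of random draws fixed and vary how they are split between 'n' and 'm'. This is only a sketch: the sizes are arbitrary, 'f.rep' is assumed to take the mean of each sample, and the timings will differ from machine to machine.

```r
f.rep <- function(n, m) replicate(n, mean(rnorm(m)))
f.pat <- function(n, m) colMeans(array(rnorm(n * m), c(m, n)))

# Both cases draw 1e6 random numbers in total.
# Small m: many cheap calls, so the per-call overhead matters.
t.rep.small <- system.time(f.rep(1e5, 10))
t.pat.small <- system.time(f.pat(1e5, 10))

# Large m: few expensive calls, so the overhead is a small share.
t.rep.large <- system.time(f.rep(10, 1e5))
t.pat.large <- system.time(f.pat(10, 1e5))

# Elapsed seconds for the four runs.
timings <- c(rep.small = t.rep.small[["elapsed"]],
             pat.small = t.pat.small[["elapsed"]],
             rep.large = t.rep.large[["elapsed"]],
             pat.large = t.pat.large[["elapsed"]])
print(timings)
```

A single call to 'system.time' is noisy, so repeat the runs a few times before reading much into small differences.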
patrick at burns-stat.com
+44 (0)20 8525 0696
(home of "The R Inferno" and "A Guide for the Unwilling S User")