[R] More than doubling performance with snow

Stefan Evert stefan.evert at uos.de
Mon Nov 24 16:12:45 CET 2008

> I'm sorry but I don't quite understand what "not running solve() in
> this process" means. I updated the code and it does show that the
> results from clusterApply() are identical to the results from
> lapply(). Could you please explain this in more detail?

The point is that a parallel processing framework such as snow with
PVM does not execute the operation in your (interactive) R session.
Instead, it starts separate worker processes that carry out the actual
calculation, while your R session just waits for the results to
become available. These worker processes can run either on different
computers in a network or on your local machine (in order to make use
of multiple CPU cores). That is what "not running solve() in this
process" means: solve() is called inside the worker processes, and
only its results are sent back to your session.
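
A minimal sketch of what this means in practice. It assumes the
parallel package that ships with R, which took over snow's
makeCluster()/clusterCall()/clusterApply() interface; with snow itself,
library(snow) and makeCluster(2, type = "SOCK") should do the same.
Each worker reports its own process ID, which differs from the ID of
the interactive session:

```r
library(parallel)

cl <- makeCluster(2)   # start two separate R worker processes on this machine

master_pid  <- Sys.getpid()                         # PID of this interactive session
worker_pids <- unlist(clusterCall(cl, Sys.getpid))  # each worker reports its own PID

# The function passed to clusterApply() is evaluated in the workers,
# so the PIDs it returns belong to the workers, never to this session.
where <- unlist(clusterApply(cl, 1:4, function(i) Sys.getpid()))

stopCluster(cl)        # shut the workers down again

print(master_pid)
print(worker_pids)
print(where)
```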

Parallel run (clusterApply()):
>>> user  system elapsed
>>> 0.584   0.144   4.355

Serial run (lapply()):
>>> user  system elapsed
>>> 4.777   0.100   4.901

If you take a close look at your timing results, you can see that the
total processing time ("elapsed") is only slightly shorter with
parallelisation (4.355 s) than without (4.901 s). You've probably been
looking at "user" time, i.e. the amount of CPU time your interactive R
session consumed. Since with parallel processing the R session itself
doesn't perform the actual calculation (as explained above), it is
mostly waiting for results to become available, and its "user" time is
therefore reduced drastically (0.584 s instead of 4.777 s). In short,
when measuring performance improvements from parallelisation, always
look at the total "elapsed" (wall-clock) time.
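
The effect can be reproduced with a small sketch. It assumes R's
bundled parallel package (the successor to snow's cluster functions),
and the workload of inverting 40 random 300 x 300 matrices with solve()
is made up purely for illustration; actual times depend on your
machine. The "user" column reported by system.time() does not include
the CPU time used by the worker processes, so it shrinks in the
parallel run even when "elapsed" barely changes:

```r
library(parallel)

set.seed(1)
# Hypothetical workload: 40 random 300 x 300 matrices to invert.
mats <- replicate(40, matrix(rnorm(300 * 300), nrow = 300), simplify = FALSE)

cl <- makeCluster(2)   # started outside the timing, so start-up cost is not measured

t_serial   <- system.time(res_serial   <- lapply(mats, solve))            # runs in this session
t_parallel <- system.time(res_parallel <- clusterApply(cl, mats, solve))  # runs in the workers

stopCluster(cl)

# Same results either way; only the process in which solve() ran differs.
stopifnot(isTRUE(all.equal(res_serial, res_parallel)))

# "user.self" covers this session only; judge any speed-up by "elapsed".
timings <- rbind(serial = t_serial[1:3], parallel = t_parallel[1:3])
print(timings)
```

Because every matrix has to be serialised and shipped to a worker and
every inverse shipped back, a job made of many small tasks like this
one may gain little or even run slower in parallel.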

So why isn't parallel processing twice as fast as performing the
calculation in a single thread? Perhaps the advantage of using both
CPU cores was eaten up by the communication overhead: the input data
have to be sent to the worker processes and the results sent back to
your R session. You should also take into account that a lot of other
processes (terminals, GUI, daemons, etc.) are running on your computer
at the same time, so even with parallel processing you will not have
both cores fully available to R. In my experience, there is little
benefit in parallelisation as long as you only have two CPU cores on
your computer (rather than, say, 8 cores).
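
To put rough numbers on this, here is a worked estimate from the
elapsed times quoted above, with p = 2 workers. The second line rests
on a simplifying assumption of my own: the work splits perfectly
across the workers and everything else is lumped into one fixed
overhead T_oh.

```latex
\begin{aligned}
S &= \frac{T_{\text{serial}}}{T_{\text{parallel}}}
   = \frac{4.901\,\text{s}}{4.355\,\text{s}} \approx 1.13,
\qquad E = \frac{S}{p} \approx 0.56,\\[4pt]
T_{\text{parallel}} &\approx \frac{T_{\text{serial}}}{p} + T_{\text{oh}}
\;\Rightarrow\;
T_{\text{oh}} \approx 4.355\,\text{s} - \frac{4.901\,\text{s}}{2}
             \approx 4.355\,\text{s} - 2.45\,\text{s} \approx 1.90\,\text{s}.
\end{aligned}
```

Under this crude model almost 2 s of the 4.4 s elapsed time would be
overhead rather than useful computation; the model cannot tell
communication costs apart from competition with other processes.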

Hope this clarifies things a bit (and is reasonably accurate; I don't
have much experience with parallelisation),

[ stefan.evert at uos.de | http://purl.org/stefan.evert ]
