question about Rpvm, SNOW, etc.

Michael Na Li lina@u.washington.edu
Tue, 20 Aug 2002 12:30:22 -0700


>>>>> On Mon, 19 Aug 2002 15:02:22 -0400, "Liaw, Andy" <andy_liaw@merck.com> said:


Andy> 1.  Since each of these boxes has two CPUs, how do I spawn more than one
Andy> slave process on them?

Note that, unlike MPI, PVM has no access to hardware information such as the
number of CPUs on each node, so there is no restriction on how many tasks one
can spawn on the cluster.  Spawning more tasks than CPUs may even be desirable
when some of them are not CPU intensive: for instance, tasks that monitor
network activity, report host or task failures, spawn new tasks, or add new
hosts.

Andy> 2.  I was hoping I can see similar gain with randomForest, but that
Andy> doesn't seem to be the case:

>> system.time(iris.rf <- randomForest(iris[,1:4], iris[,5], ntree=10000))
Andy> [1] 8.52 1.00 9.61 0.00 0.00
>> system.time(cl.iris.rf <- clusterCall(cl, randomForest, iris[,1:4],
Andy> +                                       iris[,5], ntree=5000))
Andy> [1]  1.38  0.14 15.50  0.00  0.00

Andy> What am I missing here?  Is there anything I can do to see similar gain
Andy> as the boot() example?

I tried this example and found that most of the extra time is overhead from
packing and unpacking the messages.  When saved to a file, the object iris.rf
is over 12 MB.  So it may be better to process the result on each slave first
and return only the information needed.
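
A minimal sketch of that idea, under some assumptions: it uses the parallel
package that ships with current R (its clusterCall() is derived from snow's)
with two local worker processes rather than a PVM cluster, and the helper name
fit_summary is hypothetical.  Each worker fits its forest and sends back only
the confusion matrix and the variable importance, not the whole fitted object.

```r
library(parallel)  # makeCluster(), clusterCall(), stopCluster()

# Hypothetical helper: fit on the worker, return only small summaries.
fit_summary <- function(x, y, ntree) {
  rf <- randomForest::randomForest(x, y, ntree = ntree)
  list(confusion = rf$confusion, importance = rf$importance)
}

cl <- makeCluster(2)  # assumption: two local workers stand in for the cluster
res <- clusterCall(cl, fit_summary, iris[, 1:4], iris[, 5], ntree = 5000)
stopCluster(cl)

# res holds one small list per worker; check how little is transferred.
print(object.size(res), units = "Kb")
```

Whether the summaries are enough depends on what is needed afterwards; if the
trees themselves are required for prediction, the full object still has to be
moved or kept on the workers.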

I got similar timings on our cluster.  Saving the object to a file and loading
it back take about 1.5 seconds each, which I assume is the cost of the
serialization (plus file reading and writing).  So it seems that packing the
object (as bytes), transferring it, and unpacking it take 7-8 seconds??

I wonder how much the serialization itself hurts the performance.  Would
sending raw numbers with PVM routines improve the performance?
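
One way to look at the serialization cost in isolation, sketched with base R's
serialize() and unserialize() as they exist in current R.  The 12 MB numeric
matrix is only a stand-in for the fitted object (an assumption), so the timings
are indicative at best; a list-heavy object like a forest may behave
differently.

```r
# Assumption: a numeric matrix of about 12 MB stands in for the fitted object.
x <- matrix(rnorm(1.5e6), ncol = 100)

# Serialize to a raw vector in memory, so no file I/O is involved.
t_ser <- system.time(bytes <- serialize(x, connection = NULL))
t_unser <- system.time(y <- unserialize(bytes))

length(bytes)    # size of the serialized form in bytes
print(t_ser)     # time to serialize
print(t_unser)   # time to unserialize
identical(x, y)  # the round trip should reproduce the object
```

Comparing these timings with the save/load timings would separate the
serialization cost from the file reading and writing.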

Michael

(BTW, is there a convenient function in R to examine the size of an object?)

-- 
---------------------------------------------------
Michael Na Li
Email: lina@u.washington.edu
Department of Biostatistics, Box 357232
University of Washington, Seattle, WA 98195  
---------------------------------------------------