[R] snow's makeCluster hanging (using Rmpi)

Randall C Johnson [Contr.] rjohnson at ncifcrf.gov
Tue Nov 7 19:28:07 CET 2006


On 11/7/06 11:28 AM, "Ramon Diaz-Uriarte" <rdiaz at cnio.es> wrote:

> On Tuesday 07 November 2006 15:56, Randall C Johnson [Contr.] wrote:
>> Hello everyone,
>> I've been fiddling around with the snow and Rmpi packages on my new Intel
>> Mac, and have run into a few problems. When I make a cluster on my machine,
>> both slaves start up just fine, and everything works as expected. When I
>> try to make a cluster including another networked machine it hangs. I've
>> followed the suggestions at
>> http://finzi.psych.upenn.edu/R/Rhelp02a/archive/83086.html and
>> http://www.stat.uiowa.edu/~luke/R/cluster/cluster.html but to no avail.
>> Everything seems to start up fine using lamboot, but then hangs when making
>> the cluster in R. Making a cluster with 2 slaves seems to work fine, but if
>> I increase the number (to use the networked machines) it hangs again.
>> 
>> I've tried networking to another Mac, and also to a machine running Red Hat
>> Linux. Both machines can set up their own local clusters. Does anyone have
>> any ideas?
> 
> Dear Randy,
> 
> A few suggestions:
> 
> a) make sure there are no firewalls; I assume this is already the case, but
> check anyway;

I don't think I have any firewalls running. I checked and they all seem to
be disabled...
 
> b) what happens if you lamboot outside R (and create a universe with a local
> and a networked machine) and then you do: "lamexec -np 6 hostname"?

This prints out the host names of each machine as expected.
 
> c) are Rmpi and snow installed in the same directories on the different
> machines? are there version differences in Rmpi (or snow) between machines?

I've installed the same versions, but they are in different directories...
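Because the packages live in different directories on each machine, it helps to run the same few lines in an R session on every host and compare the output side by side. The sketch below uses only base R and utils functions (`system.file()`, `packageDescription()`, `Sys.info()`). It reports the host name and each package's installed version and location, showing NA and an empty path for a package that is not installed. It only inspects the local installation; it does not test MPI itself.

```r
# Run this in an R session on each machine and compare the printed tables.
pkgs <- c("Rmpi", "snow")

# system.file() returns "" when a package is not installed.
paths <- sapply(pkgs, function(p) system.file(package = p))

# Only read the DESCRIPTION file of packages that are actually installed.
versions <- sapply(pkgs, function(p) {
  if (paths[[p]] != "") packageDescription(p)$Version else as.character(NA)
})

install_report <- data.frame(
  host    = Sys.info()[["nodename"]],
  package = pkgs,
  version = unname(versions),
  path    = unname(paths),
  stringsAsFactors = FALSE
)
print(install_report)
```

Identical versions with different paths are not necessarily a problem in themselves. The table simply makes any differences between nodes easy to spot.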

I also tried an example using only Rmpi, per Luke Tierney's suggestion, and
got the following error when trying to spawn the Rslaves after starting LAM
with lamboot (outside of R). I tried to use laminfo, but I'm not sure what
I'm looking for or how to use the information it gives...

> library(Rmpi)
> mpi.spawn.Rslaves()
----------------------------------------------------------------------------

It seems that [at least] one of the child processes that was started
by MPI_Comm_spawn* chose a different RPI than the parent MPI
application.  For example, one (of the) child process(es) that
differed from the parent is shown below:

    Parent application: MPI_Comm_spawn
    Child MPI_COMM_WORLD rank usysv (v7.1.0): 0

All MPI processes must choose the same RPI module and version when
they start.  Check your SSI settings and/or the local environment
variables on each node.
----------------------------------------------------------------------------
R(26444) malloc: ***  Deallocation of a pointer not malloced: 0x16379a0;
This could be a double free(), or free() called with the middle of an
allocated block; Try setting environment variable MallocHelp to see tools to
help debug
Error in mpi.comm.spawn(slave = system.file("Rslaves.sh", package = "Rmpi"),  :
    MPI_Error_string: unclassified
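The LAM message above asks you to check the SSI settings and the local environment variables on each node. As a hedged starting point, the sketch below lists any environment variables whose names begin with `LAM_MPI_SSI_` in the current R session. It assumes the LAM/MPI 7 convention of setting SSI parameters through variables with that prefix (for example `LAM_MPI_SSI_rpi` for the RPI module). Check the documentation for the LAM version installed on each machine. Run it on every node and compare the results. An RPI that is set on one machine but not on another would be one candidate explanation for the mismatch, not a confirmed diagnosis. Also note that the environment seen by an interactive R session may differ from the one seen by processes LAM starts remotely.

```r
# List LAM SSI-related environment variables visible to this R session.
# Assumption: LAM/MPI 7 reads SSI parameters from variables named LAM_MPI_SSI_<param>.
env <- Sys.getenv()
ssi_vars <- env[grep("^LAM_MPI_SSI_", names(env))]

cat("Host:", Sys.info()[["nodename"]], "\n")
if (length(ssi_vars) == 0) {
  cat("No LAM_MPI_SSI_* variables are set in this session.\n")
} else {
  # One NAME=value line per variable (e.g. LAM_MPI_SSI_rpi=... if it is set).
  cat(paste(names(ssi_vars), as.character(ssi_vars), sep = "="), sep = "\n")
}
```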

 
> 
> HTH,
> 
> R.
> 
> 
> 
>> 
>> Thanks,
>> Randy
>> 
>>> sessionInfo()
>> 
>> R version 2.4.0 Patched (2006-10-03 r39576)
>> i386-apple-darwin8.8.2
>> 
>> locale:
>> C
>> 
>> attached base packages:
>> [1] "methods"   "stats"     "graphics"  "grDevices" "utils"     "datasets"
>> [7] "base"
>> 
>> other attached packages:
>>    Rmpi    snow
>> "0.5-3" "0.2-2"
>> 
>>


