[Rd] Rmpi_0.5-4 and OpenMPI questions
edd at debian.org
Thu Oct 4 13:49:47 CEST 2007
On 4 October 2007 at 06:37, Luke Tierney wrote:
| > Yes, my bad. But it also hangs with argument count=3 (which I had tried, but
| > my mail was wrong).
| Any chance the snow workers are picking up another version of Rmpi, e.g.
| a LAM one? That might happen if you have R_SNOW_LIB set and an Rmpi
| installed there. Otherwise, starting with outfile=something may help.
| Let me know what you find out -- I'd like to make the snow
| configuration process more bullet-proof.
I generally don't set any environment variables, so I'm not sure. I'll try to
see what I can find.
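One way to chase where the workers get stuck is to capture their output. This is a sketch only, assuming snow's outfile option is honored by makeMPIcluster here; the log path is arbitrary:

```r
library(Rmpi)
library(snow)
# Redirect worker output to a file so a hang can be diagnosed;
# the path is illustrative.
cl <- makeMPIcluster(3, outfile = "/tmp/snow-workers.log")
parApply(cl, matrix(1:4, 2), 1, sum)
stopCluster(cl)
```

If the workers never reach the cluster setup, the log stays empty, which at least narrows down where things hang.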
| > | count=mpi.comm.size(0)-1 is used. If you start R alone, this will return
| > | count=0 since there is only one member (master). I do not know why snow
| > | did not use count=mpi.universe.size()-1 to find total nodes available.
| > How would it know the total number of nodes? See below re the hostfile.
| > | Anyway after using
| > | cl=makeMPIcluster(count=3),
| > | I was able to run parApply function.
| > |
| > | I tried
| > | R -> library(Rmpi) -> library(snow) -> c1=makeMPIcluster(3)
| > |
| > | Also
| > | mpirun -host hostfile -np 1 R --no-save
| > | library(Rmpi) -> library(snow) -> c1=makeMPIcluster(3)
| > |
| > | Hao
| > |
| > | PS: hostfile contains all nodes info so in R mpi.universe.size() returns
| > | right number and will spawn to remote nodes.
| > So we depend on a correct hostfile? As I understand it, Open MPI has
| > deprecated this:
| > # This is the default hostfile for Open MPI. Notice that it does not
| > # contain any hosts (not even localhost). This file should only
| > # contain hosts if a system administrator wants users to always have
| > # the same set of default hosts, and is not using a batch scheduler
| > # (such as SLURM, PBS, etc.).
| > I am _very_ interested in running Open MPI and Rmpi under slurm (which we
| > added to Debian as source package slurm-llnl) so it would be nice if this
| > could be rewritten to not require a hostfile, as this seems to be the
| > direction upstream is going.
| To work better with batch scheduling environments where spawning might
| be technically or politically problematic, I have been trying to improve
| the RMPISNOW script that can be used with LAM as
| mpirun -np 3 RMPISNOW
| and then either
| cl <- makeCluster() # no argument
| cl <- makeCluster(2) # mpi rank - 1 (or less I believe)
| (the default type for makeCluster becomes MPI in this case). This
| seems to work reasonably well in LAM and I think I can get it to work
| similarly in OpenMPI -- will try in the next day or so. Both LAM and
| OpenMPI provide environment variables so shell scripts can determine
| the mpirank, which is useful for getting --slave and output redirect
| to the workers. I haven't figured out anything analogous for
| MPICH/MPICH2 yet.
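Luke's point about rank-detecting environment variables can be sketched as a small wrapper fragment; the variable names below (OMPI_COMM_WORLD_RANK for Open MPI, LAMRANK for LAM) are assumptions and have varied across MPI versions, so a real RMPISNOW script would need to probe what its MPI actually exports:

```shell
#!/bin/sh
# Sketch: pick up the MPI rank from whichever variable is set.
# OMPI_COMM_WORLD_RANK (Open MPI) and LAMRANK (LAM) are assumed names;
# fall back to 0 when neither is present.
RANK="${OMPI_COMM_WORLD_RANK:-${LAMRANK:-0}}"
if [ "$RANK" -eq 0 ]; then
  echo "rank 0: start the master R session"
else
  echo "rank $RANK: start a worker with --slave and redirected output"
fi
```

The point is that the branch lets one mpirun invocation start the interactive master and the silenced workers from the same script.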
Yes, after a test run I also realized that I can't just ask Rmpi to work
without a hostfile -- the information has to come from somewhere.
That said, it still fails with a minimal slurm example using srun, i.e.
edd at ron:~> cat /tmp/rmpi.r
cl <- makeMPIcluster(count=1)
does not make it through makeMPIcluster either and just hangs if I do:
edd at ron:~> srun -N 1 /tmp/rmpi.r
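Two things may be conflating here. First, srun executes its argument directly, so a plain .r file without an interpreter line will not run as a program at all; second, even with the script started, the cluster setup can still hang for the hostfile reasons above. A fuller version of the script might look like this sketch (counts and paths are illustrative, and whether plain srun provides enough MPI context for Rmpi to spawn is exactly the open question):

```r
# /tmp/rmpi.r -- illustrative sketch; assumes Rmpi and snow are installed.
library(Rmpi)
library(snow)
cl <- makeMPIcluster(count = 1)   # this is the call that hangs under srun
print(clusterCall(cl, function() Sys.info()[["nodename"]]))
stopCluster(cl)
mpi.quit()
```

One would then run it through R itself, e.g. with R --no-save < /tmp/rmpi.r as the srun command, rather than handing srun the script directly.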
Three out of two people have difficulties with fractions.