[Rd] Rmpi_0.5-4 and OpenMPI questions
edd at debian.org
Thu Oct 4 13:49:47 CEST 2007
On 4 October 2007 at 06:37, Luke Tierney wrote:
| > Yes, my bad. But it also hangs with argument count=3 (which I had tried, but
| > my mail was wrong).
| Any chance the snow workers are picking up another version of Rmpi, e.g.
| a LAM one? That might happen if you have R_SNOW_LIB set and an Rmpi
| installed there. Otherwise, starting with outfile=something may help.
| Let me know what you find out -- I'd like to make the snow
| configuration process more bullet-proof.
I generally don't set any environment variables, so I'm not sure. I'll try to
see what I can find.
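One way to chase where the workers get stuck is to capture their output. This is a sketch only, assuming snow's outfile option is honored by makeMPIcluster here; the log path is arbitrary:

```r
library(Rmpi)
library(snow)
# Redirect worker output to a file so a hang can be diagnosed;
# the path is illustrative.
cl <- makeMPIcluster(3, outfile = "/tmp/snow-workers.log")
parApply(cl, matrix(1:4, 2), 1, sum)
stopCluster(cl)
```

If the workers never reach the cluster setup, the log stays empty, which at least narrows down where things hang.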
| > | count=mpi.comm.size(0)-1 is used. If you start R alone, this will return
| > | count=0 since there is only one member (master). I do not know why snow
| > | did not use count=mpi.universe.size()-1 to find total nodes available.
| > How would it know the total number of nodes? See below re the hostfile.
| > | Anyway after using
| > | cl=makeMPIcluster(count=3),
| > | I was able to run parApply function.
| > |
| > | I tried
| > | R -> library(Rmpi) -> library(snow) -> c1=makeMPIcluster(3)
| > |
| > | Also
| > | mpirun -host hostfile -np 1 R --no-save
| > | library(Rmpi) -> library(snow) -> c1=makeMPIcluster(3)
| > |
| > | Hao
| > |
| > | PS: hostfile contains all nodes info so in R mpi.universe.size() returns
| > | right number and will spawn to remote nodes.
| > So we depend on a correct hostfile? As I understand it, Open MPI has
| > deprecated this:
| > # This is the default hostfile for Open MPI. Notice that it does not
| > # contain any hosts (not even localhost). This file should only
| > # contain hosts if a system administrator wants users to always have
| > # the same set of default hosts, and is not using a batch scheduler
| > # (such as SLURM, PBS, etc.).
| > I am _very_ interested in running Open MPI and Rmpi under slurm (which we
| > added to Debian as source package slurm-llnl) so it would be nice if this
| > could be rewritten to not require a hostfile, as this seems to be the
| > direction upstream is going.
| To work better with batch scheduling environments where spawning might
| be technically or politically problematic, I have been trying to improve
| the RMPISNOW script that can be used with LAM as
| mpirun -np 3 RMPISNOW
| and then either
| cl <- makeCluster() # no argument
| cl <- makeCluster(2) # mpi rank - 1 (or less I believe)
| (the default type for makeCluster becomes MPI in this case). This
| seems to work reasonably well in LAM and I think I can get it to work
| similarly in OpenMPI -- will try in the next day or so. Both LAM and
| OpenMPI provide environment variables so shell scripts can determine
| the mpirank, which is useful for getting --slave and output redirect
| to the workers. I haven't figured out anything analogous for
| MPICH/MPICH2 yet.
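Luke's point about rank-detecting environment variables can be sketched as a small wrapper fragment; the variable names below (OMPI_COMM_WORLD_RANK for Open MPI, LAMRANK for LAM) are assumptions and have varied across MPI versions, so a real RMPISNOW script would need to probe what its MPI actually exports:

```shell
#!/bin/sh
# Sketch: pick up the MPI rank from whichever variable is set.
# OMPI_COMM_WORLD_RANK (Open MPI) and LAMRANK (LAM) are assumed names;
# fall back to 0 when neither is present.
RANK="${OMPI_COMM_WORLD_RANK:-${LAMRANK:-0}}"
if [ "$RANK" -eq 0 ]; then
  echo "rank 0: start the master R session"
else
  echo "rank $RANK: start a worker with --slave and redirected output"
fi
```

The point is that the branch lets one mpirun invocation start the interactive master and the silenced workers from the same script.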
Yes, after a test run I also realized that I can't just ask Rmpi to work
without a hostfile -- the information has to come from somewhere.
That said, it still fails with a minimal slurm example using srun, i.e.
edd at ron:~> cat /tmp/rmpi.r
cl <- makeMPIcluster(count=1)
does not make it through makeMPIcluster either and just hangs if I do:
edd at ron:~> srun -N 1 /tmp/rmpi.r
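Two things may be conflating here. First, srun executes its argument directly, so a plain .r file without an interpreter line will not run as a program at all; second, even with the script started, the cluster setup can still hang for the hostfile reasons above. A fuller version of the script might look like this sketch (counts and paths are illustrative, and whether plain srun provides enough MPI context for Rmpi to spawn is exactly the open question):

```r
# /tmp/rmpi.r -- illustrative sketch; assumes Rmpi and snow are installed.
library(Rmpi)
library(snow)
cl <- makeMPIcluster(count = 1)   # this is the call that hangs under srun
print(clusterCall(cl, function() Sys.info()[["nodename"]]))
stopCluster(cl)
mpi.quit()
```

One would then run it through R itself, e.g. with R --no-save < /tmp/rmpi.r as the srun command, rather than handing srun the script directly.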
Three out of two people have difficulties with fractions.