[Rd] after some time R stopped returning from Rmpi calls

Sklyar, Oleg (London) osklyar at maninvestments.com
Thu Jan 29 16:09:07 CET 2009


Hi,

this is not exactly a developer question, but maybe you have noticed
similar behaviour before. For quite some time R and Rmpi were working
perfectly for me until one day they just stopped doing so without any
changes in the configs. R still spawns jobs as requested, and if they
are small they run through and return, but as soon as their duration is
over 5s or so the spawned processes go to sleep and never return to the
head node. Below is the top output from one of the slave nodes with the
spawned jobs; as you can see, their status is sleeping. It looks like a
communication problem between the master and the slave nodes, but this
behaviour *is* user specific: exactly the same script will work for some
users and will just hang for others.

Rmpi is installed with a default R CMD INSTALL without additional
arguments. LD_LIBRARY_PATH is set and the whole setup *was* working with
the same config. 

Has anybody experienced similar problems with Rmpi and LAM before?

Thank you,
Oleg

RHEL 5 x86_64, 16core Opteron

LAM 7.1.4/MPI 2 C++/ROMIO - Indiana University

It is quite a dated version of R that I am running now, but a recent Rmpi.

> sessionInfo()
R version 2.9.0 Under development (unstable) (2008-09-30 r46585) 
x86_64-unknown-linux-gnu 

locale:
C

attached base packages:
[1] stats     graphics  utils     datasets  grDevices methods   base


other attached packages:
[1] Rmpi_0.5-5

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 7699 osklyar   16   0 19128 1448 1000 S    0  0.0   0:00.02 lamd
 7807 osklyar   16   0  8652  992  824 S    0  0.0   0:00.01 Rslaves.sh
 7808 osklyar   16   0  8656  992  824 S    0  0.0   0:00.01 Rslaves.sh
 7809 osklyar   16   0  8652  992  824 S    0  0.0   0:00.00 Rslaves.sh
 7810 osklyar   17   0  8656  992  824 S    0  0.0   0:00.01 Rslaves.sh
 7811 osklyar   18   0  8656  992  824 S    0  0.0   0:00.02 Rslaves.sh
 7812 osklyar   18   0  8656  992  824 S    0  0.0   0:00.02 Rslaves.sh
 7813 osklyar   18   0  8656  992  824 S    0  0.0   0:00.02 Rslaves.sh
 7814 osklyar   18   0  8656  992  824 S    0  0.0   0:00.02 Rslaves.sh
 7815 osklyar   15   0  165m  60m 4568 S    0  0.2   0:03.66 R
 7816 osklyar   16   0  161m  56m 4568 S    0  0.2   0:03.51 R
 7817 osklyar   15   0  161m  56m 4584 S    0  0.2   0:03.82 R
 7818 osklyar   16   0  161m  56m 4568 S    0  0.2   0:03.31 R
 7819 osklyar   16   0  165m  61m 4568 S    0  0.2   0:03.59 R
 7820 osklyar   15   0  162m  58m 4568 S    0  0.2   0:03.43 R
 7821 osklyar   16   0  162m  58m 4568 S    0  0.2   0:03.26 R
 7824 osklyar   16   0  161m  56m 4568 S    0  0.2   0:03.49 R
 7973 osklyar   15   0 87208 1880 1140 S    0  0.0   0:00.00 sshd
 7974 osklyar   15   0 72332 1716 1276 S    0  0.0   0:00.01 bash


Dr Oleg Sklyar
Research Technologist
AHL / Man Investments Ltd
+44 (0)20 7144 3107
osklyar at maninvestments.com



