[R] Parallel R job on Sun Grid Engine

mark garey garey at biostat.ucsf.edu
Thu Mar 24 01:27:34 CET 2005


Greetings all,
This may be the wrong forum for my problem; if so, please advise.
I am writing to this list because of an error I am getting from the snow
package or Rmpi (I think) after LAM has booted the MPI nodes.

I have a script (provided by a faculty member; I am not an R user, but I
have the task of making it run successfully as a batch job on the Grid
Engine). It runs successfully as an interactive shell script and can be
run interactively using qrsh on a Sun Grid Engine, but it fails when
submitted to the Grid Engine as a batch job. The LAM/MPI nodes boot and
shut down properly via a parallel environment defined in the Grid Engine.
The job appears to fail when the snow RMPInode.sh script is called.
The error generated is:
___
/usr/local/lib/R.framework/Versions/2.0.0/Resources/library/snow/RMPInode.sh: line 9: 13465 Trace/BPT trap          (core dumped) ${RPROG:-R} --vanilla >${OUT:-/dev/null} 2>&1 <<EOF

library(Rmpi)
library(snow)

runMPIslave()
EOF
___

The environment is Darwin (Mac OS X 10.3.8, Panther), the R version is
2.0.0, and the Grid Engine version is 5.3.

I get the feeling this is not an R problem, but if you have used R in
batch mode in a parallel environment, maybe you could point me in the
right direction. I also realize that many factors could contribute to
this error, but being able to rule out R (or the snow package) would be
helpful.

Thanks in advance,

mark+ \ ucsf biostat

--
mark garey
ucsf department of epidemiology and biostatistics
500 parnassus ave, mu420w
san francisco, ca. 94143
415-502-8870