[R] manual parallel processing

jgarcia at ija.csic.es jgarcia at ija.csic.es
Thu Nov 22 11:28:58 CET 2007


Hi;
I have an R script that includes a call to genoud(); each genoud() run takes
about 4 seconds, which would be fine if I didn't have to call it about 2000
times. That adds up to roughly 2 hours of processing.
I would also like to use this script operationally, running it twice a day.
As far as I can tell, the parallel processing option built into genoud()
divides the work inside a single call among the computers in the cluster. My
consecutive calls to genoud(), on the other hand, are independent of each
other, although they all depend on objects stored in the R workspace. I
suspect that the communication overhead of parallelizing each 4-second task,
repeated 2000 times, would make that approach slower than dividing the calls
to genoud() themselves among the available computers. So perhaps a viable way
to speed up the process would be something like the following (a rough
snow-based sketch is given after the list):

1) Somehow make a copy of the workspace on the fly (I mean, put some command
before the loop that calls genoud() to export the workspace in its current
state to the other computers);
2) divide the task among the available computers in the network; e.g. if I
have my "localhost" plus 3 more computers:
n.comp  <- 4                                 # number of machines
nsteps  <- 1987                              # total number of genoud() calls
steps.c <- trunc(nsteps/n.comp)              # calls per machine
steps.c <- (1:n.comp)*steps.c                # last index handled by each machine
steps.c <- c(steps.c[1:(n.comp-1)],nsteps)   # last machine also takes the remainder
steps.i <- c(1,steps.c[-n.comp]+1)           # first index handled by each machine
for(ic in 1:n.comp){
  ## somehow start R remotely on computer ic, load the copied workspace,
  ## and there run:
  ##   for(i in steps.i[ic]:steps.c[ic]){ something[i]; genoud(f(i)); something.else[i] }
  ## then somehow get the results back from computer ic
}
3) concatenate the results in my "localhost" workspace.
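
Something along the following lines is roughly what I imagine it could look
like with the snow package; here "node1", "node2", "node3", "mydata",
"make.objective" and "nvars" are just placeholders for my real host names and
workspace objects, and I am assuming R, snow and rgenoud are installed on
every machine:

library(snow)

## the 4 machines: localhost plus 3 remote nodes (placeholder names)
hosts <- c("localhost", "node1", "node2", "node3")
cl    <- makeCluster(hosts, type = "SOCK")

## step 1: push the workspace objects that the calls need to every node
clusterExport(cl, c("mydata", "make.objective", "nvars"))
clusterEvalQ(cl, library(rgenoud))

## step 2: one wrapper per call; parLapply() splits the 1987 indices
## among the nodes, so the manual steps.i/steps.c bookkeeping goes away
run.one <- function(i) {
  obj <- make.objective(i)   # hypothetical: builds the i-th objective from mydata
  genoud(obj, nvars = nvars, print.level = 0)$par
}
res <- parLapply(cl, 1:1987, run.one)

## step 3: collect everything back into the local workspace
res.mat <- do.call(rbind, res)

stopCluster(cl)

If the individual genoud() runs turn out to vary a lot in duration,
clusterApplyLB() could replace parLapply() to balance the load across the
machines.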

You can see I'm rather lost with this. Could you help me out?

Regards,
Javier


