[R] Parallel computing with the snow package: external file I/O possible?

Martin Morgan mtmorgan at fhcrc.org
Tue Mar 14 01:20:57 CET 2006


Hi Scott --

It took me a bit to figure it out, but the help page for system says
that, unless it is invoked with intern = TRUE, system returns the exit
status of the command rather than its output. So why does
system("hostname") print the host name at all? Because that is a side
effect: it is the stdout of the system command, not the result of a
function evaluation in R. Compare

> res <- system("hostname")
> res
[1] 0

The host name still appears on the screen as a side effect, but the
value returned to R (invisibly) is the exit status, 0.
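A minimal sketch of the difference, assuming a Unix-like machine where the hostname command is on the PATH (the variable names status and output are only for illustration):

```r
# Without intern = TRUE, the child process prints the host name itself,
# and the value returned (invisibly) to R is the command's exit status.
status <- system("hostname")

# With intern = TRUE, the command's stdout is captured and returned to R
# as a character vector, one element per line of output.
output <- system("hostname", intern = TRUE)

print(status)  # 0 when the command succeeds
print(output)  # the host name as a character string
```

Only the second form gives a value that can be pasted into a file name or sent back to the master.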

snow collects the value each node returns, not its side effects, so
the solution is to use either

system("hostname", intern = TRUE)

or

Sys.info()[["nodename"]]
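As a sketch, here is Scott's my.test() with the Sys.info() fix applied. Returning a named vector with the host and the full file path, instead of file.create()'s bare TRUE, is an illustrative choice of mine, not something from the thread; the local check uses tempdir() so it runs without a cluster.

```r
# my.test() with the host name taken from Sys.info(), which returns a
# character string rather than an exit status.
my.test <- function(base.dir) {
  this.host <- Sys.info()[["nodename"]]  # name of the node running this call
  this.rnd <- sample(1:1e6, 1)           # random suffix so file names differ
  test.file <- paste(sep = "", base.dir, "_", this.host, "_", this.rnd)
  ok <- file.create(test.file)
  # Illustrative return value: report where the file went, not just TRUE.
  c(host = this.host,
    file = normalizePath(test.file),
    created = as.character(ok))
}

# Local check without a cluster; with snow one would instead run
#   g <- clusterCall(cl, my.test, base.dir)
res <- my.test(file.path(tempdir(), "test"))
print(res)
```

With a relative base.dir such as "./test", each node resolves the path against its own working directory, so the file made by the process on escalante most likely lands on escalante's disk rather than next to the master's file; returning the full path makes that visible in g.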

Hope that helps!

Martin

"Waichler, Scott R" <Scott.Waichler at pnl.gov> writes:

>  
> Hello,
>
> I am trying to do model autocalibration using the snow and rgenoud
> packages.  The function I want to run in task-parallel fashion across
> multiple machines is one that pre- and post-processes data and runs an
> external model code.  My problem is that external file I/O is happening
> only in the master node and not in the slaves.  I have followed Jasjeet
> Sekhon's suggestion to test the cluster setup, and that is fine:
>
>> library(snow)
>> 
>> #pick two machines
>> cl <- makeCluster(c("moab","escalante"))
>> 
>> clusterCall(cl, sin, 2)
>  
>> The output should be:
>> > clusterCall(cl, sin, 2)
>> [[1]]
>> [1] 0.9092974
>> 
>> [[2]]
>> [1] 0.9092974
>> 
>
> I do indeed get the above result, so I presume the network setup is ok.
> Next I tested a function that creates a file.  Here is the code that I
> sourced from the master ("moab"):
>
> # begin script
> library(snow)
>
> setDefaultClusterOptions(outfile="/tmp/cluster1")
> setDefaultClusterOptions(master="moab")
> cl <- makeCluster(c("moab", "escalante"), type="SOCK")
>
> # Define base pathname for output from my.test() 
> base.dir <- "./test"
>
> # Define a function that includes some file I/O 
> my.test <- function(base.dir) {
>   this.host <- as.character(system("hostname")) # to tag the node that makes the file
>   this.rnd <- sample(1:1e6, 1)  # to be 'sure' the files have different names
>   test.file <- paste(sep="", base.dir, "_", this.host, "_", this.rnd)
>   file.create(test.file)
> }  # end my.test()
>
> g <- clusterCall(cl, my.test, base.dir)
> print(g)
> stopCluster(cl)
> #  end script  
>
>
> The output (g) was as follows:
>  
> [[1]]
> [1] TRUE
>
> [[2]]
> [1] TRUE
>
> But only one file was created, which I suspect was made by the master
> node; the process on the slave did not create a second file.  Also,
> system("hostname") returns the number 0 for moab instead of the name.
> Any ideas as to what might be wrong?
>
> Thanks,
> Scott Waichler
> scott.waichler _at_ pnl.gov
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html



