[BioC] Bioconductor AMI Amazon EC2 help

Dan Tenenbaum dtenenba at fhcrc.org
Wed Apr 20 07:00:36 CEST 2011


Hi David,


On Tue, Apr 19, 2011 at 9:09 PM, David Gibbs <gibbsd at ohsu.edu> wrote:
> Hi there,
>
> I'm a student working on a project with some Rmpi code that I would love to run on more nodes,
> but I'm having trouble with the Bioconductor image.
>
> I'm following the directions for spinning up a cluster found here: http://www.bioconductor.org/help/bioconductor-cloud-ami/
> Once I have the cluster up with 3 nodes, I run the mpiTest.R script... it gets to the spawning function, then hangs.
> After about half an hour I killed it.  Anyone getting this to work?  Any hints?  See output below.  Thanks!
>
> ...
> ...
> Creating volume...
> I, [2011-04-20T03:31:10.592881 #763]  INFO -- : New RightAws::Ec2 using shared connections mode
> I, [2011-04-20T03:31:10.681741 #763]  INFO -- : Opening new HTTPS connection to ec2.amazonaws.com:443
> warning: peer certificate won't be verified in this SSL session
> Waiting for volume to be available...
> .
> Volume is available.
> Created volume vol-d33fd5b8 in availability zone us-east-1d.
>
> ...
> ...
> # /usr/local/Rmpi/mpiutil -a xxx -s yyy -w 3 -n "my cluster" -t t1.micro -v vol-d33fd5b8
> warning: peer certificate won't be verified in this SSL session
> using device /dev/sdg...
> waiting for volume to be attached....
> .......Volume is attached.
> waiting for workers to start...
> .....................workers are up
> Cluster started.
> ...
> ...
> ...
>> library(Rmpi)
>>
>> mpi.spawn.Rslaves(nslaves = nsl)
> Warning: Permanently added 'worker002,10.215.117.28' (RSA) to the list of known hosts.
> Warning: Permanently added 'worker003,10.96.55.43' (RSA) to the list of known hosts.
> Warning: Permanently added 'worker001,10.206.198.16' (RSA) to the list of known hosts.
> ^Cmpirun: killing job...
>
> --------------------------------------------------------------------------
> mpirun was unable to cleanly terminate the daemons on the nodes shown
> below. Additional manual cleanup may be required - please refer to
> the "orte-clean" tool for assistance.
> --------------------------------------------------------------------------
> worker001 - daemon did not report back when launched
> worker002 - daemon did not report back when launched
> worker003 - daemon did not report back when launched
>

Sometimes it takes a few moments for the workers to be ready. Did you
try the test script a few times?
If it doesn't respond within a minute, interrupt it with ^C and then
run the test script again.

I just tried it and it worked, but I had to try a couple of times
before the workers were ready (didn't have to interrupt with ^C
though). Once they were ready, I could run the test script multiple
times.
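
The retry advice above could be automated with a small shell loop. This
is only a sketch under assumptions: it assumes GNU coreutils `timeout` is
installed on the master node and that the test script can be run
non-interactively (the `Rscript mpiTest.R` invocation in the comment is
hypothetical, as are the function name, the 60-second limit and the
attempt count). It sends SIGINT on timeout, which is what ^C does.

```shell
#!/bin/sh
# retry_with_timeout LIMIT ATTEMPTS PAUSE CMD [ARGS...]
# Run CMD up to ATTEMPTS times. Each run is interrupted with SIGINT
# (the same signal as ^C) if it takes longer than LIMIT seconds.
# Wait PAUSE seconds between attempts. Returns 0 on the first success,
# 1 if every attempt failed or timed out.
retry_with_timeout() {
    limit=$1
    attempts=$2
    pause=$3
    shift 3
    i=1
    while [ "$i" -le "$attempts" ]; do
        if timeout -s INT "$limit" "$@"; then
            echo "succeeded on attempt $i"
            return 0
        fi
        echo "attempt $i failed or timed out" >&2
        # Give the workers a little more time before trying again.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$pause"
        fi
        i=$((i + 1))
    done
    return 1
}

# Hypothetical usage (path and invocation are assumptions):
#   retry_with_timeout 60 5 10 Rscript mpiTest.R
```

If an interrupted run leaves daemons behind, as in the mpirun message
quoted above, manual cleanup may still be needed between attempts; the
loop does not attempt that.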

Thanks for your interest in the AMI. Let us know if you need further help.
Dan

>
> Thanks very much!
> David Gibbs
> OHSU student


