[BioC] RE: Advice on cluster hardware

Michael Benjamin msb1129 at bellsouth.net
Sun Dec 7 05:04:59 MET 2003


Ramon:

That sounds like a tremendously exciting project!  You seem to have done
much research, and your solution should be both scalable and reasonably
priced.

I have seen reports that R is "compatible" with openMosix, but I have
not found reports of real-world experience.  Does R use shared memory?
Does it spawn processes that migrate efficiently?  What kind of
performance enhancement do you anticipate with additional nodes?
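
Whether a workload can benefit from process migration depends largely on whether it splits into independent processes. A minimal sketch of that pattern follows, in Python rather than R and purely as an illustration: `run_replicate`, `run_all`, the toy event probability, and the worker count are all hypothetical, and nothing here is specific to openMosix or says anything about how R itself behaves on it.

```python
import random
from multiprocessing import Pool


def run_replicate(seed):
    """Run one independent simulation replicate (hypothetical toy model).

    Each replicate builds its own random generator from its seed, so it
    shares no state with any other replicate.
    """
    rng = random.Random(seed)
    n_units = 10_000
    # Toy "simulation": fraction of units that experience an event
    # with probability 0.1 (an arbitrary illustrative value).
    events = sum(1 for _ in range(n_units) if rng.random() < 0.1)
    return events / n_units


def run_all(seeds, workers=4):
    """Run the replicates in separate OS processes and collect the results."""
    with Pool(processes=workers) as pool:
        return pool.map(run_replicate, seeds)


if __name__ == "__main__":
    results = run_all(range(8))
    print("mean event fraction:", sum(results) / len(results))
```

Because each replicate is a separate operating-system process with its own seed and no shared memory, this is the kind of job that a process scheduler can, in principle, place on different CPUs or nodes.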

Best wishes!
Michael Benjamin, MD
Emory University,
Winship Cancer Institute
Atlanta, GA USA

-----Original Message-----
From: bioconductor-bounces at stat.math.ethz.ch
[mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Ramon
Diaz-Uriarte
Sent: Thursday, December 04, 2003 5:58 AM
To: Ross Boylan; bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] Advice on cluster hardware

Dear Ross,

I don't have any relevant experience, but we are also in the process of
building a cluster, so I'll share some of my confusion with you. Ours
will be an openMosix cluster, probably also with LVS for applications
that do not migrate well. The cluster will have about 30 nodes now,
possibly going up to 50.

Because we wanted to ensure painless use of applications we already use
and/or develop, and because we are also very interested in using Debian
(ease of administration, use, and upgrading), we decided to go for a
32-bit platform: dual Xeon machines. Dual-CPU machines, among other
things, somewhat decrease the load on the network, because every pair
of CPUs is already connected by the machine bus. We have considered HP,
Dell, and IBM (including their blades). 64-bit with AMD seemed a bit
risky at this moment. But if you can consider things other than
GNU/Linux, I've heard that clusters built with G5 processors can be a
great idea: a lot of bang for the buck (and 64-bit, and I think much
larger amounts of RAM per process).

We were concerned with potential network problems, and asked about it
on the openMosix list (see the openMosix-general list, the thread
"openMosix cluster with 50 nodes: network issues", starting about 2
weeks ago). After those answers and some additional research, it seems
that a Gigabit solution will be enough for our needs with, for
instance, a Cisco 4500 switch. The problem seems to be that scaling can
be poor, and if your network grows large (e.g., more than 48 nodes) you
might need to change switches, which becomes expensive, etc. But I
understand this is not a likely problem in your case in the near
future.
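
The port arithmetic behind that concern can be sketched roughly as follows. This is only an illustration: the function name `switches_needed` is hypothetical, and the default of 48 ports is taken from the threshold mentioned above, not from any switch specification.

```python
import math


def switches_needed(nodes, ports_per_switch=48, uplink_ports=0):
    """Minimum number of switches so that every node gets its own port.

    ports_per_switch defaults to 48 only because that is the threshold
    quoted in the discussion; uplink_ports is a hypothetical allowance
    for ports reserved to connect switches to each other.
    """
    usable = ports_per_switch - uplink_ports
    if usable <= 0:
        raise ValueError("no ports left for nodes")
    return math.ceil(nodes / usable)


# The cluster sizes discussed here: about 30 nodes now, possibly 50 later.
for n in (30, 50):
    print(n, "nodes ->", switches_needed(n), "switch(es)")
```

Under these assumptions a 30-node cluster fits on a single switch, while growing to 50 nodes crosses the 48-port boundary and needs a second switch or a larger one.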

Disk size and disk speed did not seem critical for our intended uses;
we will combine local disks of moderate size (about 60 GB) with a
"master node" with about 400 GB in several disks. The oMFS seems to
work well, and machines with local disks give us more flexibility, for
example if we want to remove a few from the cluster and use them
standalone for something else.

Once things are up and running (or trying to run), I will be glad to
provide more details of our experience.

Best,

R.


On Wednesday 03 December 2003 22:50, Ross Boylan wrote:
> The group I am in is about to purchase a cluster.  If anyone on this
> list has any advice on what type of hardware (or software) would be
> best, I'd appreciate it.  I didn't find any discussion of this in the
> archives, but I thought some people on this list might have relevant
> experience.
>
> We will have two broad types of uses: simulation studies for
> epidemiology (with people or cases as the units) and genetic and
> protein studies, whose details I don't know but you all probably do.
> The simulation studies are likely to make heavy use of R.  I suspect
> that the two uses have much different characteristics, e.g., in terms
> of the size of the datasets to manipulate and the best tradeoffs
> outlined below.
>
> Other uses are possible.
>
> Among other issues we are wondering about:
> *Tradeoffs between CPU speed, memory, internode communication speed,
> disk size, and disk speed.
>
> As a first cut, I expect the simulations suggest emphasizing processor
> power and ensuring adequate memory.  On the other hand, the fact that
> it's easy to upgrade CPUs suggests putting more money into the network
> supporting the CPUs.  And I suspect the genomics emphasizes more the
> ability to move large amounts of data around quickly (across network
> and to disk).
>
> *Appropriate disk architecture (e.g., local disks vs shared network
> disks or SANs).
>
> *32 vs 64 bit; Intel vs AMD.
>
> We assume it will be some kind of Linux OS (we like Debian, but
> vendors tend to supply RH, and Debian lacks support for 64 bit AMD in
> any official way, unlike Suse or RH).  If there's a good reason, we
> could use something else.
>
> Our budget is relatively modest, enough perhaps for 10-15
> dual-processor nodes.  We hope to expand later.
>
> As a side issue, more a personal curiosity, why do clusters all seem
> to be built on dual-processor nodes?  Why not more CPUs per node?
>
> Thanks for any help you can offer.

-- 
Ramón Díaz-Uriarte
Bioinformatics Unit
Centro Nacional de Investigaciones Oncológicas (CNIO)
(Spanish National Cancer Center)
Melchor Fernández Almagro, 3
28029 Madrid (Spain)
Fax: +-34-91-224-6972
Phone: +-34-91-224-6900

http://bioinfo.cnio.es/~rdiaz
PGP KeyID: 0xE89B3462
(http://bioinfo.cnio.es/~rdiaz/0xE89B3462.asc)

_______________________________________________
Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor


