[BioC] RE: Advice on cluster hardware

Michael Benjamin msb1129 at bellsouth.net
Sun Dec 7 05:04:59 MET 2003


That sounds like a tremendously exciting project!  You seem to have done
much research, and your solution should be both scalable and reasonably

I have seen reports that R is "compatible" with openMosix, but I have
not found reports of real-world experience.  Does R use shared memory?
Does it spawn processes that migrate efficiently?  What kind of
performance enhancement do you anticipate with additional nodes?
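
For a rough first guess at that last question, Amdahl's law gives an
upper bound on speedup from adding nodes. The sketch below is generic
arithmetic only, not a measurement of R or openMosix. The function name
and the serial fractions (1% and 10%) are illustrative assumptions; the
node counts are the ones mentioned in this thread.

```python
def amdahl_speedup(serial_fraction: float, nodes: int) -> float:
    """Ideal speedup on `nodes` workers when `serial_fraction` of the
    run time cannot be parallelized (Amdahl's law)."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / nodes)


if __name__ == "__main__":
    # Node counts from the thread; serial fractions are made-up assumptions.
    for serial_fraction in (0.01, 0.10):
        for nodes in (10, 15, 30, 50):
            speedup = amdahl_speedup(serial_fraction, nodes)
            print(f"serial={serial_fraction:.2f} nodes={nodes:2d} "
                  f"speedup={speedup:.1f}")
```

The bound ignores network and migration overhead, so real gains would be
lower; it only shows how quickly a small serial part limits the benefit
of extra nodes.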

Best wishes!
Michael Benjamin, MD
Emory University,
Winship Cancer Institute
Atlanta, GA USA

-----Original Message-----
From: bioconductor-bounces at stat.math.ethz.ch
[mailto:bioconductor-bounces at stat.math.ethz.ch] On Behalf Of Ramon
Sent: Thursday, December 04, 2003 5:58 AM
To: Ross Boylan; bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] Advice on cluster hardware

Dear Ross,

I don't have any relevant experience, but we are also in the process of
building a cluster, so I'll share some of my confusion with you. Ours
will be an openMosix cluster, probably also with LVS, for applications
that do not migrate well. The cluster will have about 30 nodes now,
possibly up to 50.

Because we wanted to ensure painless use of applications we already use
and develop, and we are also very interested in using Debian (ease of
administration, use, and upgrading), we decided to go for 32 bit (dual
Xeon machines; dual CPU machines, among other things, somewhat decrease
the load on the network, because every pair of CPUs is already connected
by the machine bus). We have considered HP, Dell, and IBM (including
their blades). 64 bit with AMD seemed a bit risky at this moment. But, if
you can consider things other than GNU/Linux, I've heard clusters built
with G5 processors can be a great idea; a lot of bang for the buck (and
64 bit, and I think much larger amounts of RAM per process).

We were concerned with potential network problems, and asked about it on
the openMosix list (see the openMosix-general list, the thread "openMosix
with 50 nodes: network issues", starting about 2 weeks ago). After those
answers and some additional research, it seems that a Gigabit solution
should be enough for our needs with, for instance, a Cisco 4500 switch.
The problem seems to be that scaling can be poor, and if your network
grows large (> 48 nodes) you might need to change switches, which becomes
expensive. But I understand this is not a likely problem in your case in
the near future.

Disk size and disk speed did not seem critical for our intended uses; we
will combine local disks of moderate size (about 60 GB) with a "master"
with about 400 GB in several disks. The oMFS seems to work well, and
machines with local disks give us more flexibility, such as if we want
to remove a few from the cluster and use them standalone for something
else.

Once things are up and running (or trying to run), I will be glad to
share more details of our experience.



On Wednesday 03 December 2003 22:50, Ross Boylan wrote:
> The group I am in is about to purchase a cluster.  If anyone on this
> list has any advice on what type of hardware (or software) would be
> best, I'd appreciate it.  I didn't find any discussion of this in the
> archives, but I thought some people on this list might have relevant
> experience.
> We will have two broad types of uses: simulation studies for
> epidemiology (with people or cases as the units) and genetic and
> genomic studies, whose details I don't know but you all probably do.  The
> simulation studies are likely to make heavy use of R.  I suspect that
> the two uses have much different characteristics, e.g., in terms of
> size of the datasets to manipulate and the best tradeoffs outlined
> below.
> Other uses are possible.
> Among other issues we are wondering about:
> *Tradeoffs between CPU speed, memory, internode communication speed,
> disk size, and disk speed.
> As a first cut, I expect the simulations suggest emphasizing processor
> power and ensuring adequate memory.  On the other hand, the fact that
> it's easy to upgrade CPUs suggests putting more money into the network
> supporting the CPUs.  And I suspect the genomics emphasizes more the
> ability to move large amounts of data around quickly (across the network
> or to disk).
> *Appropriate disk architecture (e.g., local disks vs shared network
> disks or SANs).
> *32 vs 64 bit; Intel vs AMD.
> We assume it will be some kind of Linux OS (we like Debian, but vendors
> tend to supply RH, and Debian lacks support for 64 bit AMD in any
> official way, unlike Suse or RH).  If there's a good reason, we could
> use something else.
> Our budget is relatively modest, enough perhaps for 10-15
> nodes.  We hope to expand later.
> As a side issue, more a personal curiosity, why do clusters all seem to
> be built on dual-processor nodes?  Why not more CPUs per node?
> Thanks for any help you can offer.

Ramón Díaz-Uriarte
Bioinformatics Unit
Centro Nacional de Investigaciones Oncológicas (CNIO)
(Spanish National Cancer Center)
Melchor Fernández Almagro, 3
28029 Madrid (Spain)
Fax: +-34-91-224-6972
Phone: +-34-91-224-6900

PGP KeyID: 0xE89B3462

Bioconductor mailing list
Bioconductor at stat.math.ethz.ch
