[BioC] Advice on cluster hardware

Ross Boylan ross at biostat.ucsf.edu
Wed Dec 3 22:50:14 MET 2003

The group I am in is about to purchase a cluster.  If anyone on this
list has any advice on what type of hardware (or software) would be
best, I'd appreciate it.  I didn't find any discussion of this in the
archives, but I thought some people on this list might have relevant
experience.

We will have two broad types of uses: simulation studies for
epidemiology (with people or cases as the units) and genetic and protein
studies, whose details I don't know but you all probably do.  The
simulation studies are likely to make heavy use of R.  I suspect that
the two uses have very different characteristics, e.g., in terms of the
size of the datasets to manipulate and the best tradeoffs outlined
below.

Other uses are possible.

Among other issues we are wondering about:
*Tradeoffs between CPU speed, memory, internode communication speed,
disk size, and disk speed.

As a first cut, I expect the simulations suggest emphasizing processor
power and ensuring adequate memory.  On the other hand, the fact that
it's easy to upgrade CPUs suggests putting more money into the network
supporting the CPUs.  And I suspect the genomics work puts more emphasis
on the ability to move large amounts of data around quickly (across the
network and to disk).

*Appropriate disk architecture (e.g., local disks vs shared network
disks or SANs).

*32 vs 64 bit; Intel vs AMD.

We assume it will be some kind of Linux OS (we like Debian, but vendors
tend to supply RH and Debian lacks support for 64 bit AMD in any
official way, unlike Suse or RH).  If there's a good reason, we could
use something else.

Our budget is relatively modest, enough perhaps for 10-15 dual-processor
nodes.  We hope to expand later.

As a side issue, more a personal curiosity, why do clusters all seem to
be built on dual-processor nodes?  Why not more CPUs per node?

Thanks for any help you can offer.
Ross Boylan                                      wk:  (415) 502-4031
530 Parnassus Avenue (Library) rm 115-4          ross at biostat.ucsf.edu
Dept of Epidemiology and Biostatistics           fax: (415) 476-9856
University of California, San Francisco
San Francisco, CA 94143-0840                     hm:  (415) 550-1062

