[BioC] Advice on cluster hardware

Anders Sjögren anders.sjogren at math.chalmers.se
Thu Dec 11 10:34:13 MET 2003

Dear All,

Regarding Apple G5 clustering, there is a much-discussed cluster of
1100 dual 64-bit G5s at Virginia Tech, which can be studied in some
depth (including design documents etc., if I recall correctly) at their
homepage http://computing.vt.edu/research_computing/terascale/ . It
should be interesting to take a look at. The G5s are supposed to run
several distros of Linux, but I don't know the details. I think
Virginia Tech runs OS X 10.3 on them.

Best regards

Anders Sjögren

PhD Student
Dept. of Mathematical Statistics
Chalmers University of Technology
Göteborg, Sweden


On Dec 4, 2003, at 11:58 AM, Ramon Diaz-Uriarte wrote:

> Dear Ross,
> I don't have any relevant experience, but we are also in the process
> of building a cluster, so I'll share some of my confusion with you.
> Our cluster will be an openMosix cluster, probably also with LVS, for
> applications that do not migrate well. The cluster will have about 30
> nodes now, possibly going up to 50.
> Because we wanted to ensure painless use of applications we already
> use and/or develop, and we are also very interested in using Debian
> (ease of administration, use, and upgrading), we decided to go for a
> 32 bit platform (dual Xeon machines; dual CPU machines, among other
> things, somewhat decrease the load on the network, because every pair
> of CPUs is already connected by the machine bus). We have considered
> HP, Dell, and IBM (including their blades). The 64 bit with AMD seemed
> a bit risky at this moment. But, if you can consider things other than
> GNU/Linux, I've heard that clusters built with G5 processors can be a
> great idea; a lot of bang for the buck (and 64 bit, and I think much
> larger amounts of RAM per process).
> We were concerned with potential network problems, and asked about it
> on the openMosix list (see the openMosix-general list, the thread
> "openMosix cluster with 50 nodes: network issues", starting about 2
> weeks ago). After those answers and some additional research, it seems
> that a Gigabit solution will be enough for our needs with, for
> instance, a Cisco 4500 switch. The problem seems to be that scaling
> can be poor, and if your network grows large (e.g., more than 48
> nodes) you might need to change switches, which becomes expensive,
> etc. But I understand this is not a likely problem in your case in the
> near future.
> Disk size and disk speed did not seem critical for our intended uses;
> we will combine local disks of moderate size (about 60 GB) with a
> "master node" with about 400 GB in several disks. The oMFS seems to
> work well, and machines with local disks give us more flexibility,
> such as if we want to remove a few from the cluster and use them
> standalone for something else.
> Once things are up and running (or trying to run), I will be glad to
> provide more details of our experience.
> Best,
> R.
> On Wednesday 03 December 2003 22:50, Ross Boylan wrote:
>> The group I am in is about to purchase a cluster.  If anyone on this
>> list has any advice on what type of hardware (or software) would be
>> best, I'd appreciate it.  I didn't find any discussion of this in the
>> archives, but I thought some people on this list might have relevant
>> experience.
>> We will have two broad types of uses: simulation studies for
>> epidemiology (with people or cases as the units) and genetic and
>> protein studies, whose details I don't know but you all probably do.
>> The simulation studies are likely to make heavy use of R.  I suspect
>> that the two uses have much different characteristics, e.g., in terms
>> of the size of the datasets to manipulate and the best tradeoffs
>> outlined below.
>> Other uses are possible.
>> Among other issues we are wondering about:
>> *Tradeoffs between CPU speed, memory, internode communication speed,
>> disk size, and disk speed.
>> As a first cut, I expect the simulations suggest emphasizing processor
>> power and ensuring adequate memory.  On the other hand, the fact that
>> it's easy to upgrade CPUs suggests putting more money into the network
>> supporting the CPUs.  And I suspect the genomics emphasizes more the
>> ability to move large amounts of data around quickly (across network
>> and to disk).
>> *Appropriate disk architecture (e.g., local disks vs shared network
>> disks or SANs).
>> *32 vs 64 bit; Intel vs AMD.
>> We assume it will be some kind of Linux OS (we like Debian, but
>> vendors tend to supply RH, and Debian lacks support for 64 bit AMD in
>> any official way, unlike Suse or RH).  If there's a good reason, we
>> could use something else.
>> Our budget is relatively modest, enough perhaps for 10-15
>> dual-processor nodes.  We hope to expand later.
>> As a side issue, more a personal curiosity, why do clusters all seem
>> to be built on dual-processor nodes?  Why not more CPUs per node?
>> Thanks for any help you can offer.
> -- 
> Ramón Díaz-Uriarte
> Bioinformatics Unit
> Centro Nacional de Investigaciones Oncológicas (CNIO)
> (Spanish National Cancer Center)
> Melchor Fernández Almagro, 3
> 28029 Madrid (Spain)
> Fax: +-34-91-224-6972
> Phone: +-34-91-224-6900
> http://bioinfo.cnio.es/~rdiaz
> PGP KeyID: 0xE89B3462
> (http://bioinfo.cnio.es/~rdiaz/0xE89B3462.asc)
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://www.stat.math.ethz.ch/mailman/listinfo/bioconductor
