[BioC] Advice on cluster hardware
anders.sjogren at math.chalmers.se
Thu Dec 11 10:34:13 MET 2003
Regarding Apple G5 clustering, there is a much-discussed cluster
of 1100 dual 64-bit G5s at Virginia Tech, which can be studied in some
depth (including design documents etc., if I recall correctly) at their
homepage http://computing.vt.edu/research_computing/terascale/ . It should
be interesting to take a look at. The G5s are supposed to run several
distros of Linux, but I don't know the details. I think Virginia Tech
runs OS X 10.3 on them.
Dept. of Mathematical Statistics
Chalmers University of Technology
On Dec 4, 2003, at 11:58 AM, Ramon Diaz-Uriarte wrote:
> Dear Ross,
> I don't have any relevant experience, but we are in the process of also
> building a cluster, so I'll share some of my confusion with you. Ours
> will be an openMosix cluster, probably also with LVS, for applications that
> do not migrate well. The cluster will have about 30 nodes now,
> possibly going
> up to 50.
> Because we wanted to ensure painless use of applications we already
> use and/or
> develop and we are also very interested in using Debian (ease of
> administration, use, and upgrading), we decided to go for a 32 bit
> (dual Xeon machines ---dual CPU machines, among other things, decrease
> somewhat the load of the network, because every pair of CPUs is already
> connected by the machine bus). We have considered HP, Dell, and IBM
> (including their blades). The 64 bit with AMD seemed a bit risky at the
> moment. But, if you can consider things other than GNU/Linux, I've
> heard that
> clusters built with G5 processors can be a great idea; a lot of bang
> for the
> buck (and 64 bit, and I think much larger amounts of RAM per
> We were concerned with potential network problems, and asked about it
> on the
> openMosix list (see the openMosix-general list, the thread "openMosix
> with 50 nodes: network issues", starting about 2 weeks ago). After
> answers and some additional research, it seems that a Gigabit solution should
> be enough for our needs with, for instance, a Cisco 4500 switch. The
> seems to be that scaling can be poor, and if your network grows large
> (> 48 nodes) you might need to change switches, which becomes expensive,
> But I understand this is not a likely problem in your case in the near future.
> Disk size and disk speed did not seem critical for our intended
> uses; we
> will combine local disks of moderate size (about 60 GB) with a "master
> of with about 400 GB in several disks. The oMFS seems to work well, and
> machines with local disks give us more flexibility, such as if we want to
> remove a few from the cluster and use them standalone for something else.
> Once things are up and running (or trying to run), I will be glad to share
> more details of our experience.
> On Wednesday 03 December 2003 22:50, Ross Boylan wrote:
>> The group I am in is about to purchase a cluster. If anyone on this
>> list has any advice on what type of hardware (or software) would be
>> best, I'd appreciate it. I didn't find any discussion of this in the
>> archives, but I thought some people on this list might have relevant experience.
>> We will have two broad types of uses: simulation studies for
>> epidemiology (with people or cases as the units) and genetic and
>> studies, whose details I don't know but you all probably do. The
>> simulation studies are likely to make heavy use of R. I suspect that
>> the two uses have much different characteristics, e.g., in terms of
>> size of the datasets to manipulate and the best tradeoffs outlined below.
>> Other uses are possible.
>> Among other issues we are wondering about:
>> *Tradeoffs between CPU speed, memory, internode communication speed,
>> disk size, and disk speed.
>> As a first cut, I expect the simulations suggest emphasizing processor
>> power and ensuring adequate memory. On the other hand, the fact that
>> it's easy to upgrade CPUs suggests putting more money into the network
>> supporting the CPUs. And I suspect the genomics emphasizes more the
>> ability to move large amounts of data around quickly (across network
>> to disk).
>> *Appropriate disk architecture (e.g., local disks vs shared network
>> disks or SANs).
>> *32 vs 64 bit; Intel vs AMD.
>> We assume it will be some kind of Linux OS (we like Debian, but
>> tend to supply RH and Debian lacks support for 64 bit AMD in any
>> official way, unlike Suse or RH). If there's a good reason, we could
>> use something else.
>> Our budget is relatively modest, enough perhaps for 10-15
>> nodes. We hope to expand later.
>> As a side issue, more a personal curiosity, why do clusters all seem to
>> be built on dual-processor nodes? Why not more CPU's per node?
>> Thanks for any help you can offer.
> Ramón Díaz-Uriarte
> Bioinformatics Unit
> Centro Nacional de Investigaciones Oncológicas (CNIO)
> (Spanish National Cancer Center)
> Melchor Fernández Almagro, 3
> 28029 Madrid (Spain)
> Fax: +-34-91-224-6972
> Phone: +-34-91-224-6900
> PGP KeyID: 0xE89B3462