[BioC] Clustering in R

michael watson (IAH-C) michael.watson at bbsrc.ac.uk
Thu Jun 17 10:16:59 CEST 2004

OK, admittedly it is not incredibly simple, but it is not *that*
difficult.

If you are familiar with R, it should take you an hour or two;  if
unfamiliar, perhaps a day or two.

The commands you want (and need to read the help on) are:


With intelligent use of hclust -> cutree -> subsetting -> hclust (in
that order) you will be able to drill down into your dendrogram and
create sub-trees - until you get to the level where you can see your
gene names.
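
As a rough illustration of that drill-down, here is a minimal sketch.  It
runs on simulated data, not your arrays: the matrix "expr", its row and
column names, the choice of k = 8 and the choice of the largest group are
all assumptions made up for the example.  It uses Euclidean distance and
the default linkage of hclust, which may not be the best choice for your
data.

```r
# Simulated stand-in for a genes x arrays expression matrix (hypothetical data).
set.seed(1)
expr <- matrix(rnorm(200 * 6), nrow = 200, ncol = 6,
               dimnames = list(paste0("gene", 1:200), paste0("array", 1:6)))

# 1. Cluster all genes (Euclidean distance, hclust's default linkage).
hc <- hclust(dist(expr))

# 2. Cut the full tree into a manageable number of groups (k = 8 is arbitrary).
groups <- cutree(hc, k = 8)

# 3. Pick one group to look at (here simply the largest) and subset the matrix.
biggest <- as.integer(names(which.max(table(groups))))
sub <- expr[groups == biggest, , drop = FALSE]

# 4. Re-cluster only that subset; with fewer genes the labels become legible.
hc_sub <- hclust(dist(sub))
plot(hc_sub, cex = 0.6, main = "Sub-tree for one group")
```

If the sub-tree is still too dense, the same cutree -> subsetting -> hclust
steps can be repeated on "sub".  Bear in mind that dist() on all 14000
genes builds a large distance matrix, so it needs a fair amount of memory.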

An important message to take home here is that if you have 14000 genes
and therefore 14000 labels, it's going to be difficult to display your
tree in ANY software, including the expensive commercial products.

Let me know how you get on


-----Original Message-----
From: wmak at brandeis.edu [mailto:wmak at brandeis.edu] 
Sent: 16 June 2004 21:26
To: bioconductor at stat.math.ethz.ch
Subject: [BioC] Clustering in R

Dear list members,

I'm an undergrad and I work in a lab at Brandeis.  I am trying to
cluster around 14,000 genes across 6 microarray experiments.  Two of
these experiments are replicates.  I have decided to use R since it
seems to be the most complete and flexible software package for
normalization and clustering of microarray data.

The problem is that I am new to clustering and to R.  Just to mention
a few of the problems I'm having: the dendrogram that is drawn by R from
the agnes object is far too dense to see any of the gene names; kmeans
won't work, returning an error saying that my data has NAs in it (there
weren't any missing values in the original table though); I'd like to be
able to see a heatmap or a cumulative plot of expression profiles for
genes that are clustered together or are on the same branch of the
dendrogram.

I know that these questions are probably very simple, but I can't seem
to find the answer to them online or in the documentation.  If anyone
can answer these questions or direct me toward resources that deal with
clustering in R or BioConductor, a basic tutorial that takes a practical
approach to it, I would really appreciate it.  Any other reading
material that isn't too heavy on statistics that deals with clustering
for that matter, would be very helpful.

Thank you in advance,

Wayne Mak
