[BioC]: Clustering with Diana

Chris Wilkinson Christopher.Wilkinson at adelaide.edu.au
Fri Dec 17 00:54:59 CET 2004

> could someone please help with Diana clustering and visualisation.
> I would like to do 1-way (genes only) and 2-way (genes and samples)
> clustering and visualise as a heatmap or in Treeview software.
> Anthony
> --

I've never used Treeview, but I have used heatmap and diana. My approach is
to save the diana object, convert it to a dendrogram, and define clusters by
cutting it at a certain height. I've used the diana algorithm both on the raw
data and with a precomputed dissimilarity matrix. Below is some code of mine,
with the object names changed to hopefully be a bit clearer.

## on raw data (matrix of Mvalues, rows = genes, cols = arrays)
library(cluster)   # provides diana()
Mvalues <- matrix(0,nrow=100,ncol=9)
rownames(Mvalues) <- 301:400   # 100 names for 100 rows
colnames(Mvalues) <- c("a","b","c","d","e","f","g","h","i")
for (i in 1:3) Mvalues[,i] <- rnorm(100)
for (i in 4:6) Mvalues[,i] <- rnorm(100,mean=2,sd=0.5)
for (i in 7:9) Mvalues[,i] <- rnorm(100,mean=-1,sd=0.7)

dianaGenes <- diana(Mvalues)
## or using a precomputed dissimilarity matrix:
## dianaGenes <- diana(dissMatrix,diss=TRUE,keep.diss=FALSE)
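
In case it helps, here is a minimal sketch of the precomputed-dissimilarity route. The name dissMatrix matches the comment above; the toy matrix `toy`, the other object names, and the choice of Euclidean distance via dist() are assumptions for illustration, not part of my original code.

```r
library(cluster)

set.seed(1)
## toy data: 20 genes x 6 arrays (hypothetical stand-in for Mvalues)
toy <- matrix(rnorm(120), nrow = 20, ncol = 6)
rownames(toy) <- paste0("gene", 1:20)

## dissimilarities between genes (rows); Euclidean is an assumption here
dissMatrix <- dist(toy, method = "euclidean")

## diss=TRUE tells diana the input is already a dissimilarity object
dianaFromDiss <- diana(dissMatrix, diss = TRUE, keep.diss = FALSE)

## clustering the raw matrix directly, for comparison
## (diana's default metric is also Euclidean)
dianaFromRaw <- diana(toy)
```

Any other "dist" object should work the same way, for example as.dist(1 - cor(t(toy))) if you would rather cluster on correlation.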

dianaDend <- as.dendrogram(as.hclust(dianaGenes))
dianaDendOrder <- order.dendrogram(dianaDend)

## The rownames are the gene identifiers. Reorder them to follow the dendrogram
clusteredGeneNames <- rownames(Mvalues)[dianaDendOrder]

## To select the colours use
low <- col2rgb("green")/255
high <- col2rgb("red")/255
heatmapCol <- rgb( seq(low[1],high[1],len=123), seq(low[2],high[2],len=123),
                  seq(low[3],high[3],len=123) )
## personally I don't much like the red/green system, and prefer heat.colors
heatmapCol <- heat.colors(123)
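
If you do want a two-colour ramp, colorRampPalette() from grDevices builds essentially the same thing as the manual rgb()/seq() interpolation above in one call. This is only a sketch; the name heatmapColRamp is made up for illustration.

```r
## interpolate 123 colours from green to red
heatmapColRamp <- colorRampPalette(c("green", "red"))(123)

## first and last entries are the two end colours
heatmapColRamp[c(1, 123)]
```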

## If you are just clustering on genes you can colour the arrays
## eg say you had 3 groups of 3
colColours <- c(rep("green",3),rep("red",3),rep("blue",3))

##you can also define clusters by cutting the dendrogram and colouring
dianaClusters.h2 <- cut(dianaDend,h=2)
nClusters <- length(dianaClusters.h2$lower)
dianaClusters <- numeric(length=dim(Mvalues)[1])
for (i in 1:nClusters)
   dianaClusters[order.dendrogram(dianaClusters.h2$lower[[i]])] <- i
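
An alternative I believe is worth trying is cutree() on the hclust version of the tree, which returns the cluster membership directly without the loop over branches. The sketch below uses a small made-up matrix (`toy`) and hypothetical object names; the cluster numbering may differ from what the loop above gives.

```r
library(cluster)

set.seed(1)
toy <- matrix(rnorm(120), nrow = 20, ncol = 6)   # hypothetical toy data
rownames(toy) <- paste0("gene", 1:20)

toyHclust <- as.hclust(diana(toy))

## membership vector, one entry per gene, in the original row order
toyClusters <- cutree(toyHclust, h = 2)

## or ask for a fixed number of clusters instead of a height
toyClusters4 <- cutree(toyHclust, k = 4)
table(toyClusters4)
```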

## now colour the rows based on clusters.
## I like distinct colours between clusters
rowColChoices <- character(nClusters)
nClusters.2 <- ceiling(nClusters/2)
nClusters.2.min <- min(nClusters.2,floor(nClusters/2))
rowColChoices[1:nClusters.2*2-1] <- rainbow(nClusters.2,start=0,end=2/6)
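## NB: the right-hand side of the next assignment is missing from the archived
## post; a second, distinct rainbow range (start=3/6, end=5/6) is assumed here
rowColChoices[1:nClusters.2.min*2] <- rainbow(nClusters.2.min,start=3/6,end=5/6)
rowCols <- character(dim(Mvalues)[1])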
for (i in 1:length(rowCols)) rowCols[i] <- rowColChoices[dianaClusters[i]]

## or randomly assign colours
rowColChoices <- rainbow(nClusters)[sample(nClusters,nClusters)]
rowCols <- character(length=dim(Mvalues)[1])
for (i in 1:length(rowCols))
   rowCols[i] <- rowColChoices[dianaClusters[i]]

## To cluster just on genes:
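## (the end of this call is cut off in the archived post; the arguments
## after scale= are assumed from the objects defined above)
heatmap(Mvalues, Rowv=dianaDend, Colv=NA, scale="row", col=heatmapCol,
        RowSideColors=rowCols, ColSideColors=colColours)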

To cluster on genes and arrays, I think you just replace Colv
with a dendrogram object based on clustering over the columns:

dianaArrays <- diana(t(Mvalues))
dianaDendArrays <- as.dendrogram(as.hclust(dianaArrays))
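## call heatmap with Colv=dianaDendArrays and drop ColSideColors
## (call cut off in the archived post; trailing arguments assumed)
heatmap(Mvalues, Rowv=dianaDend, Colv=dianaDendArrays, scale="row",
        col=heatmapCol, RowSideColors=rowCols)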


Dr Chris Wilkinson

Senior Research Officer (Bioinformatics) | ARC Research Associate
Child Health Research Institute (CHRI)   | Microarray Analysis Group
7th floor, Clarence Rieger Building      | Room 121
Women's and Children's Hospital          | School of Mathematical Sciences
72 King William Rd, North Adelaide, 5006 | The University of Adelaide, 5005

Math's Office (Room 121)        Ph: 8303 3714
CHRI   Office (CR2 52A)         Ph: 8161 6363

Christopher.Wilkinson at adelaide.edu.au

