[R] clustering problem

Uwe Ligges ligges at statistik.tu-dortmund.de
Mon Feb 25 11:48:26 CET 2008



Karin Lagesen wrote:
> First I just want to say thanks for all the help I've had from the
> list so far..)
> 
> I now have what I think is a clustering problem. I have lots of
> objects which I have measured a dissimilarity between. Now, this list
> only has one entry per pair, so it is not symmetrical.
> 
> Example input:
> 
> NameA   NameB   Dist
> 189_1C2 189_1C1 0
> 189_1C3 189_1C1 0.017
> 189_1C3 189_1C2 0.017
> 189_1C4 189_1C1 0
> 189_1C4 189_1C2 0
> 189_1C4 189_1C3 0.017
> 189_1C5 189_1C1 0.05
> 189_1C5 189_1C2 0.05
> 189_1C5 189_1C3 0.067
> 189_1C5 189_1C4 0.05
> 189_1C6 189_1C1 0.05
> 189_1C6 189_1C2 0.05
> 189_1C6 189_1C3 0.067
> 189_1C6 189_1C4 0.05
> 189_1C6 189_1C5 0
> 
> 
> The distance measure is 0 if identical, and then increases with
> increasing dissimilarity up till 1.


This is a bit awkward because you already have the distances. It would be 
easier to calculate them from the raw data with the dist() function, 
because that produces the required distance object directly.
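
For comparison, a minimal sketch of the dist() route, assuming you had the 
raw measurements. The data.frame `raw` and its values are made up for 
illustration and are not from the original post:

```r
## Hypothetical raw measurements, one row per object (made-up values)
raw <- data.frame(x = c(1, 2, 6),
                  y = c(1, 3, 5),
                  row.names = c("a", "b", "c"))

## dist() computes all pairwise distances (Euclidean by default)
## and returns an object of class "dist", which hclust() accepts directly
d <- dist(raw)
print(d)
```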

Anyway, what you can do (and I do not think it is the best way) for your 
data.frame X is:


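## remember the names (works for factor and character columns);
## the steps below assume X is sorted as in your example:
nm <- sort(unique(as.character(unlist(X[,1:2]))))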

## reshape to matrix-like form:
X <- reshape(X, direction="wide", idvar="NameA", timevar="NameB")

## add a row for the 1st object and a column for the last one,
## then remove the names column:
X <- rbind(NA, cbind(X, NA))[,-1]

## assign names again:
names(X) <- row.names(X) <- nm

## convert to a "dist" object (as.dist() uses the lower triangle):
X <- as.dist(X)

## apply clustering with average linkage:
hc <- hclust(X, method="average")
plot(hc)
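
A sketch of an alternative that does not rely on the row order of X: fill a 
square matrix by indexing it with the object names. The data below are 
retyped from the example in the question; the names M, d and groups are 
only illustrative, and cutting the tree into k = 3 groups with cutree() is 
an assumption based on the grouping described in the question.

```r
## Pair list retyped from the example in the question
X <- data.frame(
  NameA = c("189_1C2", "189_1C3", "189_1C3", "189_1C4", "189_1C4",
            "189_1C4", "189_1C5", "189_1C5", "189_1C5", "189_1C5",
            "189_1C6", "189_1C6", "189_1C6", "189_1C6", "189_1C6"),
  NameB = c("189_1C1", "189_1C1", "189_1C2", "189_1C1", "189_1C2",
            "189_1C3", "189_1C1", "189_1C2", "189_1C3", "189_1C4",
            "189_1C1", "189_1C2", "189_1C3", "189_1C4", "189_1C5"),
  Dist  = c(0, 0.017, 0.017, 0, 0, 0.017, 0.05, 0.05, 0.067, 0.05,
            0.05, 0.05, 0.067, 0.05, 0),
  stringsAsFactors = FALSE
)

## all object names, sorted
nm <- sort(unique(c(X$NameA, X$NameB)))

## empty square matrix with the names as dimnames, zero diagonal
M <- matrix(NA_real_, length(nm), length(nm), dimnames = list(nm, nm))
diag(M) <- 0

## fill both triangles; a two-column character matrix indexes by dimnames
M[cbind(X$NameA, X$NameB)] <- X$Dist
M[cbind(X$NameB, X$NameA)] <- X$Dist

## convert to a "dist" object and cluster with average linkage
d  <- as.dist(M)
hc <- hclust(d, method = "average")
plot(hc)

## cut the tree into three groups (k = 3 is an assumption, see above)
groups <- cutree(hc, k = 3)
print(groups)
```

Any pair missing from the list stays NA in M, so it is worth checking 
anyNA(M) before clustering.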


Uwe Ligges



> What I would like to get from these data is a hierarchical clustering
> graph. In this example I would then group
> 
> 189_1C2 189_1C1 189_1C4,
> 
> 189_1C6 189_1C5,
> 
> and 189_1C3 off with itself.
> 
> The distances between the groups should be the mean distances between
> the objects within each group (I think).
> 
> I have looked at hclust and it seems like it should be able to do what
> I want. However, I am unsure of how to use it to get what I am looking
> for.
> 
> Thank you in advance for your help!
> 
> Karin


