[R] Hierarchical clustering using own distance matrices

Newbie@R ayesha.2.jadoon at googlemail.com
Wed May 26 14:05:18 CEST 2010



Newbie at R wrote:
> 
> Hey Everyone!
> 
> I want to carry out hierarchical clustering using distance matrices I
> have calculated myself (instead of Euclidean distance etc.).
> 
> I understand that as.dist is the function for this, but the distances in
> the dendrogram I got from the following script were not the distances
> defined in my distance matrices.
> 
> script:
> var<-read.table("the distance matrix i calculated", header=TRUE, sep=" ")
> var_HC<-hclust(as.dist(var),method="average")
> 
> 
> var_dendro<-as.dendrogram(var_HC)
> 
> plot(var_dendro, ylim=c(0,5), nodePar=list(lab.cex=0.3),
>      main="My Distance Matrix")
> 
> 
> I did some research and found the following on the hclust help page:
> 
> 
> "...Initially, each object is assigned to its own cluster and then the
> algorithm proceeds iteratively, at each stage joining the two most similar
> clusters, continuing until there is just a single cluster. At each stage
> distances between clusters are recomputed by the Lance–Williams
> dissimilarity update formula according to the particular clustering method
> being used. ..."
> 
> 
> I am wondering: is there another function that doesn't do this step ("At
> each stage distances between clusters are recomputed by the Lance–Williams
> dissimilarity update formula according to the particular clustering method
> being used.")?
> 
> 
> I hope my message was clear, any help would be greatly appreciated.
> 
> 
> Thanks!!
> 
> A.Jadoon
> 
> Kings College London
> 
> 
> 
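If I understand your question correctly, you expected the heights in the
dendrogram to be the distances in your matrix?

Well, hierarchical clustering needs some way of calculating the distance
between clusters. These cluster distances are derived from the distance
matrix, but in general they are not equal to its entries. Only when two
single objects are joined is the merge height exactly their entry in the
matrix.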

The choice "average" you used means that if clusters C1 and C2 are
joined, the distance of the joined cluster to another cluster C' is
the average distance of all elements of clusters C1 and C2 to all
elements of C'. Thus, the distances in the dendrogram are averages of
groups of distances in your matrix.
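
To make the averaging concrete, here is a small sketch. The 4x4 matrix is
made up for illustration (it is not from the original post); the point is
only that the merge heights reported by hclust are averages of blocks of
matrix entries.

```r
# Hypothetical distance matrix for four objects a, b, c, d
# (values invented for this example).
m <- matrix(c( 0, 2, 6, 10,
               2, 0, 5,  9,
               6, 5, 0,  4,
              10, 9, 4,  0),
            nrow = 4, byrow = TRUE,
            dimnames = list(c("a", "b", "c", "d"),
                            c("a", "b", "c", "d")))

hc <- hclust(as.dist(m), method = "average")

# Merge heights: a+b join at 2, c+d join at 4 (both are matrix entries),
# then {a,b} and {c,d} join at the average of the four cross distances.
hc$height                              # 2.0 4.0 7.5
mean(m[c("a", "b"), c("c", "d")])      # (6 + 10 + 5 + 9) / 4 = 7.5

# cophenetic() returns the distances as the dendrogram represents them,
# which can be compared with the original matrix entry by entry.
coph <- as.matrix(cophenetic(hc))
coph["a", "c"]                         # 7.5, whereas m["a", "c"] is 6
```

So a and c are 6 apart in the matrix but meet at height 7.5 in the tree,
because the tree only records the distance between their clusters.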

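The Lance-Williams formula is a general update rule that, for particular
values of its coefficients, reduces to the more intuitive choices such as
"average", "complete", etc. So it is not an extra step that could be
switched off; it is simply how the cluster distances for the chosen
method are computed.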
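
As a rough sketch of what "special values of the coefficients" means, the
update can be written as a small function. The function name
lance_williams and the example numbers are my own, chosen for
illustration; the coefficient values are the standard ones for average,
complete and single linkage.

```r
# Lance-Williams update: distance from cluster k to the merged cluster
# (i union j), given the three pairwise cluster distances.
# (Hypothetical helper, not part of base R.)
lance_williams <- function(d_ki, d_kj, d_ij,
                           alpha_i, alpha_j, beta, gamma) {
  alpha_i * d_ki + alpha_j * d_kj + beta * d_ij +
    gamma * abs(d_ki - d_kj)
}

# Example: i and j are single objects (sizes n_i = n_j = 1) with
# d(i, j) = 2, and a third object k with d(k, i) = 6 and d(k, j) = 5.
n_i <- 1; n_j <- 1

# "average": alpha_i = n_i / (n_i + n_j), alpha_j = n_j / (n_i + n_j)
avg <- lance_williams(6, 5, 2,
                      n_i / (n_i + n_j), n_j / (n_i + n_j), 0, 0)

# "complete": gamma = +1/2 turns the update into max(d_ki, d_kj)
cmp <- lance_williams(6, 5, 2, 0.5, 0.5, 0, 0.5)

# "single": gamma = -1/2 turns the update into min(d_ki, d_kj)
sgl <- lance_williams(6, 5, 2, 0.5, 0.5, 0, -0.5)

c(average = avg, complete = cmp, single = sgl)   # 5.5 6.0 5.0
```

The same formula therefore gives the mean, the maximum or the minimum of
the two distances, depending only on the coefficients.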

Peter Langfelder



Hey Peter!

You understood precisely: I had hoped to see the distances in my matrix
as the lengths in my dendrogram.
You have been an enormous help! Thank you!! (I find the R documentation a
little hard to understand sometimes, probably due to my limited experience
with programming and clustering methods!)

Ayesha

