[R] Elbow criterion plots for determining k in hierarchical clustering

Guera jeppesen_becky at hotmail.com
Fri Mar 14 18:08:26 CET 2008


re:" ... (I) would like to create a plot to examine for the classic elbow
criterion to use in determining the best number of clusters.  Ideally I'd
like to plot percent variance explained (y axis) against number of clusters
(x axis)....  Is there a way to do this in R...?"  

I found a way to produce an elbow criterion plot, using dendrogram height as
the measure of dissimilarity.  For each k, I read from the dendrogram the
difference in height between the two most similar clusters at that k, and
plotted this (y axis) against k (x axis).  The plot does show an elbow, which
narrows the choice down considerably, but it is still subject to
interpretation.
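
A minimal sketch of one way to draw such a plot in R follows. It is not the
exact code behind the description above. It assumes the built-in USArrests
data, Euclidean distance, and average linkage, and it takes "height" to mean
the merge height stored in the hclust object; the names x, hc, k_max and
elbow are made up for the example.

```r
# Assumed example data: the built-in USArrests data set, standardised.
x  <- scale(USArrests)
hc <- hclust(dist(x), method = "average")

# hc$height holds the n - 1 merge heights in the order the merges happen.
# The last merge joins the final 2 clusters, the one before it joins 2 of
# the final 3, and so on, so the reversed vector lines up with k = 2, ..., n.
n      <- nrow(x)
k      <- 2:n
height <- rev(hc$height)

# Keep only small k, where an elbow would be of interest (k_max is arbitrary).
k_max <- 15
elbow <- data.frame(k = k, height = height)[k <= k_max, ]

plot(elbow$k, elbow$height, type = "b",
     xlab = "Number of clusters (k)",
     ylab = "Height at which two clusters merge")
```

With a different linkage method or distance the heights, and therefore the
position of the elbow, can change, so the plot is best read alongside the
dendrogram itself.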

I chose k based on:
1. the location of the elbow on the plot
2. cluster size (e.g. if I had it narrowed down to 4 or 5, and going to 5
produced clusters of, say, 1 or 2 members that weren't there at k=4, I'd use 4)
3. the height of the "tallest" cluster at k=x
4. the eigenvalues from a PCA at k=x. 
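
As a hedged illustration of criterion 2 only, cluster sizes at two candidate
values of k can be compared with cutree() and table().  The data set, the
linkage method, and the candidate values 4 and 5 are assumptions carried over
for the example, not taken from the original analysis.

```r
# Assumed example data and clustering, as a stand-in for the real analysis.
x  <- scale(USArrests)
hc <- hclust(dist(x), method = "average")

# Cut the tree at each candidate k and tabulate how many observations
# fall into each cluster.
candidates <- 4:5
sizes <- lapply(candidates, function(k) table(cutree(hc, k = k)))
names(sizes) <- paste0("k=", candidates)

# Print the size tables; very small clusters that appear only at the
# larger k are the ones criterion 2 is concerned with.
print(sizes)
```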

I thought I should reply to my own post since I noticed that the other
postings on similar topics also weren't replied to, and thought this could
possibly help others down the road.

-----
Rebecca Jeppesen, MSc Candidate
Acadia University
Wolfville, N.S.
Canada


