[R] Advice on exploration of sub-clusters in hierarchical dendrogram

kosmo7 dnicolgr at hotmail.com
Fri Feb 24 15:50:50 CET 2012


OK, I was finally able to work it out.
Since I have been helped numerous times by questions posted by other
users who eventually reached a solution to their problem, I am posting
the code that worked for me for future googlers. It is certainly not
optimal, but it works:

# Initial clustering
library(cluster) # provides clusplot() and silhouette(), used below
df=read.table('mydata.txt', header=TRUE, row.names=1) # read file with distance matrix
d=as.dist(df) # format table as distance matrix
z<-hclust(d,method="complete", members=NULL)
x<-as.dendrogram(z)
plot(x, xlab="mydata complete-LINKAGE", ylim=c(0,4)) # visualization of the dendrogram
clusters<-cutree(z, h=1.6) # obtain clusters at cutoff height=1.6
ord<-cmdscale(d, k=2) # multidimensional scaling of the data down to 2 dimensions
clusplot(ord,clusters, color=TRUE, shade=TRUE,labels=4, lines=0)
# visualization of the clusters in a 2D map
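Since mydata.txt is not part of the post, here is a minimal, reproducible sketch of the same initial steps. It assumes R's built-in USArrests data set (standardized with scale() and turned into Euclidean distances with dist()) as a stand-in for the distance matrix. The cutoff h=1.6 only makes sense for the original data, so this sketch cuts the tree into k = 4 groups instead; both choices are illustrative assumptions.

```r
# Stand-in data (assumption): built-in USArrests, standardized, Euclidean distances
d <- dist(scale(USArrests))

# Complete-linkage hierarchical clustering, as in the post
z <- hclust(d, method = "complete")

# Cut into 4 groups; a height cutoff would depend on the scale of the data
clusters <- cutree(z, k = 4)

# Classical multidimensional scaling down to 2 dimensions
ord <- cmdscale(d, k = 2)

print(table(clusters))
# cluster::clusplot(ord, clusters, color = TRUE, shade = TRUE, labels = 4, lines = 0)
```

cutree() returns a vector named by the tree labels, which is what the sub-clustering step relies on when it selects members by name.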

# Local sub-clustering (actually re-clustering on a specific tree node/cluster)

h<-as.matrix(d)  # transform the distance object back into a plain matrix.
# Ideally we would work with the initial data table, but read.table() may
# prefix an "X" to its column labels (it does so for labels that are not
# valid R names, e.g. ones starting with a digit), and then the labels are
# not matched when compared with the names of the clusters vector. The
# distance object takes its labels from the unaltered row names, so I
# converted it back to a plain matrix (this step might not be needed,
# depending on the format of your initial data table).

clid<-c(1)  # vector of the cluster numbers (from the initial clustering) you
# want to pick; separate them with commas if there is more than one,
# e.g. c(1,3). Here we only want cluster 1.
ysub<-h[names(clusters[clusters%in%clid]),]  # keep only the rows of h whose
# labels belong to a member of the selected cluster(s)
ysub<-t(ysub)[names(clusters[clusters%in%clid]),]  # we need a square table to
# use as a distance matrix later on, so we transpose ysub and again keep
# only the rows of the selected members
hrsub<-hclust(as.dist(ysub),method="average") # perform your preferred
# hierarchical method on just the initial cluster(s) selected with clid
plot(hrsub)
ord2<-cmdscale(ysub, k=2) 
plot(ord2) # now we can visually "zoom" in on the configuration of just
# the selected cluster with 2D MDS
aa<-silhouette(cutree(hrsub,h=1.2),as.dist(ysub)) # we can perform silhouette
# analysis locally on the cluster selected by clid
plot(aa)
clusplot(ord2,cutree(hrsub,h=1.2), color=TRUE, shade=TRUE,labels=4, lines=0)
# clusterplot of the subclusters
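One caveat with the local silhouette step: silhouette() from the cluster package returns NA instead of a silhouette object when the cut yields fewer than two sub-clusters (or as many sub-clusters as members), and plot(aa) then fails. The sketch below wraps the step in a hypothetical helper, local_silhouette(), that checks this first and reports the average silhouette width. It assumes the scaled USArrests data as a stand-in and re-clusters the largest initial cluster; the k = 1 and k = 2 cuts are only illustrations.

```r
library(cluster)  # silhouette()

# Stand-in data and initial clustering (assumption: scaled USArrests, 4 groups)
d <- dist(scale(USArrests))
clusters <- cutree(hclust(d, method = "complete"), k = 4)

# Re-cluster the largest initial cluster on its own
clid <- as.integer(names(which.max(table(clusters))))
sel <- names(clusters)[clusters %in% clid]
dsub <- as.dist(as.matrix(d)[sel, sel])
hrsub <- hclust(dsub, method = "average")

# Hypothetical helper: silhouette only when it is defined (2 <= groups < members)
local_silhouette <- function(groups, dsub) {
  k <- length(unique(groups))
  if (k < 2 || k >= length(groups)) {
    message("silhouette undefined for ", k, " group(s); try another cutoff")
    return(NULL)
  }
  sil <- silhouette(groups, dsub)
  message("average silhouette width: ", round(mean(sil[, "sil_width"]), 3))
  sil
}

one_group <- local_silhouette(cutree(hrsub, k = 1), dsub)  # NULL: only one group
aa <- local_silhouette(cutree(hrsub, k = 2), dsub)
if (!is.null(aa)) plot(aa)
```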


Thanks for reading - take care all.

PS. If anyone can write all these things in a more efficient way, please
feel free to add a comment.
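As one possible answer to the PS, here is a more compact sketch. The two subsetting lines above (select rows, transpose, select rows again) give the same result as indexing the symmetric matrix with the same labels on both sides, h[sel, sel], and drop = FALSE keeps a matrix even when only one label is selected. The function name subcluster() and its arguments are hypothetical, and the scaled USArrests data again stand in for the original distance matrix.

```r
# Hypothetical helper: re-cluster the members of the chosen initial cluster(s)
subcluster <- function(d, clusters, clid, method = "average") {
  h <- as.matrix(d)                           # full symmetric matrix with labels
  sel <- names(clusters)[clusters %in% clid]  # labels of the selected members
  ysub <- h[sel, sel, drop = FALSE]           # same rows and columns in one step
  dsub <- as.dist(ysub)
  list(dist = dsub,
       tree = hclust(dsub, method = method),  # needs at least 2 members
       mds  = if (length(sel) > 2) cmdscale(dsub, k = 2) else NULL)
}

# Usage with stand-in data (assumption: scaled USArrests, 4 initial clusters)
d <- dist(scale(USArrests))
clusters <- cutree(hclust(d, method = "complete"), k = 4)
clid <- as.integer(names(which.max(table(clusters))))  # largest initial cluster
res <- subcluster(d, clusters, clid)
plot(res$tree)
```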


--
View this message in context: http://r.789695.n4.nabble.com/Advice-on-exploration-of-sub-clusters-in-hierarchical-dendrogram-tp4414277p4417419.html
Sent from the R help mailing list archive at Nabble.com.


