[R] kmeans Clustering

Mark Hempelmann neo27 at t-online.de
Thu Mar 23 21:25:02 CET 2006


Dear WizaRds,

	My goal is to program the VS-KM algorithm by Brusco and Cradit (2001), and I have 
come to a complete stop in my efforts. Maybe somebody is willing to follow my 
thoughts and offer some help.
	In a first step, I want to use a single variable for the partitioning process. 
For the matrix of centers I use the means of the objects that belong to each 
cluster found with the hierarchical Ward algorithm. Then I have to take all 
possible pairs of variables and apply kmeans again, which is quite confusing to 
me. Here is what I do:

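##	0. data: 4 variables (rows of mat) measured on 9 objects (columns)
mat <- matrix( c(6,7,8,2,3,4,12,14,14, 14,15,13,3,1,2,3,4,2,
15,3,10,5,11,7,13,6,1, 15,4,10,6,12,8,12,7,1), ncol=9, byrow=TRUE )
rownames(mat) <- paste("v", 1:4, sep="" )
tmat <- t(mat)	# objects in rows, variables in columns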

##	1. Provide clusters via Ward:
ward	<- hclust(d=dist(tmat), method = "ward.D")	# "ward" was renamed to "ward.D" in R 3.1.0

##	2. Compute cluster centers and create center-matrix for kmeans:
groups	<- cutree(ward, k = 3, h = NULL)

centroids	<- vector(mode="numeric", length=3)
obj		<- vector(mode="list", length=3)

for (i in 1:3){
	where <- which(groups==i) # indices of the objects in group i
	centroids[i] <- mean( tmat[ where, ] ) # one number: mean over all objects and variables of group i
	obj[[i]] <- tmat[where,] # the objects of group i
}
P	<- vector(mode="numeric", ncol(mat) )	# dummy first row, dropped below
pj	<- vector(mode="list", length=nrow(mat))

for (i in 1:nrow(mat)){	# one kmeans run per variable
	pj[[i]] <- kmeans( tmat[,i], centers=centroids, iter.max=10, algorithm="MacQueen")
	P <- rbind(P, pj[[i]]$cluster)
}
P	<- P[-1,]	# remove the dummy row

##	gives a matrix of partitions, one row per single variable
##	(I'm sure P can be programmed much more easily)
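
The partition matrix P could probably be built without the dummy row and the repeated rbind(). What follows is only a minimal sketch, assuming the same data and the same scalar "centroids" vector as in step 2. The helper name one_var_partition is hypothetical (it is not part of any package), and "ward.D" is used because current R versions have renamed the old "ward" method.

```r
## Sketch only: same data as in step 0
mat <- matrix(c(6,7,8,2,3,4,12,14,14, 14,15,13,3,1,2,3,4,2,
                15,3,10,5,11,7,13,6,1, 15,4,10,6,12,8,12,7,1),
              ncol = 9, byrow = TRUE)
rownames(mat) <- paste("v", 1:4, sep = "")
tmat <- t(mat)                       # 9 objects (rows) x 4 variables (columns)

ward   <- hclust(dist(tmat), method = "ward.D")
groups <- cutree(ward, k = 3)

## same scalar "centers" as in the loop of step 2:
## one grand mean (over all variables) per Ward cluster
centroids <- sapply(1:3, function(g) mean(tmat[groups == g, ]))

## hypothetical helper: cluster labels from kmeans on variable j alone
one_var_partition <- function(j) {
  kmeans(tmat[, j], centers = centroids, iter.max = 10,
         algorithm = "MacQueen")$cluster
}

## sapply() returns objects x variables, so transpose:
## one row per variable, one column per object
P <- t(sapply(seq_len(ncol(tmat)), one_var_partition))
rownames(P) <- colnames(tmat)
```

P then has one row per variable (v1 to v4) and one column per object, which is the layout the loop above is aiming for. Depending on the starting values, kmeans() may warn about an empty cluster for some variables.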

##	3. kmeans using all possible pairs of variables, here e.g. just variables 1 and 3:
wjk	<- kmeans(tmat[,c(1,3)], centers=centroids, iter.max=10, algorithm="MacQueen")

###
	which, of course, gives an error message since "centroids" is not a matrix of 
the cluster centers. How on earth do I correctly construct a matrix of centers 
corresponding to the pairwise variables? Is it always the same matrix no matter 
which pair of variables I choose?
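
One possible reading of this question, offered as a hedged sketch and not as the VS-KM procedure itself: kmeans() expects "centers" to have one row per cluster and one column per variable of x. Under that reading, the Ward cluster means can be stored once as a 3 x 4 matrix, and each pair of variables selects the matching two columns. The full matrix then stays the same, while the submatrix passed to kmeans() changes with the pair. The names center_mat, pairs and fits are hypothetical, and "ward.D" stands in for the old "ward" method name.

```r
## Sketch only: same data as in step 0
mat <- matrix(c(6,7,8,2,3,4,12,14,14, 14,15,13,3,1,2,3,4,2,
                15,3,10,5,11,7,13,6,1, 15,4,10,6,12,8,12,7,1),
              ncol = 9, byrow = TRUE)
rownames(mat) <- paste("v", 1:4, sep = "")
tmat <- t(mat)                       # 9 objects (rows) x 4 variables (columns)

ward   <- hclust(dist(tmat), method = "ward.D")
groups <- cutree(ward, k = 3)

## k x p matrix of cluster means:
## row g holds the column means of the objects in Ward group g
center_mat <- t(sapply(1:3, function(g)
  colMeans(tmat[groups == g, , drop = FALSE])))

## all pairs of variables, one pair per column of 'pairs'
pairs <- combn(ncol(tmat), 2)

## kmeans on each pair, started from the matching two columns of center_mat
fits <- lapply(seq_len(ncol(pairs)), function(m) {
  jk <- pairs[, m]
  kmeans(tmat[, jk], centers = center_mat[, jk],
         iter.max = 10, algorithm = "MacQueen")
})
names(fits) <- apply(pairs, 2, function(jk)
  paste(colnames(tmat)[jk], collapse = "-"))

## the pair used in step 3 (variables 1 and 3)
wjk <- fits[["v1-v3"]]
```

With four variables this gives six fits. Whether the Ward means are the starting centers that Brusco and Cradit intend is an assumption here and should be checked against the paper.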
	I apologize for my lack of clustering knowledge and expertise - any help is 
welcome. Thank you very much.

Many greetings
mark



