[R] Deterministic initialization for k-means

David Carlson dcarlson at tamu.edu
Sun Sep 15 22:50:37 CEST 2013


I am not aware of an implementation of the algorithm you
describe. If you are not locked into that particular approach,
the function diana() in package cluster performs polythetic,
divisive hierarchical clustering, which could be used to get
your starting cluster centers.
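To illustrate how a divisive hierarchy can yield starting centers, here is a minimal pure-Python sketch of the splinter-group idea behind DIANA. It is a simplified re-implementation under stated assumptions (Euclidean dissimilarity, split the cluster with the largest diameter, stop splitting at K groups), not the cluster package's code; diana_style_centers is a hypothetical name. Each of the K groups is averaged to give one starting center.

```python
import math
from typing import List, Sequence


def diana_style_centers(X: Sequence[Sequence[float]], k: int) -> List[List[float]]:
    """Simplified DIANA-style divisive split of X into k groups; returns group means."""

    def d(i: int, j: int) -> float:
        return math.dist(X[i], X[j])  # Euclidean dissimilarity (an assumption)

    def mean_dist(i: int, group: List[int]) -> float:
        others = [j for j in group if j != i]
        return sum(d(i, j) for j in others) / len(others) if others else 0.0

    def diameter(group: List[int]) -> float:
        return max((d(i, j) for i in group for j in group), default=0.0)

    clusters: List[List[int]] = [list(range(len(X)))]
    while len(clusters) < k:
        # Split the cluster with the largest diameter.
        idx = max(range(len(clusters)), key=lambda c: diameter(clusters[c]))
        if len(clusters[idx]) < 2:
            break  # nothing left that can be split
        rest = clusters.pop(idx)
        # Seed the splinter group with the point farthest, on average, from the rest.
        first = max(rest, key=lambda i: mean_dist(i, rest))
        rest.remove(first)
        splinter = [first]
        # Keep moving the point that is, on average, closer to the splinter group.
        while len(rest) > 1:
            gain = {i: mean_dist(i, rest) - mean_dist(i, splinter) for i in rest}
            best = max(gain, key=gain.get)
            if gain[best] <= 0:
                break
            rest.remove(best)
            splinter.append(best)
        clusters += [splinter, rest]
    dim = len(X[0])
    return [[sum(X[i][c] for i in g) / len(g) for c in range(dim)] for g in clusters]
```

Because no step involves randomness, the same data and K always give the same centers.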

Scanning over the paper by Su and Dy (you didn't give us a full
citation; this one was published in Intelligent Data Analysis
2007 (11): 319-338), I see that they conclude:

"In case one cannot afford several random start runs, our
deterministic initialization methods provide reasonable
alternatives."

Unless your data set is enormous, you can surely afford multiple
random starts. Just use the nstart= argument of the kmeans()
function. It will run that many k-means analyses from different
random starting centers and keep the one that produces the
minimum total within-cluster sum of squares.
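In R this is simply kmeans(x, centers = K, nstart = 25), with x and K standing in for your data and cluster count. The selection rule itself can be sketched in a few lines of Python. Assumptions: plain Lloyd iterations (R's default algorithm is Hartigan-Wong, so individual runs may differ), starting centers drawn as K distinct data rows, and a seeded generator so the sketch is reproducible; kmeans_multistart and _lloyd are hypothetical names.

```python
import random
from typing import List, Sequence


def _sqdist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _lloyd(X, centers, max_iter: int = 100):
    """Lloyd iterations; returns (centers, labels, total within-cluster SS)."""
    centers = [list(map(float, c)) for c in centers]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assign each point to its nearest center.
        new = [min(range(len(centers)), key=lambda j: _sqdist(x, centers[j])) for x in X]
        if new == labels:
            break  # assignments stable: converged
        labels = new
        # Move each non-empty cluster's center to the mean of its members.
        for j in range(len(centers)):
            members = [x for x, lab in zip(X, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    wss = sum(_sqdist(x, centers[lab]) for x, lab in zip(X, labels))
    return centers, labels, wss


def kmeans_multistart(X, k: int, nstart: int = 10, seed: int = 1):
    """Run k-means from nstart random starts and keep the smallest total SS."""
    rng = random.Random(seed)
    best = None
    for _ in range(nstart):
        init = [X[i] for i in rng.sample(range(len(X)), k)]  # k distinct rows
        fit = _lloyd(X, init)
        if best is None or fit[2] < best[2]:
            best = fit
    return best
```

Each extra start costs one more k-means run, which is why the approach only becomes a burden for very large data sets.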

-------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: r-help-bounces at r-project.org
[mailto:r-help-bounces at r-project.org] On Behalf Of Safiye Celik
Sent: Thursday, September 12, 2013 7:37 PM
To: r-help at r-project.org
Subject: [R] Deterministic initialization for k-means

Hi,

I want to cluster my data points into K clusters using the k-means
algorithm, and I want to use a deterministic (non-random)
initialization scheme which is also a "good" start. I found a paper
by Ting Su and Jennifer Dy titled "A Deterministic Method for
Initializing K-means Clustering", and I wonder if there is a way in
R to partition the points in the way described in this paper.

"Starting from an initial cluster that contains the entire data set,
the method iteratively selects the cluster with the greatest SSE and
divides it into two subclusters using a hyperplane that passes through
the cluster centroid and is orthogonal to the principal eigenvector of
the cluster covariance matrix. This procedure is repeated until K
clusters are obtained."

If I get the final K centers from this partitioning, I can then give
those centers to R's kmeans() function and let it converge.
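R's kmeans() accepts either a number of clusters or a matrix of initial centres as its centers argument; when centres are supplied, no random start is drawn. The Python sketch below shows why this makes the whole fit deterministic: Lloyd iterations from fixed centers always follow the same path. It assumes Lloyd's algorithm (R also offers algorithm = "Lloyd", though its default is Hartigan-Wong), and refine is a hypothetical name.

```python
from typing import List, Sequence


def refine(X: Sequence[Sequence[float]], centers, max_iter: int = 100):
    """Lloyd's k-means starting from the given centers; no randomness involved."""
    centers = [list(map(float, c)) for c in centers]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        new = [min(range(len(centers)),
                   key=lambda j: sum((a - b) ** 2 for a, b in zip(x, centers[j])))
               for x in X]
        if new == labels:
            break  # assignments stable: converged
        labels = new
        # Update step: move each non-empty cluster's center to its members' mean.
        for j in range(len(centers)):
            members = [x for x, lab in zip(X, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels
```

For example, with the six points (0,0), (0,1), (1,0), (10,10), (10,11), (11,10) and starting centers (0,0) and (10,10), the labels come out as [0, 0, 0, 1, 1, 1] on every run.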

Is there a built-in R function to get such an initial partitioning?

Thanks!

-- 
-safiye

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible
code.