[R] Clustering quality measure

Martin Maechler maechler at stat.math.ethz.ch
Wed Jun 18 10:34:50 CEST 2003


>>>>> "Jonck" == Jonck van der Kogel <jonck at vanderkogel.net>
>>>>>     on Tue, 17 Jun 2003 17:23:33 +0200 writes:

    Jonck> Hi all, I am running a series of experiments where
    Jonck> after manipulating my data I run several clustering
    Jonck> algorithms (agnes, diana and a clustering method of
    Jonck> my own) on the data. I wanted to determine which
    Jonck> clustering method did the best job, so therefore I
    Jonck> had defined my own quality measure using two
    Jonck> criteria: compactness of the data within the clusters
    Jonck> themselves and the amount of separation between the
    Jonck> clusters. Anyway, my quality measure does not work,
    Jonck> since according to my quality measure the quality
    Jonck> gets increasingly better as more clusters are formed
    Jonck> until every data instance is a cluster by itself.
    Jonck> Therefore I was wondering if any of you are aware of
    Jonck> any libraries or functions within R that determine
    Jonck> quality measures of clusterings, I am very much
    Jonck> intrigued by the definition of quality measures that
    Jonck> do work.  Thanks very much, Jonck

Well, "do work" is saying a lot.

But there's silhouette() in the `cluster' package {where agnes()
and diana() reside}. You can plot silhouettes of almost any
clustering {i.e. grouping} as a diagnostic, and the "Average
Silhouette Width" has been proposed as a "goodness of fit" measure
for clusterings, and even as a way to determine how many clusters
you should choose.
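
As a rough illustration of the above, here is a minimal sketch that
compares an agnes() and a diana() clustering by their average
silhouette width. The `ruspini' data set (shipped with `cluster'),
the average linkage and the choice k = 4 are only assumptions made
for the example; substitute your own data and groupings.

```r
library(cluster)

# Example data shipped with the 'cluster' package (an assumption for
# this sketch; use your own data instead).
data(ruspini)
d <- dist(ruspini)

# Two hierarchical clusterings of the same dissimilarities.
ag <- agnes(d, method = "average")
di <- diana(d)

# Cut both trees into the same number of groups (k = 4 is arbitrary).
k <- 4
cl_agnes <- cutree(as.hclust(ag), k = k)
cl_diana <- cutree(as.hclust(di), k = k)

# silhouette() takes a vector of integer cluster labels plus the
# dissimilarities, so any grouping can be passed in this way.
sil_agnes <- silhouette(cl_agnes, d)
sil_diana <- silhouette(cl_diana, d)

# Average silhouette width per method; larger is better.
avg_width <- c(agnes = mean(sil_agnes[, "sil_width"]),
               diana = mean(sil_diana[, "sil_width"]))
print(avg_width)
```

plot(sil_agnes) then draws the silhouette plot for one clustering.
A clustering from a method of your own can be compared in the same
way, as long as it is available as a vector of integer labels.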

One of its several drawbacks is that it's not defined for the
"only 1 cluster" situation, i.e., you cannot use it to compare
one vs two clusters.
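
Because of that restriction, a search for the number of clusters has
to start at k = 2. A minimal sketch, again assuming the `ruspini'
data, average linkage agnes() and an arbitrary upper limit of 10
clusters:

```r
library(cluster)

# Example data and method are assumptions for this sketch.
data(ruspini)
d <- dist(ruspini)
tree <- as.hclust(agnes(d, method = "average"))

# Start at 2: the silhouette is not defined for a single cluster.
ks <- 2:10
avg_width <- sapply(ks, function(k) {
  sil <- silhouette(cutree(tree, k = k), d)
  mean(sil[, "sil_width"])
})
names(avg_width) <- ks

# The k with the largest average silhouette width is the candidate.
best_k <- ks[which.max(avg_width)]
print(round(avg_width, 3))
print(best_k)
```

The resulting k is only a suggestion from this one criterion and
should be checked against the silhouette plots themselves.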

--> ?silhouette

and have a look at, and try, the "Examples".

Regards,
Martin Maechler <maechler at stat.math.ethz.ch>	http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum  LEO C16	Leonhardstr. 27
ETH (Federal Inst. Technology)	8092 Zurich	SWITZERLAND
phone: x-41-1-632-3408		fax: ...-1228			<><



