[R] Information criteria for kmeans

Prof Brian Ripley ripley at stats.ox.ac.uk
Wed Dec 5 12:24:38 CET 2007


This is not primarily an R question: if you tell us how you want to define 
it, we may be able to help you compute it.  I presume you are talking 
about Schwarz (1978), which is not billed as an 'information criterion'.

AFAIK, all Gideon Schwarz did was to define a criterion for linear
regression.  People have applied it to other situations with a vector
space of parameters.  However, in many clustering methods (including
kmeans), as in classification trees, there is also a combinatorial part
of the fit: you optimize over both the cluster centres and the allocation
of units to clusters.  That does not come close to the Schwarz framework.
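To give a sense of the size of that combinatorial part, here is a small sketch. It counts the distinct ways to allocate n units to k non-empty clusters, which is a Stirling number of the second kind, using the inclusion-exclusion formula. The function name `n_allocations` is hypothetical.

With the 50 rows produced by the example data quoted below and k = 2, there are already 2^49 - 1 possible allocations. The allocation is a discrete choice among these, not a coordinate in a vector space of parameters.

```r
## Hypothetical helper: number of ways to partition n units into k
## non-empty clusters (Stirling number of the second kind), via
##   S(n, k) = (1/k!) * sum_{j=0}^{k} (-1)^j * choose(k, j) * (k - j)^n
## Computed in double precision, so it is exact only while the terms stay
## below 2^53; for larger n it gives an approximate magnitude.
n_allocations <- function(n, k) {
  j <- 0:k
  sum((-1)^j * choose(k, j) * (k - j)^n) / factorial(k)
}

n_allocations(4, 2)                  # 7
n_allocations(50, 2) == 2^49 - 1     # TRUE
```

The closed formula is used for brevity. An exact count for large n would need arbitrary-precision arithmetic, but the point here is only the order of magnitude.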

Nor does clustering fit into the information framework of Akaike (1973, 1974).

There is discussion in Banfield & Raftery (1993) of a Schwarz-like
criterion for clustering, but with a rather different derivation, and I
don't think it should be attributed to Schwarz.
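If you do want to define something explicitly, here is one possible definition, given as a sketch only. It rests on three assumptions:

- a spherical Gaussian model with one common variance for all clusters;
- a parameter count of k*d centre coordinates plus one variance;
- treating the hard allocation as fixed.

The last assumption means it ignores the combinatorial part of the fit described above, so it is neither Schwarz's criterion nor the Banfield & Raftery one. What it does is turn the within-cluster sum of squares into a maximized log-likelihood before adding a log(n) penalty. By contrast, the raw `sum(cl$withinss)` changes with the units of the data while the penalty does not. The function name `kmeans_bic_sketch` is hypothetical.

```r
## Hypothetical helper: a BIC-style score for a kmeans() fit, ASSUMING a
## spherical Gaussian model with a single common variance sigma^2 and
## treating the cluster allocation as fixed (the combinatorial part of the
## fit is ignored). Smaller values are "better" on this scale.
kmeans_bic_sketch <- function(fit, x) {
  x <- as.matrix(x)
  n <- nrow(x)                 # number of observations (rows)
  d <- ncol(x)                 # number of variables
  k <- nrow(fit$centers)       # number of clusters
  rss <- sum(fit$withinss)     # total within-cluster sum of squares
  sigma2 <- rss / (n * d)      # ML estimate of the common variance
  ## Maximized Gaussian log-likelihood given the allocation:
  ## -0.5 * [n*d*log(2*pi*sigma2) + rss/sigma2], with rss/sigma2 = n*d
  loglik <- -0.5 * n * d * (log(2 * pi * sigma2) + 1)
  npar <- k * d + 1            # assumed count: centre coordinates + variance
  -2 * loglik + npar * log(n)
}

## Usage on data shaped like the example in the question (50 rows, 4 columns)
set.seed(1)
dat <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 4),
             matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 4))
sapply(2:4, function(k) kmeans_bic_sketch(kmeans(dat, k, nstart = 10), dat))
```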


On Wed, 5 Dec 2007, Serguei Kaniovski wrote:

>
> Hello,
>
> how, for example, is the Schwarz criterion defined for kmeans? It should
> be something like:
>
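> k <- 2       # number of clusters
> vars <- 4    # number of variables
> nobs <- 100  # random draws per group (nobs/vars = 25 rows per group)
>
> dat <- rbind(matrix(rnorm(nobs, sd = 0.3), ncol = vars),
>              matrix(rnorm(nobs, mean = 1, sd = 0.3), ncol = vars))
>
> colnames(dat) <- paste("var", 1:vars, sep = "")
>
> (cl <- kmeans(dat, k))
>
> # penalize with the actual number of rows (50 here), not nobs
> schwarz <- sum(cl$withinss) + vars * k * log(nrow(dat))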
>
> Thanks for your help,
> Serguei
> ________________________________________
> Austrian Institute of Economic Research (WIFO)
>
> P.O.Box 91                          Tel.: +43-1-7982601-231
> 1103 Vienna, Austria        Fax: +43-1-7989386
>
> Mail: Serguei.Kaniovski at wifo.ac.at
> http://www.wifo.ac.at/Serguei.Kaniovski
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
