[R] Kmeans performance difference

Moisan Yves ymoisan at groupesm.com
Wed Jul 4 20:58:15 CEST 2007


Hi All,

A question from a newbie using R 2.5.0 on Windows XP.  Why does
kmeans clustering with apparently the exact same parameters behave so
differently in the following two examples:

> cl1 <- kmeans(subset(pointsUXO15555, select = c(2:4)), 10)

This takes about 2 seconds to deliver a result.

> cl1 <- clust(subset(pointsUXO15555, select = c(2:4)), k=10,
method="kmeansHartigan") 

This dies after about 10 minutes, having filled up the RAM:

*** running kmeansHartigan cluster algorithm...

 *** calculating validity measure... 
Error: cannot allocate vector of size 922.9 Mb
In addition: Warning messages:
1: Reached total allocation of 1023Mb: see help(memory.size) 
2: Reached total allocation of 1023Mb: see help(memory.size) 
3: Reached total allocation of 1023Mb: see help(memory.size) 
4: Reached total allocation of 1023Mb: see help(memory.size)
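
A quick arithmetic check on the 922.9 figure in the error message, offered
only as a sketch: it is the size one would expect for the pairwise
distances between 15555 rows stored as doubles (one value per distinct
pair, as in the lower triangle that dist() keeps).  Whether clust really
builds such an object for its validity measure is an assumption here, not
something confirmed from the clustTool source.

```r
# Size of one double per distinct pair of rows (lower triangle of the
# distance matrix). The link to clust's validity step is an assumption.
n       <- 15555                   # rows in the data
n_pairs <- n * (n - 1) / 2         # distinct pairs of rows
size_mb <- n_pairs * 8 / 1024^2    # 8 bytes per double, in Mb (2^20 bytes)
round(size_mb, 1)                  # 922.9
```

If that reading is right, the allocation grows with the square of the
number of rows, so it would not fit under the 1023Mb limit reported in
the warnings.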

If I understand correctly, both methods should give roughly the same
results (modulo the initial random locations), since the default algorithm
in kmeans is "Hartigan-Wong".  My data frame is 3 columns by 15555 rows.
It must be that kmeans is more of a "core" R function, whereas clust is
from the clustTool package, but isn't clustTool simply wrapping the core
kmeans method?  Why such a difference?
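
For anyone wanting to try the kmeans side without my data, here is a
minimal sketch on synthetic data of the same shape.  The matrix x is a
made-up stand-in for columns 2:4 of pointsUXO15555, and iter.max = 50 is
an arbitrary choice; only stats::kmeans is used, so it says nothing about
what clust does internally.

```r
# Synthetic stand-in for the real data: 15555 rows, 3 numeric columns.
set.seed(1)
x <- matrix(rnorm(15555 * 3), ncol = 3)

# The first entry of the 'algorithm' argument is the default one.
default_algo <- eval(formals(kmeans)$algorithm)[1]

# Time the clustering step on its own, naming the algorithm explicitly.
elapsed <- system.time(
  cl <- kmeans(x, centers = 10, algorithm = "Hartigan-Wong", iter.max = 50)
)

default_algo          # "Hartigan-Wong"
elapsed["elapsed"]    # wall-clock seconds for the clustering step alone
```

Timing the clustering step separately like this is one way to tell
whether the slowdown comes from the k-means algorithm itself or from
whatever runs afterwards.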

TIA,

Yves Moisan


