[R] Dynamic clustering?

Erik Iverson eriki at ccbr.umn.edu
Wed May 5 23:32:46 CEST 2010


Hello,

Ralf B wrote:
> Are there R packages that allow for dynamic clustering, i.e. where the
> number of clusters is not predefined? I have a list of numbers that
> falls into either 2 clusters or just 1. Here is an example of one that
> should be clustered into two clusters:
> 
> two <- c(1,2,3,2,3,1,2,3,400,300,400)
> 
> and here one that only contains one cluster and would therefore not
> need to be clustered at all.
> 
> one <- c(400,402,405, 401,410,415, 407,412)
> 
> Given a sufficiently large amount of data, a statistical test or an
> effect size should be able to determine whether a data set should be
> divided, i.e. whether there are two groups that differ enough. I am
> not familiar with the underlying techniques in kmeans, but I know that
> it blindly divides both data sets based on the predefined number of
> clusters. Are there any more sophisticated methods that allow me to
> determine the number of clusters in a data set based on statistical
> tests or effect sizes?

Caveat: I have very little experience with clustering methods, but maybe 
this could get you started:

http://en.wikipedia.org/wiki/Determining_the_number_of_clusters_in_a_data_set

If you only want to make 2 clusters when the means of the data are an 
order of magnitude apart or more, that's easy enough to do without a 
statistical test.
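
For what it's worth, here is a rough sketch of that rule in base R. The 
function name split_if_far_apart and its ratio argument are made up for 
this example, and cutting at the widest gap in the sorted values is just 
one assumption about where to split. It also assumes all values are 
positive, so that a ratio of means is meaningful.

```r
# Hypothetical helper (not from any package): cut at the widest gap in the
# sorted values, and keep the split only if the two group means differ by
# at least `ratio` (default 10, i.e. one order of magnitude).
split_if_far_apart <- function(x, ratio = 10) {
  stopifnot(is.numeric(x), length(x) >= 2, all(x > 0))
  s <- sort(x)
  gaps <- diff(s)
  # All values identical: nothing to split.
  if (max(gaps) == 0) return(rep(1L, length(x)))
  i <- which.max(gaps)
  threshold <- (s[i] + s[i + 1]) / 2  # midpoint of the widest gap
  low <- x[x < threshold]
  high <- x[x > threshold]
  if (mean(high) / mean(low) >= ratio) {
    ifelse(x < threshold, 1L, 2L)     # cluster label for each value
  } else {
    rep(1L, length(x))                # means too close: one cluster
  }
}

two <- c(1, 2, 3, 2, 3, 1, 2, 3, 400, 300, 400)
one <- c(400, 402, 405, 401, 410, 415, 407, 412)

split_if_far_apart(two)  # labels the three large values as cluster 2
split_if_far_apart(one)  # all values get label 1
```

This only looks at a single cut point, so it says nothing useful about 
data with more than two groups or with overlapping groups.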

For your examples above, I naively tried some functions in the mclust 
package, which I've never used before:

library(mclust)  # provides mclustBIC() and mclustModel()

mclustModel(one, mclustBIC(one, G = 1:2))$G  # gives 1
mclustModel(two, mclustBIC(two, G = 1:2))$G  # gives 2

You'll have to decide for yourself whether this is appropriate for your 
data... or whether I'm even using these functions correctly. :)


