[R] Dynamic clustering?

Ralf B ralf.bierig at gmail.com
Thu May 6 17:28:34 CEST 2010

```The problem here is that the distances in the two cases change
dynamically across different sets, and I have 100 such sets. I guess
there is no better solution than deriving an empirical cutoff from a
training set, is there?

Ralf

On Wed, May 5, 2010 at 6:04 PM, Phil Spector <spector at stat.berkeley.edu> wrote:
> Ralf -
>   I think you're making things more complicated than they
> need to be.  All clustering methods are based on the distances
> between observations.  If the observations are all close
> together, the distances between them won't be very large.
> If some are farther away than others, then the distances will
> be larger.   The first case would suggest just one cluster,
> while the second case would suggest more than one.  For your
> example:
>
>> two <- c(1,2,3,2,3,1,2,3,400,300,400)
>> one <- c(400,402,405, 401,410,415, 407,412)
>> max(dist(one))
> [1] 15
>> max(dist(two))
> [1] 399
>
> A little experimentation should provide you with a cutoff
> that reliably tells you whether there are 1 or 2 clusters in your
> data.
>
>                                        - Phil Spector
>                                         Statistical Computing Facility
>                                         Department of Statistics
>                                         UC Berkeley
>                                         spector at stat.berkeley.edu
>
>
> On Wed, 5 May 2010, Ralf B wrote:
>
>> Are there R packages that allow for dynamic clustering, i.e. where the
>> number of clusters is not predefined? I have a list of numbers that
>> falls into either 2 clusters or just 1. Here is an example of one that
>> should be clustered into two clusters:
>>
>> two <- c(1,2,3,2,3,1,2,3,400,300,400)
>>
>> and here one that only contains one cluster and would therefore not
>> need to be clustered at all.
>>
>> one <- c(400,402,405, 401,410,415, 407,412)
>>
>> Given a sufficiently large amount of data, a statistical test or an
>> effect size should be able to determine whether it makes sense to
>> divide a data set, i.e. whether there are two groups that differ
>> enough. I am not familiar with the underlying techniques in kmeans,
>> but I know that it blindly divides both data sets based on the
>> predefined number of clusters. Are there any more sophisticated
>> methods that allow me to determine the number of clusters in a data
>> set based on statistical tests or effect sizes?
>>
>> Is it possible that this is not a clustering problem but a
>> classification problem?
>>
>> Ralf
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
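
# A minimal sketch of the cutoff idea discussed in this thread, made
# scale-free so that one threshold might be reused across sets whose
# absolute distances differ. Assumptions (none of this is from the
# thread): the helper names gap_ratio() and n_clusters(), the statistic
# (largest gap between sorted neighbours divided by the overall range),
# and the 0.5 cutoff are all illustrative. The cutoff would still need
# to be checked against a training set.

gap_ratio <- function(x) {
  s <- sort(x)
  rng <- s[length(s)] - s[1]      # overall spread; equals max(dist(x)) for 1-d data
  if (rng == 0) return(0)         # all values identical: no gap at all
  max(diff(s)) / rng              # share of the spread taken by the largest gap
}

n_clusters <- function(x, cutoff = 0.5) {
  # hypothetical rule: one dominant gap suggests two groups
  if (gap_ratio(x) > cutoff) 2L else 1L
}

two <- c(1,2,3,2,3,1,2,3,400,300,400)
one <- c(400,402,405, 401,410,415, 407,412)

n_clusters(two)   # 2: the largest gap is 297 out of a range of 399
n_clusters(one)   # 1: the largest gap is 3 out of a range of 15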