[R] How a clustering algorithm in R can end up with negative silhouette values?

Sarah Goslee sarah.goslee at gmail.com
Fri Feb 19 20:58:50 CET 2016


You need to think more carefully about the details of the clara() method.

The algorithm draws repeated samples of size sampsize from the full
dataset; the samples and sampsize arguments control how many and how large.
It clusters each sample in turn and keeps the best one.
It then uses the medoids from that best sample to assign every point in
the dataset to a cluster.
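
The steps above can be sketched roughly as follows. This is only an
illustration and not the actual clara() implementation: the toy data, the
helper dist_to_medoids(), and the choice of Euclidean distance are
assumptions made for the example, and "best" is taken here to mean the
smallest mean distance from all points to their nearest medoid.

```r
library(cluster)

set.seed(1)
# Toy data (assumed for illustration): two Gaussian blobs in 2-D
x <- rbind(matrix(rnorm(400, mean = 0), ncol = 2),
           matrix(rnorm(400, mean = 4), ncol = 2))
k         <- 2
sampsize  <- 40   # size of each subsample
n_samples <- 5    # number of subsamples to try

# Hypothetical helper: Euclidean distance from every row of x
# to every medoid (one column per medoid)
dist_to_medoids <- function(x, medoids) {
  sapply(seq_len(nrow(medoids)), function(j) {
    sqrt(rowSums(sweep(x, 2, medoids[j, ])^2))
  })
}

best_cost    <- Inf
best_medoids <- NULL
for (s in seq_len(n_samples)) {
  idx <- sample(nrow(x), sampsize)          # draw a subsample
  fit <- pam(x[idx, , drop = FALSE], k)     # cluster the subsample only
  d   <- dist_to_medoids(x, fit$medoids)    # score it against ALL points
  cost <- mean(apply(d, 1, min))
  if (cost < best_cost) {                   # keep the best subsample
    best_cost    <- cost
    best_medoids <- fit$medoids
  }
}

# Assign every point to its nearest medoid from the best subsample
assignment <- max.col(-dist_to_medoids(x, best_medoids),
                      ties.method = "first")
table(assignment)
```

Note that the medoids only ever come from a subsample of sampsize points,
even though the final assignment covers the whole dataset.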

But because the clustering is based on a subsample, it may not be
representative of the dataset as a whole, and may not provide a good
clustering overall. Just because it clusters the subsample well
doesn't mean it clusters the entire dataset well. The Details section of
the help describes this, and the book reference goes into more detail.
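
One way to check this on your own data is to compute silhouette widths
for the whole dataset rather than only for the best subsample. The
following is a minimal sketch on made-up data (the toy matrix x and the
parameter values are assumptions, not taken from the original question);
it uses the default silhouette() method with a full dist() object, which
is only feasible when the dataset fits in memory as a distance matrix.

```r
library(cluster)

set.seed(42)
# Toy data (assumed for illustration): two overlapping Gaussian blobs
x <- rbind(matrix(rnorm(600, mean = 0), ncol = 2),
           matrix(rnorm(600, mean = 3), ncol = 2))

# Cluster using small subsamples, as clara() is designed to do
fit <- clara(x, k = 2, samples = 5, sampsize = 40)

# Silhouette widths for ALL observations, from the full distance matrix
sil    <- silhouette(fit$clustering, dist(x))
widths <- sil[, "sil_width"]

summary(widths)
n_negative <- sum(widths < 0)   # observations with a negative silhouette
n_negative
```

The silhouette width is s(i) = (b(i) - a(i)) / max(a(i), b(i)), where
a(i) is the average dissimilarity of observation i to the other members
of its own cluster and b(i) is the smallest average dissimilarity to the
members of any other cluster. Because it compares averages over cluster
members and not distances to medoids, an observation can sit closest to
its assigned medoid and still get a negative value. If many values are
negative, trying larger samples and sampsize, or a different k, is a
reasonable next step.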

Sarah



On Fri, Feb 19, 2016 at 2:55 PM, ABABAEI, Behnam
<Behnam.ABABAEI at limagrain.com> wrote:
> Hi Sarah,
>
> Thank you for the response. But it is said in its description that after
> each run (sample), each observation in the whole dataset is assigned to the
> closest cluster. So how is it possible for one observation to be wrongly
> allocated, even with clara?
>
> Behnam
>
>
>
>
> On Fri, Feb 19, 2016 at 11:48 AM -0800, "Sarah Goslee"
> <sarah.goslee at gmail.com> wrote:
>
> That means that points have been assigned to the wrong groups. This
> may readily happen with a clustering method like cluster::clara() that
> uses a subset of the data to cluster a dataset too large to analyze as
> a unit. Negative silhouette numbers strongly suggest that your
> clustering parameters should be changed.
>
> Sarah
>
> On Fri, Feb 19, 2016 at 6:33 AM, ABABAEI, Behnam
> <Behnam.ABABAEI at limagrain.com> wrote:
>> Hi,
>>
>>
>> We know that clustering methods in R assign observations to the closest
>> medoids. Hence, each observation is supposed to end up in the closest
>> cluster it can have. So, I wonder how it is possible to get negative
>> silhouette values, when we are supposedly assigning each observation to
>> the closest cluster and the silhouette formula should not go negative?
>>
>>
>> Behnam.
>>


