[R] Empty cluster / segfault using vanilla kmeans with version 2.15.2

Uwe Ligges ligges at statistik.tu-dortmund.de
Wed Mar 13 20:48:39 CET 2013



On 13.03.2013 13:45, Dr. Detlef Groth wrote:
> Hello,
>
> here is a working reproducible example which crashes R using kmeans or
> gives empty clusters using the nstart option with R 2.15.2.
>
>
> library(cluster)
> kmeans(ruspini,4)
> kmeans(ruspini,4,nstart=2)
> kmeans(ruspini,4,nstart=4)
> kmeans(ruspini,4,nstart=10)
> ?kmeans
>
> Either we always get empty clusters or, after some further commands,
> a segfault.

Yes, thanks, I can reproduce it in 2.15.3, but not in R-prerelease.

Maybe this is a side effect of a bug already fixed in R-prerelease.
Since R 2.15.3 is now frozen, please upgrade to R-prerelease, which will
become R 3.0.0 in April.

Best,
Uwe Ligges

>
> regards,
> Detlef Groth
>
> ------------
>
>
> [R] Empty cluster / segfault using vanilla kmeans with version 2.15.2
> Uwe Ligges ligges at statistik.tu-dortmund.de
> Sat Feb 9 20:52:19 CET 2013
>
>
> We need a reproducible example.
>
> Uwe Ligges
>
>
> On 03.02.2013 15:03, Luca Nanetti wrote:
>> Dear experts,
>> I am encountering a version-dependent issue.
>>
>> My laptop runs Ubuntu 12.04 LTS 64-bit, R 2.14.1; the issue explained
>> below never occurred with this version of R.
>> My desktop runs Ubuntu 11.10 64-bit, R 2.13.2; what follows applies to
>> this setup.
>>
>> The data I'm clustering consist of the rows of a 320 x 6 matrix
>> containing integers ranging from 1 to 7, with no missing data.
>> I applied kmeans() to this matrix literally 256 x 10⁶ times using R
>> version 2.13.2 or 2.14.1, without ever experiencing the slightest
>> problem.
>> My usual setup is with k=5, nstart=256, iter.max=50.
>>
>> After upgrading to R 2.15.2, I get either a warning message ('Empty
>> cluster. Choose a better set of initial centers') or a catastrophic
>> segfault. The only way I can get any solution whatsoever is to set
>> nstart to its default value, i.e. 1. However, if I simply repeat the
>> clustering, the same issue recurs. Moreover, this is vastly suboptimal
>> because of the risk of local minima.
>>
>> Something similar was reported many years ago, see
>> https://stat.ethz.ch/pipermail/r-help/2003-November/041784.html. It was
>> then suggested that R's behaviour was correct. I'm not familiar with such
>> an early R version, but the up-to-date documentation of kmeans clearly
>> states that "Except for the Lloyd-Forgy method, k clusters will always be
>> returned if a number is specified."
>> I am using the default Hartigan-Wong algorithm, and I specify an exact
>> number k, so k clusters should be returned. They aren't, and the empty
>> cluster is therefore more likely the symptom of a bug than the outcome
>> of a 'true' local minimum.
>>
>> Using synaptic, I managed to downgrade R to version 2.13.2. The problem
>> disappeared, i.e. the message/segfault no longer occurred.
>>
>> Summarizing: given the same dataset, either an unreasonable message or a
>> segfault regularly occurs in version 2.15.2 when invoking kmeans() on an
>> Ubuntu 11.10 64-bit machine. This does not happen at all in previous
>> versions of R on the same machine and operating system.
>>
>> I respectfully suggest that the behaviour shown in the aforementioned
>> versions 2.13.2 and 2.14.1 should be considered 'normal', and that
>> version 2.15.2 should revert to that.
>>
>> Kind regards,
>> Luca Nanetti.
>>
>
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


