[R] Clustering of datasets

Mon Sep 5 15:02:35 CEST 2022

Hello,

I am not at all sure that the following answers the question.
The code below tries to find the optimal number of clusters. One of the
changes I have made to your call to kmeans is to subset DMs without
dropping the dim attribute, that is, with drop = FALSE.
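
To show what drop = FALSE changes, here is a minimal sketch on a made-up three-row data frame (toy values, not your data). By default `[` simplifies a single column to a plain vector with no dim attribute. With drop = FALSE the result stays a one-column data frame.

```r
# Toy data frame standing in for DMs (values are made up for illustration)
toy <- data.frame(Country = c("A", "B", "C"), DATA = c(-0.01, -0.02, -0.30),
                  stringsAsFactors = FALSE)

v <- toy[, 2]                 # default drop = TRUE: a plain numeric vector
m <- toy[, 2, drop = FALSE]   # keeps the data frame structure

is.null(dim(v))   # TRUE: the dim attribute is gone
dim(m)            # 3 1: still three rows and one column
```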

# kmeans() is in base R (package stats); factoextra is used only for the plots
set.seed(2022)  # kmeans() starts from random centers; fix the seed to reproduce results

# Subset without dropping the dim attribute: a one-column data frame, not a vector
dm <- DMs[, 2, drop = FALSE]

max_clust <- 10
wss <- numeric(max_clust)

for(k in 1:max_clust) {
   km <- kmeans(dm, centers = k, nstart = 25)
   wss[k] <- km$tot.withinss   # total within-cluster sum of squares
}
plot(wss, type = "b", xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")

# Where is the elbow, at 2 or at 4?
factoextra::fviz_nbclust(dm, kmeans, method = "wss")
factoextra::fviz_nbclust(dm, kmeans, method = "silhouette")

k2 <- kmeans(dm, centers = 2, nstart = 25)
k3 <- kmeans(dm, centers = 3, nstart = 25)
k4 <- kmeans(dm, centers = 4, nstart = 25)

# centers is a k x 1 matrix, so nrow() gives the number of clusters
main2 <- paste(nrow(k2$centers), "clusters")
main3 <- paste(nrow(k3$centers), "clusters")
main4 <- paste(nrow(k4$centers), "clusters")

old_par <- par(mfcol = c(1, 3))
plot(DMs[,2], col = k2$cluster, pch = 19, main = main2)
plot(DMs[,2], col = k3$cluster, pch = 19, main = main3)
plot(DMs[,2], col = k4$cluster, pch = 19, main = main4)
par(old_par)
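
Instead of judging the elbow by eye, you can compare the average silhouette width for each k (the quantity that fviz_nbclust(..., method = "silhouette") plots); larger values mean better-separated clusters. Below is a minimal sketch using cluster::silhouette(), with the values from your post typed in as a vector so that it runs on its own. The seed is arbitrary, and the printed numbers are something to look at, not a verdict.

```r
library(cluster)  # for silhouette()

# Values of DMs$DATA from the original post, in the same order
DATA <- c(-0.0092, -0.0235, -0.0239, -0.0333, -0.0022, -0.0963, -0.0706,
          -0.0322, -0.1233, -0.0141, -0.0142, -0.0656, -0.0230, -0.0006,
          -0.0085, -0.0398, -0.0423, -0.0604, -0.0227, -0.0085, -0.0272,
          -0.3519, -0.0210, -0.0057, -0.0595, -0.0451, -0.0023, -0.0351,
          -0.0048, -0.0495, -0.0081, -0.0044, -0.1210, -0.0031, -0.0213,
          -0.0106, -0.0152, -0.0030, -0.0086, -0.0144, -0.0078, -0.0042,
          -0.0035, -0.0211, -0.0538)

dm <- matrix(DATA, ncol = 1)   # one-column (n x 1) matrix
d  <- dist(dm)                  # Euclidean distances between all pairs of points

set.seed(2022)  # arbitrary seed, for reproducibility
avg_sil <- sapply(2:4, function(k) {
  km  <- kmeans(dm, centers = k, nstart = 25)
  sil <- silhouette(km$cluster, d)   # one silhouette width per point
  mean(sil[, "sil_width"])           # average silhouette width for this k
})
names(avg_sil) <- paste0("k = ", 2:4)
round(avg_sil, 3)   # larger is better; widths lie between -1 and 1
```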

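About the plot in your original code: kmeans() numbers its clusters arbitrarily, so "cluster 1" is not guaranteed to be the same group from run to run, and the legend text can end up attached to the wrong group. Also, text(1:46, ...) has one more position than the 45 rows of DMs, ylim = c(-0.12, 0) leaves out GrE (-0.3519), SA (-0.1233) and SLOVA (-0.1210), and legend(10, 1, ...) puts the legend at y = 1, outside that range. The sketch below renumbers the clusters so that cluster 1 is the one whose center is closest to zero (whether that is the "highly integrated" group is your call), labels every country and places the legend with a position keyword. The seed, label size and "bottomleft" position are arbitrary choices.

```r
# Country codes and values as given in the original post
Country <- c("IS", "BA", "HK", "JA", "KU", "OM", "QA", "SK", "SA", "SI",
             "TA", "UAE", "AUS", "BEL", "CYP", "CR", "DEN", "EST", "FIN",
             "FRA", "GER", "GrE", "ICE", "IRE", "LAT", "LITH", "LUXE", "MAL",
             "NETH", "NOR", "POL", "PORT", "SLOVA", "SLOVE", "SPA", "SWE",
             "SWIT", "UK", "HUNG", "CAN", "CHIL", "USA", "BERM", "AUST", "NEWZ")
DATA <- c(-0.0092, -0.0235, -0.0239, -0.0333, -0.0022, -0.0963, -0.0706,
          -0.0322, -0.1233, -0.0141, -0.0142, -0.0656, -0.0230, -0.0006,
          -0.0085, -0.0398, -0.0423, -0.0604, -0.0227, -0.0085, -0.0272,
          -0.3519, -0.0210, -0.0057, -0.0595, -0.0451, -0.0023, -0.0351,
          -0.0048, -0.0495, -0.0081, -0.0044, -0.1210, -0.0031, -0.0213,
          -0.0106, -0.0152, -0.0030, -0.0086, -0.0144, -0.0078, -0.0042,
          -0.0035, -0.0211, -0.0538)
DMs <- data.frame(Country, DATA, stringsAsFactors = FALSE)

set.seed(2022)  # arbitrary seed, for reproducibility
km <- kmeans(DMs[, 2, drop = FALSE], centers = 2, nstart = 25)

# kmeans() numbers clusters arbitrarily; renumber them so that cluster 1
# has the largest center (closest to zero) and cluster 2 the smallest
new_id  <- order(order(km$centers[, 1], decreasing = TRUE))
cluster <- new_id[km$cluster]

n <- nrow(DMs)
plot(seq_len(n), DMs$DATA, col = cluster, pch = 19,
     ylim = extendrange(DMs$DATA),   # show every point, with a small margin
     xlab = "Country", ylab = "DATA")
text(seq_len(n), DMs$DATA, DMs$Country, col = cluster, pos = 3, cex = 0.7)
legend("bottomleft", legend = c("cluster 1", "cluster 2"), col = 1:2, pch = 19)
```
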
Hope this helps,

Rui Barradas

On 05/09/2022 at 12:31, Subhamitra Patra wrote:
> Dear all,
> 
> I am trying to cluster my dataset with k-means clustering in R, but I am
> getting somewhat scattered results. My code is pasted below; please tell
> me where my code falls short. I have pasted my data before the k-means
> step, as follows.
> 
> DMs <- read.table(text = "Country DATA
>                           IS      -0.0092
>                           BA      -0.0235
>                           HK      -0.0239
>                           JA      -0.0333
>                           KU      -0.0022
>                           OM      -0.0963
>                           QA      -0.0706
>                           SK      -0.0322
>                           SA      -0.1233
>                           SI      -0.0141
>                           TA      -0.0142
>                           UAE     -0.0656
>                           AUS     -0.0230
>                           BEL     -0.0006
>                           CYP     -0.0085
>                           CR      -0.0398
>                           DEN     -0.0423
>                           EST     -0.0604
>                           FIN     -0.0227
>                           FRA     -0.0085
>                           GER     -0.0272
>                           GrE     -0.3519
>                           ICE     -0.0210
>                           IRE     -0.0057
>                           LAT     -0.0595
>                           LITH    -0.0451
>                           LUXE    -0.0023
>                           MAL     -0.0351
>                           NETH    -0.0048
>                           NOR     -0.0495
>                           POL     -0.0081
>                           PORT    -0.0044
>                           SLOVA   -0.1210
>                           SLOVE   -0.0031
>                           SPA     -0.0213
>                           SWE     -0.0106
>                           SWIT    -0.0152
>                           UK      -0.0030
>                           HUNG    -0.0086
>                           CAN     -0.0144
>                           CHIL    -0.0078
>                           USA     -0.0042
>                           BERM    -0.0035
>                           AUST    -0.0211
>                           NEWZ    -0.0538",
>                   header = TRUE, stringsAsFactors = FALSE)
> library(cluster)
> k1<-kmeans(DMs[,2],centers=2,nstart=25)
> plot(DMs[,2],col=k1$cluster,pch=19,xlim=c(1,46), ylim=c(-0.12,0))
> text(1:46,DMs[,2],DMs[,1],col=k1$cluster)
> legend(10,1,c("cluster 1: Highly Integrated","cluster 2: Less Integrated"),
> col=1:2,pch=19)
> 
>