# [R] keep the centre fixed in K-means clustering

Uwe Ligges ligges at statistik.tu-dortmund.de
Wed May 22 11:55:32 CEST 2013

```
So you just want to compare the distances from each point of your new
data to each of the centres and assign the number of the closest
centre, as in:

tCentre <- t(Centre)
clust <- apply(NewData, 1, function(x) which.min(colSums((x - tCentre)^2)))
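
# Illustrative sketch of the apply() version on a tiny data set, assuming
# four points copied from the example in the question below (rows 1, 2, 55
# and 56) instead of random draws. The names toyCentre, toyData, tToyCentre
# and toyClust are hypothetical.
toyCentre <- matrix(c(0, 1, 0, 1), nrow = 2)  # rows: (0, 0) and (1, 1)
toyData <- cbind(x = c(-0.3974660, 0.5321347, 1.1570996, 1.4816195),
                 y = c( 0.1541685, 0.2497867, 1.1175119, 1.6836226))
tToyCentre <- t(toyCentre)                    # one column per centre
# For a row x, (x - tToyCentre) recycles x down each column, so
# colSums((x - tToyCentre)^2) is the squared Euclidean distance to every
# centre, and which.min() picks the closest one.
toyClust <- apply(toyData, 1,
                  function(x) which.min(colSums((x - tToyCentre)^2)))
toyClust
# [1] 1 1 2 2
# Attach the labels in the layout asked for in the question:
cbind(ID = seq_len(nrow(toyData)), toyData, Cluster = toyClust)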

but since the apply() loop runs once per row of the new data, which is
slow for large data sets, one may want to loop over the few centres
instead:

tNewData <- t(NewData)
clust <- max.col(-apply(Centre, 1, function(x) colSums((x - tNewData)^2)))
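
# A loop-free alternative, sketched as an assumption (a different technique
# from the two versions above): expand ||a - c||^2 = ||a||^2 - 2 a.c + ||c||^2
# so that all cross products come from a single tcrossprod() call. The
# helper name assignToCentres() is hypothetical, and set.seed(1) is only
# added for reproducibility. Expanding the square can lose some accuracy
# when coordinates are large compared with the point-to-centre distances.
assignToCentres <- function(data, centres) {
  # n x k matrix of squared Euclidean distances (rows: data, cols: centres)
  d2 <- outer(rowSums(data^2), rowSums(centres^2), "+") -
        2 * tcrossprod(data, centres)
  # index of the nearest centre for each row; exact ties go to the first
  max.col(-d2, ties.method = "first")
}

set.seed(1)
Centre <- matrix(c(0, 1, 0, 1), nrow = 2)
NewData <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
                 matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
tNewData <- t(NewData)
clustLoop <- max.col(-apply(Centre, 1, function(x) colSums((x - tNewData)^2)))
clustFast <- assignToCentres(NewData, Centre)
# Compare the two assignments:
all(clustLoop == clustFast)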

Best,
Uwe Ligges

On 21.05.2013 13:19, HJ YAN wrote:
> Dear R users
>
>
> I have the matrix of the centres of some clusters, e.g. 20 clusters each
> with 100 dimensions, so this matrix contains 20 rows * 100 columns of
> numeric values.
>
> I have collected new data (each observation with 100 numeric values) and
> would like to keep the above 20 centres fixed/'unmoved' whilst just seeing
> how my new data fit in this grouping system, e.g. if an observation is
> closest to cluster 1, then label it 'cluster 1'.
>
> If the above matrix of centres is called 'Centre' (a 20*100 matrix) and my
> new data 'NewData' has 500 observations, using kmeans() will update the
> centres:
>
> kmeans(NewData, Centre)
>
>
> I wondered whether there are other R packages out there that can keep the
> centres fixed and label each observation of my new data. Or do I have to
> write my own function?
>
> To illustrate my task using a simpler example:
>
> I have
>
> Centre<- matrix(c(0,1,0,1), nrow=2)
>
> # the two centres created, in a two-dimensional case, are
> Centre
>       [,1] [,2]
> [1,]    0    0
> [2,]    1    1
>
> NewData<-rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
>              matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
>
> NewData1 <- cbind(1:100, NewData)
> colnames(NewData1)<-c("ID","x","y")
>
> # my data
>       ID          x          y
> [1,]  1 -0.3974660  0.1541685
> [2,]  2  0.5321347  0.2497867
> [3,]  3  0.2550276  0.1691720
> [4,]  4 -0.1162162  0.6754874
> [5,]  5  0.1570996  0.1175119
> [6,]  6  0.4816195 -0.6836226
>
> ## I'd like to have the outcome below (whilst keeping the two centres fixed):
>
>       ID          x         y Cluster
>  [1,]  1 -0.3974660 0.1541685       1
>  [2,]  2  0.5321347 0.2497867       1
>  [3,]  3  0.2550276 0.1691720       1
>  [4,]  4 -0.1162162 0.6754874       1
>
> ...
> [55,] 55  1.1570996 1.1175119       2
> [56,] 56  1.4816195 1.6836226       2
>
>
> P.S. I use Euclidean distance to calculate the distance matrix.
>
>
>
> HJ
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help