[R] keep the centre fixed in K-means clustering
Uwe Ligges
ligges at statistik.tu-dortmund.de
Wed May 22 11:55:32 CEST 2013
So you just want to compare the distances from each point of your new
data to each of the Centres and assign the corresponding number of the
centre as in:
tCentre <- t(Centre)
clust <- apply(NewData, 1, function(x) which.min(colSums((x - tCentre)^2)))
but since the apply loop runs over every single observation, it is rather
slow for lots of new data. For huge data one may want to loop over the
(few) centres instead and get:
tNewData <- t(NewData)
clust <- max.col(-apply(Centre, 1, function(x) colSums((x - tNewData)^2)))
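A self-contained sketch on the two-centre toy example from the question (quoted below) shows both approaches side by side and checks that they agree. The set.seed() call, the names clust_rows, clust_cols and D, and the choice ties.method = "first" are additions for illustration only, not part of the original reply.

```r
# Toy data from the question: two fixed centres, (0,0) and (1,1)
Centre <- matrix(c(0, 1, 0, 1), nrow = 2)
set.seed(1)  # added only to make this sketch reproducible
NewData <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
                 matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))

# Loop over the observations (rows of NewData):
# squared Euclidean distance to each centre, then index of the nearest one
tCentre <- t(Centre)
clust_rows <- apply(NewData, 1,
                    function(x) which.min(colSums((x - tCentre)^2)))

# Loop over the centres: D is a 100 x 2 matrix of squared distances
# (one column per centre); max.col(-D) picks the nearest centre per row.
# ties.method = "first" mimics which.min(); the default "random" would
# break (near-)ties at random instead.
tNewData <- t(NewData)
D <- apply(Centre, 1, function(x) colSums((x - tNewData)^2))
clust_cols <- max.col(-D, ties.method = "first")

# Both versions should give the same labels
stopifnot(identical(as.integer(clust_rows), as.integer(clust_cols)))

# Attach the labels in the layout the question asks for
NewData1 <- cbind(1:nrow(NewData), NewData, clust_cols)
colnames(NewData1) <- c("ID", "x", "y", "Cluster")
head(NewData1)
```

The centre-wise version calls the anonymous function only once per centre (2 times here, 20 times in the original 20 x 100 problem) rather than once per observation, which is why it scales better when the new data has many rows.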
Best,
Uwe Ligges
On 21.05.2013 13:19, HJ YAN wrote:
> Dear R users
>
>
> I have the matrix of the centres of some clusters, e.g. 20 clusters each
> with 100 dimensions, so this matrix contains 20 rows * 100 columns of numeric
> values.
>
> I have collected new data (each observation with 100 numeric values) and would
> like to keep the above 20 centres fixed/'unmoved' and just see how my new data
> fit into this grouping system, e.g. if an observation is closest to cluster 1,
> then label it 'cluster 1'.
>
> If the above matrix of centres is called 'Centre' (a 20*100 matrix) and my
> new data 'NewData' has 500 observations, using kmeans() will update the
> centres:
>
> kmeans(NewData, Centre)
>
>
> I wondered whether there are other R packages out there that can keep the
> centres fixed and label each observation of my new data, or do I have to
> write my own function?
>
> To illustrate my task using a simpler example:
>
> I have
>
> Centre<- matrix(c(0,1,0,1), nrow=2)
>
> # the two created centres in a two-dimensional case are
> Centre
>      [,1] [,2]
> [1,]    0    0
> [2,]    1    1
>
> NewData<-rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
> matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
>
> NewData1<-cbind(1:100, NewData)
> colnames(NewData1)<-c("ID","x","y")
>
> # my data
> head(NewData1)
>      ID          x          y
> [1,]  1 -0.3974660  0.1541685
> [2,]  2  0.5321347  0.2497867
> [3,]  3  0.2550276  0.1691720
> [4,]  4 -0.1162162  0.6754874
> [5,]  5  0.1570996  0.1175119
> [6,]  6  0.4816195 -0.6836226
>
> ## I'd like to have the outcome below (while keeping the two centres fixed):
>
>       ID          x         y Cluster
>  [1,]  1 -0.3974660 0.1541685       1
>  [2,]  2  0.5321347 0.2497867       1
>  [3,]  3  0.2550276 0.1691720       1
>  [4,]  4 -0.1162162 0.6754874       1
>
> ...
> [55,] 55  1.1570996 1.1175119       2
> [56,] 56  1.4816195 1.6836226       2
>
>
> P.S. I use Euclidean distance to compute the distance matrix.
>
>
> Many thanks in advance
>
> HJ
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>