[R] kmeans

Gavin Simpson gavin.simpson at ucl.ac.uk
Tue Mar 20 19:43:46 CET 2007


On Tue, 2007-03-20 at 19:10 +0100, Sergio Della Franca wrote:
> Dear R-helpers,
> 
> I have this dataset(y):
> 
>   YEAR   PRODUCTS
>   1             10
>   2             42
>   3             25
>   4             42
>   5             40
>   6             45
>   7             44
>   8             47
>   9             42
> 
> I perform kmeans clustering, and the results are the following:
> 
> 
> Cluster means:
>       YEAR  PRODUCTS
> 1 3.666667 41.33333
> 2 7.500000 44.50000
> 3 2.000000 17.50000
> 
> Clustering vector:
> 1 2 3 4 5 6 7 8 9
> 3 1 3 1 1 2 2 2 2
> Now my problem is to add a column to my dataset (y) with the information
> from the clustering vector, i.e.:
> 
>    YEAR   PRODUCTS *clustering vector*
>   1             10                    *3*
>   2             42                    *1*
>   3             25                    *3*
>   4             42                    *1*
>   5             40                    *1*
>   6             45                    *2*
>   7             44                    *2*
>   8             47                    *2*
>   9             42                    *2*
> 
> 
> How can I obtain my new dataset with the information of clustering
> vector?

Given dat is your data.frame:

> dat
  YEAR PRODUCTS
1    1       10
2    2       42
3    3       25
4    4       42
5    5       40
6    6       45
7    7       44
8    8       47
9    9       42

then the following does what you want:

set.seed(12345)
clust <- kmeans(dat, 3) # 3 clusters as per example
new.dat <- data.frame(dat, Cluster = clust$cluster)
new.dat

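This gives a new data frame with the extra column (the cluster numbers
are arbitrary labels, so they can differ from your run even though the
grouping is the same):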

  YEAR PRODUCTS Cluster
1    1       10       1
2    2       42       3
3    3       25       1
4    4       42       3
5    5       40       3
6    6       45       2
7    7       44       2
8    8       47       2
9    9       42       2
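
For anyone who wants to run the above from scratch, here is a minimal
self-contained sketch. The data frame is rebuilt from the values in the
original post; the nstart = 25 argument is my own assumption (it is not
in the code above) and simply asks kmeans() to try several random
starts. The check relies on clust$cluster holding one label per row of
dat, in row order, which is why it can be attached as a column.

```r
# Rebuild the example data from the original post
dat <- data.frame(YEAR = 1:9,
                  PRODUCTS = c(10, 42, 25, 42, 40, 45, 44, 47, 42))

set.seed(12345)
# nstart = 25 is an assumption for illustration, not part of the answer above
clust <- kmeans(dat, centers = 3, nstart = 25)

# One cluster label per row of dat, in the same order as the rows
stopifnot(length(clust$cluster) == nrow(dat))

# Attach the labels as a new column
new.dat <- data.frame(dat, Cluster = clust$cluster)
str(new.dat)
```

The label numbers you get may not match those printed above, since
they depend on the random starts.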

Or, if you really want to add the column to the original data, do this
directly:

dat$Cluster <- clust$cluster

which yields:

> dat
  YEAR PRODUCTS Cluster
1    1       10       1
2    2       42       3
3    3       25       1
4    4       42       3
5    5       40       3
6    6       45       2
7    7       44       2
8    8       47       2
9    9       42       2
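
Once the Cluster column is in the data frame, it can be used like any
other grouping variable. As a rough sketch (the use of aggregate() here
is my own choice, not something asked for in the question), the
per-cluster means of YEAR and PRODUCTS can be recomputed from the
labelled data and compared with the "Cluster means" that kmeans()
reports in clust$centers:

```r
# Rebuild the example data from the original post
dat <- data.frame(YEAR = 1:9,
                  PRODUCTS = c(10, 42, 25, 42, 40, 45, 44, 47, 42))

set.seed(12345)
clust <- kmeans(dat, centers = 3)
dat$Cluster <- clust$cluster

# Mean of each original column within each cluster
means <- aggregate(cbind(YEAR, PRODUCTS) ~ Cluster, data = dat, FUN = mean)
means

# The centres reported by kmeans(), for comparison
clust$centers
```

The two tables should agree row for row, which is a quick way to
confirm the labels were attached to the right rows.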

This is all covered in "An Introduction to R", which the posting guide
asks you to read.

HTH

G
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Gavin Simpson                 [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%


