[R] Help with K-Means output

David L Carlson dc@rl@on @ending from t@mu@edu
Sat Dec 8 17:11:42 CET 2018


You should also read the manual page for split() (?split) and learn how to work with lists:

# Split the data according to cluster membership
# to create a list of data frames
rr0.clus <- split(rr0, rr0a$cluster)

# The data frame for cluster 1:
rr0.clus[[1]]
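
For anyone who wants to try this without the original data, here is a minimal sketch of the same idea. It assumes the built-in iris data as a stand-in for rr0; the names dat, km, dat.clus and clus1 are made up for illustration and are not from the original post.

```r
# Hypothetical stand-in data: the four numeric columns of the built-in iris set
dat <- iris[, 1:4]

set.seed(213)
km <- kmeans(dat, centers = 3)

# One data frame per cluster; the list elements are named "1", "2", "3"
dat.clus <- split(dat, km$cluster)

# Row counts per cluster should match the sizes reported by kmeans()
sapply(dat.clus, nrow)
km$size

# Work on all clusters at once instead of copying each one out,
# e.g. a summary of every cluster's data frame
lapply(dat.clus, summary)

# Pull out a single cluster only when you need it as its own object
clus1 <- dat.clus[["1"]]
```

Keeping the clusters together in one list is usually more convenient than creating ten separate data frames, because lapply() and sapply() can then handle every cluster in a single call.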

--------------------------------------------------------
David L. Carlson
Department of Anthropology
Texas A&M University

-----Original Message-----
From: R-help [mailto:r-help-bounces using r-project.org] On Behalf Of Bert Gunter
Sent: Saturday, December 8, 2018 9:46 AM
To: Bill.Poling using zelis.com
Cc: R-help <r-help using r-project.org>
Subject: Re: [R] Help with K-Means output

Please see ?kmeans and note the "cluster" component of the returned value,
which would appear to provide the info you seek.
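
As a rough illustration of what the "cluster" component holds, here is a small sketch on made-up data. The data frame dat and the names km and clus1 are assumptions for the example, not the poster's objects.

```r
# Hypothetical toy data (not the poster's rr0): two well-separated groups
dat <- data.frame(x = c(1, 1.2, 0.8, 10, 10.3, 9.7),
                  y = c(5, 5.1, 4.9, 20, 19.8, 20.2))

set.seed(213)
km <- kmeans(dat, centers = 2)

# km$cluster holds one integer label per input row, in the original row order
length(km$cluster) == nrow(dat)

# Attach the labels as a column, then subset by label
dat$cluster <- km$cluster
clus1 <- dat[dat$cluster == 1, ]

# The label counts agree with the cluster sizes kmeans() reports
table(dat$cluster)
km$size
```

Because the labels are in the same order as the input rows, they can be attached to the original data or used directly as a row index.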

-- Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)


On Sat, Dec 8, 2018 at 7:03 AM Bill Poling <Bill.Poling using zelis.com> wrote:

> Good afternoon. I hope I have provided enough info to get my question
> answered.
>
> I am running Windows 10, R 3.5.1, and RStudio version 1.1.456.
>
> When running a K-Means clustering routine is it possible to get the actual
> data from each cluster into a DF?
>
> I have reviewed a number of tutorials and, unless I missed it somewhere,
> none of them covers this, so I would like to know if it is possible.
>
> https://www.datacamp.com/community/tutorials/k-means-clustering-r
> https://....guru99..../r-k-means-clustering.html
> https://datascienceplus.com/k-means-clustering-in-r/
> https://datascienceplus.com/finding-optimal-number-of-clusters/
> http://enhancedatascience.com/2017/10/24/machine-learning-explained-kmeans/
> http://enhancedatascience.com/2017/04/30/r-basics-k-means-r/
>
> For example:
>
> I ran the code below and got K-means clustering with 10 clusters of sizes
> 1511, 1610, 702, 926, 996, 1076, 580, 2429, 728, 3797.
> Can the 1511 rows of SavingsReversed and ProviderID, the 1610 rows of
> SavingsReversed and ProviderID, etc. be written out into separate DFs?
>
> Thank you for your help.
>
> WHP
>
> str(rr0)
> Classes 'data.table' and 'data.frame':14355 obs. of  2 variables:
>  $ SavingsReversed: num  0 0 61 128 160 ...
>  $ ProviderID     : num  113676 113676 116494 116641 116641 ...
>  - attr(*, ".internal.selfref")=<externalptr>
>
> head(rr0, n=35)
>     SavingsReversed ProviderID
>  1:            0.00     113676
>  2:            0.00     113676
>  3:           61.00     116494
>  4:          128.25     116641
>  5:          159.60     116641
>  6:          372.66     119316
>  7:           18.79     121319
>  8:           15.64     121319
>  9:            0.00     121319
> 10:           18.79     121319
> 11:           23.00     121319
> 12:           18.79     121319
> 13:            0.00     121319
> 14:           25.86     121319
> 15:           14.00     121319
> 16:          113.00     121545
> 17:           50.00     121545
> 18:         1155.32     121545
> 19:          113.00     121545
> 20:          197.20     121545
> 21:            0.00     121780
> 22:           36.00     122536
> 23:         1171.32     125198
> 24:         1171.32     125198
> 25:           43.00     125303
> 26:            0.00     125881
> 27:           69.64     128435
> 28:          420.18     128435
> 29:          175.18     128435
> 30:           71.54     128435
> 31:           99.85     128435
> 32:            0.00     128435
> 33:           42.75     128435
> 34:          175.18     128435
> 35:          846.45     128435
>
> set.seed(213)
> rr0a <- kmeans(rr0, 10)
> View(rr0a)
> summary(rr0a)
> # Length Class  Mode
> # cluster      14355  -none- numeric
> # centers         20  -none- numeric
> # totss            1  -none- numeric
> # withinss        10  -none- numeric
> # tot.withinss     1  -none- numeric
> # betweenss        1  -none- numeric
> # size            10  -none- numeric
> # iter             1  -none- numeric
> # ifault           1  -none- numeric
>
> x1 <- as.data.frame(rr0a$centers)
> sort(x1)
> #SavingsReversed ProviderID
> # 2         75.19665  2773789.2
> # 3         99.31959  4147091.6
> # 5        101.21070  3558532.7
> # 4        103.41147  3893274.4
> # 1        105.38310  2241031.2
> # 8        114.61562  3240701.5
> # 10       121.14184  4718727.6
> # 9        153.70536  4470878.9
> # 6        156.84426  5560636.6
> # 7        185.09745   173732.9
> print(rr0a)
> # K-means clustering with 10 clusters of sizes 1511, 1610, 702, 926, 996, 1076, 580, 2429, 728, 3797
> #
> # Cluster means:
> #   SavingsReversed ProviderID
> # 1        105.38310  2241031.2
> # 2         75.19665  2773789.2
> # 3         99.31959  4147091.6
> # 4        103.41147  3893274.4
> # 5        101.21070  3558532.7
> # 6        156.84426  5560636.6
> # 7        185.09745   173732.9
> # 8        114.61562  3240701.5
> # 9        153.70536  4470878.9
> # 10       121.14184  4718727.6
> # Within cluster sum of squares by cluster:
> #  [1] 74529288379846 25846368411171  4692898666512  6277704963344  8428785199973 90824041558798  1468798013919 12143462193009  5483877005233
> # [10] 51547955737867
> # (between_SS / total_SS =  98.7 %)
> #
> # Available components:
> #
> #   [1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss" "betweenss"    "size"         "iter"         "ifault"
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
