# [R] How to extract text contexts after clustering.

David L Carlson dcarlson at tamu.edu
Mon May 22 17:04:25 CEST 2017

As Ismail notes, you did not give us your code, only a few disconnected bits of your code. Assuming that by "top 1 group" you mean the largest group, here is a reproducible example:

# First create a reproducible set of data
set.seed(42)
mydata <- matrix(rnorm(300, 50, 10), 100, 3)
# A matrix with 100 rows and 3 columns of random normal variates

# Run kmeans and look at the structure of the returned object
mydata.km <- kmeans(mydata, centers=10)
str(mydata.km)
List of 9
 $ cluster     : int [1:100] 5 9 3 6 1 1 10 1 10 8 ...
 $ centers     : num [1:10, 1:3] 53.8 31.8 54.5 40.1 61 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:10] "1" "2" "3" "4" ...
  .. ..$ : NULL
 $ totss       : num 29069
 $ withinss    : num [1:10] 601 868 443 1242 717 ...
 $ tot.withinss: num 6554
 $ betweenss   : num 22515
 $ size        : int [1:10] 13 10 9 11 10 5 7 14 13 8
 $ iter        : int 3
 $ ifault      : int 0
 - attr(*, "class")= chr "kmeans"

# "size" is the number of observations in each cluster
# "cluster" is the cluster membership for each observation

which.max(mydata.km$size)
[1] 8
table(mydata.km$cluster)

 1  2  3  4  5  6  7  8  9 10
13 10  9 11 10  5  7 14 13  8

# which.max() shows you which cluster is the
# largest, cluster number 8
# By sorting "size" you lost the information
# about which cluster was the largest
# table() shows you the number of observations in each cluster
# You can see that cluster 8 has 14 observations
# Now print the 14 observations that belong to cluster 8

mydata[mydata.km$cluster == 8, ]
          [,1]     [,2]     [,3]
 [1,] 49.37286 51.19161 48.14622
 [2,] 47.21211 44.95783 49.15892
 [3,] 56.35950 46.17666 50.37415
 [4,] 47.15747 44.87350 48.67912
 [5,] 48.28083 51.24702 44.78204
 [6,] 45.69531 45.71741 48.25982
 [7,] 47.42731 43.86328 55.15668
 [8,] 54.55450 55.67621 47.28236
 [9,] 56.42899 47.26354 51.90019
[10,] 50.89833 41.99718 50.46564
[11,] 55.81824 51.63207 53.83847
[12,] 50.88440 53.68807 44.30694
[13,] 48.79103 52.94654 56.35514
[14,] 45.23826 46.54912 54.46041
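
The same steps can be written without hard-coding the cluster number. This is only a minimal sketch: it recreates mydata and mydata.km so that it runs on its own, and the names biggest, largest_rows and by_size are introduced here purely for illustration. Because kmeans() starts from randomly chosen centers, the cluster labels and sizes you get may differ from the output shown above, depending on your R version and seed.

```r
# Recreate the example data and clustering so this block is self-contained
set.seed(42)
mydata <- matrix(rnorm(300, 50, 10), 100, 3)
mydata.km <- kmeans(mydata, centers = 10)

# Label of the largest cluster (the first one if several sizes are tied)
biggest <- which.max(mydata.km$size)

# Rows belonging to that cluster; drop = FALSE keeps the result a matrix
# even if the cluster holds a single observation
largest_rows <- mydata[mydata.km$cluster == biggest, , drop = FALSE]
nrow(largest_rows) == max(mydata.km$size)

# order() ranks the cluster labels by size, whereas sort() returns
# only the sizes and discards the labels
by_size <- order(mydata.km$size, decreasing = TRUE)
by_size[1] == biggest
```

To look at the second or third largest cluster instead, by_size[2] or by_size[3] can be used in place of biggest.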

-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Ismail SEZEN
Sent: Sunday, May 21, 2017 10:09 PM
To: θ ” <yarmi1224 at hotmail.com>
Cc: r-help at r-project.org
Subject: Re: [R] How to extract text contexts after clustering.

2- PLEASE, first _read_ the help for the kmeans function (?kmeans) before using it.

> On 22 May 2017, at 05:33, θ ” <yarmi1224 at hotmail.com> wrote:
>
> hi:
> I need to extract the text contexts of top 1 group after clustering.
> But I have no idea how to sort the cluster size then extract the contexts of top 1 clusters.

There isn’t a _top_ cluster in the kmeans algorithm. There are _only_ clusters!

>
> here is my cluster code:
>

We don’t know what is in file$Main.IPC.

>> cluster_k<-length(unique(file$Main.IPC))
>> cl <- kmeans(IPC_Dtm , cluster_k)

What is IPC_Dtm?

>
>
> I have tried using:
>
>> sort(cl$size, decreasing=T)

If you read the documentation, you would know that cl$size is the number of points in each cluster. So why do you sort them?

> [1] 341 107 104  80  51  22  15  11  10   8   8   5   5   5   4   4   4   3   3   2   2
> [22]   2   2   2   2   2   2   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
> [43]   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
>
> But I have no idea how to extract the contexts of top 1 cluster.

If you read the _Value_ section of the kmeans documentation, you will have an idea how to extract the contexts by using cl$cluster.
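
As a rough sketch of that idea for text data: if each row of the matrix passed to kmeans() corresponds to one document, then cl$cluster lines up with the documents and the same logical index selects the original texts. The texts vector and the toy term-count matrix dtm below are made up for illustration only. They stand in for the poster's documents and IPC_Dtm, whose contents are not shown in this thread.

```r
# Hypothetical documents, one per row of the term matrix
texts <- c("apple banana", "apple apple", "banana apple",
           "car engine", "engine car car", "apple banana banana")

# Toy term-count matrix (rows = documents, columns = terms)
terms <- c("apple", "banana", "car", "engine")
dtm <- t(vapply(strsplit(texts, " "),
                function(w) as.numeric(table(factor(w, levels = terms))),
                numeric(length(terms))))
colnames(dtm) <- terms

set.seed(1)
cl <- kmeans(dtm, centers = 2, nstart = 5)

# cl$cluster[i] is the cluster of row i, so it is aligned with texts[i]
biggest <- which.max(cl$size)
top_texts <- texts[cl$cluster == biggest]
top_texts
```

This only works if the rows of the clustered matrix are in the same order as the texts. If rows were dropped or reordered while building the matrix, the index would have to be matched back to the documents first.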

>
>
> Eva
>