[R] How to extract text contexts after clustering.

David L Carlson dcarlson at tamu.edu
Mon May 22 17:04:25 CEST 2017


As Ismail notes, you did not give us your full code, only a few disconnected pieces of it. Assuming that by "top 1 group" you mean the largest group, here is a reproducible example:

# First create a reproducible set of data
set.seed(42)
mydata <- matrix(rnorm(300, 50, 10), 100, 3)
# A matrix with 100 rows and 3 columns of random normal variates

# Run kmeans and look at the structure of the returned object
mydata.km <- kmeans(mydata, centers=10)
str(mydata.km)
List of 9
 $ cluster     : int [1:100] 5 9 3 6 1 1 10 1 10 8 ...
 $ centers     : num [1:10, 1:3] 53.8 31.8 54.5 40.1 61 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:10] "1" "2" "3" "4" ...
  .. ..$ : NULL
 $ totss       : num 29069
 $ withinss    : num [1:10] 601 868 443 1242 717 ...
 $ tot.withinss: num 6554
 $ betweenss   : num 22515
 $ size        : int [1:10] 13 10 9 11 10 5 7 14 13 8
 $ iter        : int 3
 $ ifault      : int 0
 - attr(*, "class")= chr "kmeans"

# "size" is the number of observations in each cluster
# "cluster" is the cluster membership for each observation

which.max(mydata.km$size)
[1] 8
table(mydata.km$cluster)

 1  2  3  4  5  6  7  8  9 10 
13 10  9 11 10  5  7 14 13  8 

# which.max() shows you which cluster is the 
# largest, cluster number 8
# By sorting "size" you lost the information
# about which cluster was the largest
# table() shows you the number of observations in each cluster
# You can see that cluster 8 has 14 observations
# Now print the 14 observations that belong to cluster 8

mydata[mydata.km$cluster == 8, ]
          [,1]     [,2]     [,3]
 [1,] 49.37286 51.19161 48.14622
 [2,] 47.21211 44.95783 49.15892
 [3,] 56.35950 46.17666 50.37415
 [4,] 47.15747 44.87350 48.67912
 [5,] 48.28083 51.24702 44.78204
 [6,] 45.69531 45.71741 48.25982
 [7,] 47.42731 43.86328 55.15668
 [8,] 54.55450 55.67621 47.28236
 [9,] 56.42899 47.26354 51.90019
[10,] 50.89833 41.99718 50.46564
[11,] 55.81824 51.63207 53.83847
[12,] 50.88440 53.68807 44.30694
[13,] 48.79103 52.94654 56.35514
[14,] 45.23826 46.54912 54.46041
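
If you need the clusters ranked by size without losing their labels, or the row numbers of the selected observations so you can index back into another object, something along these lines may help. This is only a sketch built on the example above; the names ranked, largest, rows, and largest_obs are my own. The cluster numbers you get may differ from the output shown above, since kmeans() chooses its starting centers at random and R's default sampling method changed in R 3.6.0.

```r
set.seed(42)
mydata <- matrix(rnorm(300, 50, 10), 100, 3)
mydata.km <- kmeans(mydata, centers = 10)

# order() returns the cluster numbers ranked by size (largest first);
# sort() would return only the sizes and drop the labels
ranked <- order(mydata.km$size, decreasing = TRUE)

# Cluster number of the largest cluster (the first one if there is a tie)
largest <- which.max(mydata.km$size)

# Row numbers of the observations assigned to that cluster
rows <- which(mydata.km$cluster == largest)

# drop = FALSE keeps the result a matrix even for a one-member cluster
largest_obs <- mydata[rows, , drop = FALSE]
```

The vector rows can then be used to index any other object whose rows are in the same order as mydata.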

-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352


-----Original Message-----
From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Ismail SEZEN
Sent: Sunday, May 21, 2017 10:09 PM
To: θ ” <yarmi1224 at hotmail.com>
Cc: r-help at r-project.org
Subject: Re: [R] How to extract text contexts after clustering.

1- PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
2- PLEASE, first _read_ the help for the kmeans function (?kmeans) before using it.

> On 22 May 2017, at 05:33, θ ” <yarmi1224 at hotmail.com> wrote:
> 
> hi:
> I need to extract the text contexts of top 1 group after clustering.
> But I have no idea how to sort the cluster size then extract the contexts of top 1 clusters.

There isn’t a _top_ cluster in the kmeans algorithm. There are _only_ clusters!

> 
> here is my cluster code:
> 
>> file <- read.csv("SiC CMP.csv", header = TRUE)

We don’t know what is in file$Main.IPC.

>> cluster_k<-length(unique(file$Main.IPC))
>> cl <- kmeans(IPC_Dtm , cluster_k)

What is IPC_Dtm?

> 
> 
> I have tried to use:
> 
>> sort(cl$size, decreasing=T)

If you had read the documentation, you would know that cl$size is the number of points in each cluster. So why sort them?

> [1] 341 107 104  80  51  22  15  11  10   8   8   5   5   5   4   4   4   3   3   2   2
> [22]   2   2   2   2   2   2   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
> [43]   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
> 
> But I have no idea how to extract the contexts of top 1 cluster.

If you read the _Value_ section of the kmeans documentation, you will see how to extract the contexts by using cl$cluster.
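
A minimal sketch of that idea, assuming the rows of the clustered matrix line up one-to-one with the original texts. The objects texts and dtm are made-up stand-ins for the contents of the CSV file and for IPC_Dtm, neither of which was shown, and nstart = 25 is an arbitrary choice to make the toy result stable:

```r
# Hypothetical stand-ins: six texts and a toy term-count matrix,
# one row per text (rows 1-4 and rows 5-6 use different terms)
texts <- paste("text", 1:6)
dtm <- rbind(
  c(5, 4, 0, 0),
  c(4, 5, 0, 0),
  c(5, 5, 1, 0),
  c(4, 4, 0, 1),
  c(0, 0, 5, 4),
  c(0, 1, 4, 5)
)

set.seed(1)
cl <- kmeans(dtm, centers = 2, nstart = 25)

# cl$cluster holds one cluster number per row of dtm, so a logical
# index on it selects the matching entries of texts
largest <- which.max(cl$size)
top_texts <- texts[cl$cluster == largest]
print(top_texts)
```

If the texts are a column of a data frame whose rows are in the same order as the matrix, the same logical index can be applied to its rows instead.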

> 
> 
> Eva
> 
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
