[R] Examining how cases are similar by cluster, in cluster analysis

Sun Nov 18 22:52:34 CET 2012

Something like this?

> split(FS1, hcli8)
$`1`
   X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12
1   1  1  0  1  0  0  1  1  0   1   1   1
3   1  0  1  0  0  1  1  0  0   1   0   1
4   1  1  0  0  0  0  1  1  1   1   1   1
7   0  1  0  1  0  0  1  1  0   1   0   1
9   1  1  1  1  0  1  1  0  1   1   1   0
12  1  0  0  0  0  1  1  1  1   1   0   1
13  0  1  1  1  1  0  0  0  1   1   0   1
15  1  0  1  1  0  0  1  0  0   1   0   1
16  1  0  1  0  0  1  1  0  1   0   1   1
19  0  1  0  0  0  0  1  0  0   1   0   1
20  0  1  1  1  0  0  0  1  1   0   0   1
24  1  1  0  1  0  0  1  0  1   1   1   0
26  1  1  1  1  1  1  0  1  0   1   0   1
28  1  0  1  0  1  0  1  1  0   1   1   1
33  1  1  0  1  0  0  0  0  1   1   0   0
38  1  1  1  0  0  0  0  0  1   1   0   0
40  1  0  1  0  0  0  1  0  0   1   1   1
41  1  1  0  0  0  0  0  0  1   1   1   1
43  0  0  1  0  0  0  1  0  1   1   0   1
52  1  1  1  1  0  0  0  1  1   1   0   1
53  1  1  0  0  1  0  0  1  1   1   0   1
56  1  0  1  0  0  1  1  0  1   0   0   0
60  1  1  1  0  1  1  0  1  1   1   0   1

$`2`
   X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12
2   0  1  1  1  1  1  1  0  0   1   1   0
5   0  1  0  1  1  1  0  0  0   1   1   1
6   0  0  0  0  1  0  1  0  0   1   1   1
10  1  1  1  1  1  0  1  1  0   1   0   0
11  0  1  0  1  1  0  1  0  1   1   1   1
14  0  0  1  1  1  1  1  1  0   1   1   1
17  0  1  0  0  1  0  0  0  0   0   1   1
18  1  0  0  1  1  1  1  1  0   0   1   1
29  1  1  0  1  0  1  1  1  0   0   1   1
37  1  0  0  1  1  0  1  1  0   1   0   0
42  1  1  0  1  1  1  1  0  0   0   0   0
46  1  1  0  1  0  1  1  0  0   1   0   1
48  0  1  0  0  1  0  1  0  0   1   1   0
50  0  1  0  1  1  1  1  1  0   0   1   0
51  0  0  0  1  1  1  1  0  0   0   1   1
54  0  0  0  1  1  1  1  0  0   1   1   0
58  0  1  0  1  1  1  1  1  1   1   1   0
61  1  0  1  0  1  1  1  1  0   1   0   0

$`3`
   X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12
8   0  1  1  0  0  1  0  1  1   1   1   0
21  0  1  0  0  1  1  0  1  0   1   1   0
22  1  1  0  0  0  1  1  1  0   0   1   0
25  0  1  0  0  0  1  0  1  0   1   1   0
27  1  1  0  0  1  1  0  1  1   0   0   0
32  1  1  1  0  1  1  0  1  0   0   1   0
36  1  1  0  0  0  1  0  1  0   0   0   0
44  1  1  1  1  1  1  0  1  0   0   0   0
63  0  1  1  0  1  1  0  0  1   1   1   0

$`4`
   X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12
23  0  0  1  1  0  0  0  0  0   1   0   0
34  0  1  1  1  0  0  0  1  0   1   0   0

$`5`
   X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12
30  0  0  0  0  1  1  0  0  1   1   0   1
31  0  1  1  0  1  0  0  0  1   0   1   1
35  0  0  1  0  1  1  0  0  1   1   0   1
47  0  0  1  0  1  0  0  0  1   0   0   1
49  1  0  0  0  1  1  0  0  1   1   1   0
55  1  0  1  0  1  0  0  0  0   1   1   0
59  0  0  1  0  1  0  0  0  1   0   1   1

$`6`
   X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12
39  0  0  0  0  1  0  1  1  0   0   0   0
62  0  0  0  0  1  0  1  1  0   0   0   1

$`7`
   X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12
45  1  1  0  0  0  0  0  0  0   0   1   0

$`8`
   X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12
57  0  0  1  0  0  1  0  1  0   0   1   1
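
If the aim is to see which variables the cases in a cluster agree on, something along these lines might be a starting point. It is only a sketch: the data are simulated as in my earlier message (not the real arson file), and `shared_vars` and its `cutoff` argument are names made up here for illustration.

```r
# Simulated stand-in for the 63 x 12 binary data (an assumption, not the real file).
set.seed(42)
FS1 <- data.frame(matrix(sample(c(0, 1), 12 * 63, replace = TRUE),
                         nrow = 63, ncol = 12))
hcli8 <- cutree(hclust(dist(FS1, method = "binary"), method = "average"), k = 8)

# One data frame per cluster; row names (case ids) are preserved.
by_cluster <- split(FS1, hcli8)

# Hypothetical helper: for one cluster, list the variables on which at least
# a proportion `cutoff` of the cases share the same value.
shared_vars <- function(d, cutoff = 1) {
  p1 <- colMeans(d == 1)   # proportion of cases coded 1, per variable
  p0 <- colMeans(d == 0)   # proportion of cases coded 0, per variable
  list(n        = nrow(d),
       mostly_1 = names(p1)[p1 >= cutoff],
       mostly_0 = names(p0)[p0 >= cutoff])
}

# Variables on which at least 80% of the cases in each cluster agree.
agreement <- lapply(by_cluster, shared_vars, cutoff = 0.8)
agreement[["1"]]
```

To try a different clustering method, only the `hcli8` line should need to change; the `split()` and `lapply()` steps stay the same.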

-------
David

> -----Original Message-----
> From: Bob Green [mailto:bgreen at dyson.brisnet.org.au]
> Sent: Sunday, November 18, 2012 3:22 PM
> To: dcarlson at tamu.edu; r-help at r-project.org
> Subject: RE: [R] Examining how cases are similar by cluster, in cluster
> analysis
> 
> David,
> 
> 
> Many thanks, I'm sure this will be helpful. What would also be
> helpful is if I can extract each cluster and examine id by variable
> within the respective cluster. I could index the variables for each
> cluster and run such an analysis, but there must be a more efficient
> way of doing this (especially as I experiment with different
> clustering methods).
> 
> Thanks again,
> 
> Bob
> 
> At 06:44 AM 19/11/2012, David L Carlson wrote:
> >If you just want a summary of the mean for each variable in each
> >cluster, this will get you there:
> >
> > > set.seed(42)
> > > FS1 <- data.frame(matrix(sample(c(0, 1), 12*63, replace=TRUE), nrow=63,
> >+ ncol=12))
> > > dmat <- dist(FS1, method="binary")
> > > cl.test <- hclust(dmat, method="average")
> > > plot(cl.test, hang=-1)
> > > hcli8 <- cutree(cl.test, k=8)
> > > tbl <- aggregate(FS1, by=list(Group=hcli8), mean)
> > > print(tbl, digits=4)
> >   Group     X1     X2     X3     X4     X5     X6     X7     X8     X9
> >1     1 0.5122 0.6829 0.6829 0.6341 0.5854 0.5854 0.6829 0.6341 0.5366
> >2     2 0.0000 0.0000 0.0000 1.0000 0.6667 0.6667 0.0000 0.6667 0.0000
> >3     3 0.9286 0.1429 0.1429 0.1429 0.2857 0.5714 0.7857 0.3571 0.8571
> >4     4 1.0000 1.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
> >5     5 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 1.0000
> >6     6 1.0000 0.0000 0.0000 0.0000 0.0000 1.0000 0.0000 1.0000 0.0000
> >7     7 1.0000 0.0000 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000
> >8     8 0.0000 1.0000 0.0000 0.0000 0.0000 1.0000 0.0000 0.0000 0.0000
> >      X10    X11   X12
> >1 0.4146 0.4634 0.561
> >2 0.6667 0.0000 0.000
> >3 0.8571 0.6429 0.500
> >4 1.0000 0.0000 0.000
> >5 0.0000 1.0000 0.000
> >6 0.0000 0.0000 1.000
> >7 0.0000 0.0000 0.000
> >8 0.0000 0.0000 0.000
> > >
> >----------------------------------------------
> >David L Carlson
> >Associate Professor of Anthropology
> >Texas A&M University
> >College Station, TX 77843-4352
> >
> > > -----Original Message-----
> > > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Bob Green
> > > Sent: Sunday, November 18, 2012 5:00 AM
> > > To: r-help at r-project.org
> > > Subject: [R] Examining how cases are similar by cluster, in
> > > cluster analysis
> > >
> > > Hello,
> > >
> > > I used the following code to perform a cluster analysis on a
> > > dataframe consisting of 12 variables (coded as 1,0) and 63
> > > cases.
> > >
> > >
> > >
> > > FS1 <- read.csv("D://Arsontest2.csv",header=T,row.names=1)
> > >
> > > str(FS1)
> > >
> > > dmat <- dist(FS1,  method="binary")
> > >
> > > cl.test <- hclust (dist(FS1, method ="binary"), "ave")
> > >
> > > plot(cl.test, hang = -1)
> > >
> > >
> > >
> > > > Each case has an id, and the dendrogram identifies the cases that
> > > > make up each cluster. What I am seeking advice on is how to examine
> > > > the variables on which the cases are similar within each cluster.
> > >
> > >
> > >
> > > > sort(hcli8 <- cutree(cl.test, k=8)) shows that cluster 2 comprises
> > > > the following cases:
> > >
> > > 1641 2295 2594 2654 2799 3213 3510  3513 2958 3294
> > >
> > >     2         2        2       2        2        2        2
> > > 2
> > >        2        2
> > >
> > >
> > >
> > > > The following code provides means for the variables by cluster. For
> > > > cluster 2, it appears the cases should have no clear motive and be
> > > > depressed:
> > >
> > > round(sapply(x, function(i) colMeans(FS1[i,])),2)
> > >
> > >                                [,1]   [,2]   [,3] [ ,4]  [,5]
> > > [,6] [,7] [,8]
> > >
> > > depressed        0.00 0.33 0.00  0.0    0  0.6 0.00 0.08
> > >
> > > unclear             0.33 1.00 1.00  1.0    0  0.0 0.07 0.12
> > >
> > >
> > >
> > > > I can examine this manually, variable by variable, and look at how
> > > > the cases in cluster 2 are similar on each variable. I am looking
> > > > for a more efficient and quicker way to do this.
> > >
> > > Bob
> > >
> > > ______________________________________________
> > > R-help at r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > > and provide commented, minimal, self-contained, reproducible code.