[R] cluster analysis: mean values for each variable and cluster

Uwe Ligges ligges at statistik.tu-dortmund.de
Fri Feb 20 17:19:11 CET 2009



jgaspard wrote:
> Hi all!
> 
> I'm new to R and don't know much about it. Because it is free, I have
> managed to learn a little of it.
> 
> Here is my problem: I did a cluster analysis on 30 observations and 16
> variables (monde, figaro, liberation, etc.). Here is the .txt data file:
> 
> "monde","figaro","liberation","yespeople","nopeople","bxl","europe","ue","union_eur","other","yesmeto","nometo","yesfonc","nofonc","yestone","notone"
> 1,0,0,0,1,0,0,0,1,0,0,1,1,0,1,0
> 1,0,0,0,1,0,0,0,1,0,0,1,1,0,1,0
> 1,0,0,0,1,0,0,0,1,0,1,0,1,0,1,0
> 0,1,0,0,1,0,0,0,1,0,0,1,1,0,0,1
> 1,0,0,0,1,0,0,0,1,0,0,1,1,0,0,1
> 1,0,0,0,1,0,0,0,0,1,0,1,1,0,1,0
> 1,0,0,0,1,0,0,0,0,1,0,1,1,0,1,0
> 1,0,0,0,1,0,0,0,1,0,0,1,0,1,1,0
> 0,1,0,0,1,0,0,0,1,0,0,1,0,1,1,0
> 0,1,0,0,1,0,0,0,0,1,0,1,0,1,1,0
> 1,0,0,0,1,0,1,0,0,0,0,1,0,1,0,1
> 0,1,0,0,1,0,0,1,0,0,0,1,1,0,1,0
> 0,0,1,0,1,0,0,1,0,0,0,1,0,1,1,0
> 1,0,0,0,1,0,0,1,0,0,0,1,0,1,1,0
> 0,1,0,0,1,0,0,0,1,0,0,1,1,0,1,0
> 0,0,1,0,1,0,0,1,0,0,0,1,0,1,1,0
> 0,1,0,1,0,0,1,0,0,0,0,1,0,1,1,0
> 0,1,0,0,1,1,0,0,0,0,1,0,0,1,1,0
> 0,1,0,0,1,1,0,0,0,0,1,0,0,1,1,0
> 0,1,0,0,1,1,0,0,0,0,1,0,0,1,1,0
> 0,1,0,0,1,1,0,0,0,0,1,0,0,1,1,0
> 0,1,0,0,1,1,0,0,0,0,1,0,0,1,0,1
> 0,1,0,0,1,1,0,0,0,0,1,0,1,0,1,0
> 0,1,0,0,1,1,0,0,0,0,1,0,1,0,1,0
> 0,1,0,0,1,1,0,0,0,0,1,0,1,0,1,0
> 0,1,0,0,1,1,0,0,0,0,1,0,1,0,1,0
> 0,1,0,0,1,1,0,0,0,0,1,0,1,0,1,0
> 0,1,0,0,1,1,0,0,0,0,1,0,1,0,1,0
> 0,1,0,0,1,1,0,0,0,0,1,0,1,0,1,0
> 1,0,0,0,1,1,0,0,0,0,1,0,1,0,1,0
> 
> 
> These are the steps I took:
> 
> data=read.table("/data.csv", header=T, sep=",")
> data
> dist=dist(data,method="euclidean")
> dist
> cluster=hclust(dist,method="ward")
> cluster
> plot(cluster)
> rect.hclust(cluster, k=4, border="red")
> 
> I extracted 4 clusters from the data. My question is: is it possible to
> produce a summary of the mean value of each variable within each of the 4
> clusters?


Well, I think this is not what you want.
Probably you want to use Manhattan distance (rather than Euclidean) for 
0/1 data, and you want to know the number of 1s and the total number of 
observations in each cluster.
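
As a rough illustration of that suggestion, here is a minimal sketch on a 
small made-up 0/1 table. The data frame `toy`, the "average" linkage and 
k = 2 are assumptions chosen for the example, not taken from the original 
analysis. On 0/1 data the Manhattan distance between two rows is simply the 
number of variables on which they differ.

```r
# Toy 0/1 data (hypothetical; stands in for the 30 x 16 table in the question)
toy <- data.frame(
  monde   = c(1, 1, 1, 0, 0, 0, 0, 0),
  figaro  = c(0, 0, 0, 1, 1, 1, 1, 1),
  bxl     = c(0, 0, 0, 0, 1, 1, 1, 1),
  yesmeto = c(0, 0, 1, 0, 1, 1, 1, 1)
)

# Manhattan distance: for 0/1 data, the count of variables that differ
d <- dist(toy, method = "manhattan")

# The linkage method is an arbitrary choice for this sketch
hc <- hclust(d, method = "average")

# Cluster membership for each observation (k = 2 is also an assumption)
groups <- cutree(hc, k = 2)

# Number of 1s per variable within each cluster
ones <- rowsum(toy, group = groups)

# Total number of observations in each cluster
sizes <- table(groups)

print(ones)
print(sizes)
```

Dividing each row of `ones` by the matching entry of `sizes` would give the 
proportion of 1s, which for 0/1 data is the same as the cluster mean.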

Anyway, in order to answer your question, assign the result of 
rect.hclust() at the end, such as:

x <- rect.hclust(cluster, k=4, border="red")
sapply(x, function(i) colMeans(data[i,]))
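
The same per-cluster means can also be obtained without drawing the 
dendrogram, since rect.hclust() needs an existing plot. The following is 
only a sketch under assumptions: the data frame `toy` is made up, and the 
"average" linkage and k = 2 are arbitrary choices. It uses cutree() to get 
the memberships and aggregate() to compute the means. The cluster numbers 
from cutree() need not match the order of the list returned by 
rect.hclust().

```r
# Toy 0/1 data (hypothetical; stands in for the table in the question)
toy <- data.frame(
  monde   = c(1, 1, 1, 0, 0, 0, 0, 0),
  figaro  = c(0, 0, 0, 1, 1, 1, 1, 1),
  bxl     = c(0, 0, 0, 0, 1, 1, 1, 1),
  yesmeto = c(0, 0, 1, 0, 1, 1, 1, 1)
)

# Hierarchical clustering; distance and linkage are choices for this sketch
hc <- hclust(dist(toy, method = "euclidean"), method = "average")

# Cluster membership for each row, no plot required
groups <- cutree(hc, k = 2)

# One row per cluster, one column per variable (plus the cluster label)
means <- aggregate(toy, by = list(cluster = groups), FUN = mean)

# Same numbers in the layout of the sapply() call above:
# variables in rows, clusters in columns
means2 <- sapply(split(toy, groups), colMeans)

print(means)
print(means2)
```

Because the variables are coded 0/1, each mean is the share of 
observations in that cluster with a 1 for that variable.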

Uwe Ligges



> Thanks a lot in advance,
> 
> Jeoffrey