[R] Kmeans cluster analysis

paulandpen paulandpen at optusnet.com.au
Wed Apr 11 16:01:15 CEST 2007


Hi Nataniel,

A quick and easy way to do this is to run a decision tree analysis (such as
CHAID) or a discriminant analysis, using the variables you clustered on as
the predictors and the cluster membership as the outcome variable. The
decision tree will show, from the first split downwards, which variables do
most to separate the clusters. This is back to front in a sense, because the
outcome was itself derived from the predictors, but it does help to identify
what is driving your clusters apart.
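
A rough sketch of the tree idea in R, under some assumptions: it uses the
rpart package (CART-style trees, not CHAID), the built-in iris data as a
stand-in for your 22 transformed variables, and object names (x, km, d, fit)
that are only illustrative.

```r
library(rpart)  # recommended package, shipped with standard R installations

set.seed(1)
# Stand-in data: replace with your own transformed variables
x <- as.data.frame(scale(iris[, 1:4]))

# 4 clusters, 20 random starts; kmeans keeps the best of the 20
km <- kmeans(x, centers = 4, nstart = 20)

# Cluster membership is the outcome, the clustering variables the predictors
d <- cbind(x, cluster = factor(km$cluster))
fit <- rpart(cluster ~ ., data = d, method = "class")

print(fit)                # the splits, from the root downwards
fit$variable.importance   # larger value = variable contributes more to the splits
```

The variables used in the first few splits are the ones that separate the
clusters most strongly. plot(fit); text(fit) draws the tree if you prefer a
picture.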

This might help to identify the bigger differences between some of the
groups.

Remember that when comparing clusters you need to look at both the
differences between the group means and the standard deviations (the
dispersion around each cluster centre).
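
For the means and standard deviations, something along these lines might
do. Again this is only a sketch: iris stands in for your data, and the
names x, km, cl_mean and cl_sd are made up for the example.

```r
set.seed(1)
# Stand-in data: replace with your own transformed variables
x <- as.data.frame(scale(iris[, 1:4]))
km <- kmeans(x, centers = 4, nstart = 20)

# Mean and standard deviation of every variable within each cluster
cl_mean <- aggregate(x, by = list(cluster = km$cluster), FUN = mean)
cl_sd   <- aggregate(x, by = list(cluster = km$cluster), FUN = sd)

print(cl_mean, digits = 2)
print(cl_sd, digits = 2)

# One line per cluster, one point per variable: a quick profile plot
matplot(t(km$centers), type = "b", xaxt = "n",
        xlab = "", ylab = "cluster mean (scaled)")
axis(1, at = seq_len(ncol(x)), labels = colnames(x))
```

With 22 variables the profile plot is usually quicker to read than the two
tables, but check the standard deviations before reading much into a gap
between two lines.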

HTH Paul
----- Original Message ----- 
From: "Monica Pisica" <pisicandru at hotmail.com>
To: <r-help at stat.math.ethz.ch>
Sent: Wednesday, April 11, 2007 11:31 PM
Subject: [R] Kmeans cluster analysis


> Hi Nataniel,
>
> As far as I know there is a package called clustTool which has a very nice
> interface with the capability to do different cluster analyses. It also
> produces a plot of each cluster and the mean of each variable for each
> cluster, and I guess this is what you are after! But depending on which
> parameters you use for the cluster analysis, the package is extremely
> slow if you have more than 5000 data points. Maybe you can take the
> function apart to see where and what generates the plot and use that for
> your analysis.
>
> I hope this helps,
>
> Monica Palaseanu-Lovejoy
>
>
> Message: 35
> Date: Tue, 10 Apr 2007 19:51:24 +0000 (GMT)
> From: nathaniel Grey <nathaniel.grey at yahoo.co.uk>
> Subject: [R] Kmeans cluster analysis
> To: r-help at stat.math.ethz.ch
> Message-ID: <352480.52445.qm at web23402.mail.ird.yahoo.com>
> Content-Type: text/plain
>
> Hello,
>
> I have a data set containing 22 variables. After appropriate
> transformations etc. I ran a kmeans cluster analysis for 4 clusters. I ran
> it 20 times to find the result with the lowest within-cluster sum of
> squares.
>
> My question is: how best do I go about finding out what the
> characteristics of each cluster are? Is one cluster dominated by a
> particular set of variables or by a particular variable?
>
> The only way I know is to look at the means of each variable for each
> cluster, but as there are 22 variables this is time-consuming.
>
> Is there a way to graphically represent the clusters in relation to the
> variables? If so, I might need some guidance on the coding as I am new to
> the R environment.
>
> Any advice and direction would be gratefully received.
>
> best wishes,
>
> Nataniel Grey
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


