[R] Cluster on both categorical and numerical data

paulandpen paulandpen at optusnet.com.au
Wed Jun 18 21:58:09 CEST 2008


when you cluster data, there are two kinds of input you can supply:

raw data, which the algorithm converts into a matrix and then processes

a pre-processed matrix, which you create yourself and pass to the package

essentially, packages will have a default assumption about the data you are 
using or the type of matrix you are using
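
as a rough illustration of the two kinds of input (a sketch only, using
pam() from the cluster package as one example of a function that accepts
either; the data are made up):

```r
library(cluster)  # recommended package, normally installed alongside R

set.seed(42)
# Made-up numeric data: two loose groups of 10 points each.
x <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
           matrix(rnorm(20, mean = 5), ncol = 2))

# Input 1: raw data. pam() computes Euclidean dissimilarities itself
# (its default assumption about raw numeric input).
fit_raw <- pam(x, k = 2)

# Input 2: a matrix you built yourself. dist() returns a "dist" object,
# which pam() treats as dissimilarities rather than raw data.
d <- dist(x)
fit_diss <- pam(d, k = 2)

# Compare the cluster assignments from the two routes.
same <- all(unname(fit_raw$clustering) == unname(fit_diss$clustering))
print(same)
```

with the same distance measure the two routes should agree; the point of
building the matrix yourself is that you can swap in a different measure.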

these matrices are often defined in simplistic terms as either a similarity 
or dissimilarity matrix

think of a correlation matrix as an example of a matrix which represents 
similarity

I think you will need to create a dissimilarity matrix (think of a 
correlation matrix, whose off-diagonal entries measure similarity; a 
dissimilarity matrix is the opposite of this. technically not correct, but 
you get the idea I hope)
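
to make the "opposite of a correlation matrix" idea concrete, here is a
small sketch. it assumes the common convention of taking 1 minus the
correlation as the dissimilarity, which is only one of several possible
choices, and it clusters the variables (columns) rather than the
observations. the data are made up:

```r
set.seed(1)
# Made-up numeric data: 20 observations of 4 variables.
x <- matrix(rnorm(80), ncol = 4,
            dimnames = list(NULL, c("a", "b", "c", "d")))

# Similarity between variables: 1 on the diagonal, between -1 and 1 elsewhere.
sim <- cor(x)

# Turn similarity into dissimilarity (assumed convention: 1 - correlation).
# Perfectly correlated variables get 0, perfectly anti-correlated ones get 2.
dis <- as.dist(1 - sim)

# Any function that accepts a "dist" object can now use it.
hc <- hclust(dis, method = "average")
plot(hc)
```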

I use Clustan Graphics for all my clustering needs, and Gower's coefficient 
is the input I use when I have mixed variables

if you pre-process (create a dissimilarity matrix) using Gower's 
coefficient, and then tell the package that its input is a dissimilarity 
matrix, everything should work fine
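
in R, one way to do this (a sketch, not the only route) is daisy() from
the cluster package, which computes Gower dissimilarities for mixed data
when called with metric = "gower". the data frame and its column names
below are made up for illustration:

```r
library(cluster)  # recommended package, normally installed alongside R

# Hypothetical mixed data: two numeric columns and one categorical column.
df <- data.frame(
  age    = c(23, 45, 31, 52, 38, 27),
  income = c(30, 80, 45, 95, 60, 35),
  region = factor(c("north", "south", "north", "south", "east", "east"))
)

# Pre-process: Gower dissimilarity matrix (values lie between 0 and 1).
d <- daisy(df, metric = "gower")

# Pass the dissimilarities to a clustering function that accepts them.
hc <- hclust(d, method = "average")   # hierarchical clustering
groups <- cutree(hc, k = 2)           # cut the tree into 2 groups

fit <- pam(d, k = 2)                  # or: partitioning around medoids
print(groups)
print(fit$clustering)
```

the choice of k = 2 and of average linkage is arbitrary here; both would
need to be chosen to suit the real data.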

once you get this sorted, it should all be straightforward


----- Original Message ----- 
From: "Chua Siang Li" <siang.li.chua at acceval-intl.com>
To: <r-help at r-project.org>
Sent: Wednesday, June 18, 2008 7:46 PM
Subject: [R] Cluster on both categorical and numerical data

>   Hello there.  Is there any function in R that can do cluster on a set of
>   data that has both categorical and numerical variables?  thanks.
>   siangli
