[R] algorithm for clustering categorical data

David Carlson dcarlson at tamu.edu
Thu Aug 1 18:08:08 CEST 2013


Read up on Gower's distance measure (available in the ecodist
package), which can combine numeric and categorical data. You
didn't give us any information about how you numerically
transformed the categorical variables, but the usual approach
is to create indicator variables that code presence/absence
for each category within a categorical variable. Differences
in variance between variables can be removed by standardizing
the variables (for example, to mean 0 and standard deviation 1).
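
A minimal sketch of both approaches, under stated assumptions:
the data frame and its column names (income, region) are made
up for illustration, and the sketch uses daisy() from the
cluster package for the Gower dissimilarity rather than
ecodist, since daisy() accepts a data frame that mixes numeric
and factor columns directly. The choice of average linkage and
k = 2 is arbitrary.

```r
library(cluster)  # provides daisy(); a recommended package shipped with R

# Hypothetical toy data: one large-scale numeric field, one categorical field
dat <- data.frame(
  income = c(12000, 15000, 52000, 48000, 11000, 50000),
  region = factor(c("north", "north", "south", "south", "east", "south"))
)

# Approach 1: Gower dissimilarity.
# Numeric columns are scaled by their range; factor columns
# contribute 0 for a match and 1 for a mismatch.
d_gower <- daisy(dat, metric = "gower")

# Any dissimilarity-based method can follow; hierarchical
# clustering with average linkage is used here as an example.
hc <- hclust(d_gower, method = "average")
groups <- cutree(hc, k = 2)

# Approach 2: indicator variables plus standardization.
ind <- model.matrix(~ region - 1, data = dat)  # one 0/1 column per category
x <- scale(cbind(income = dat$income, ind))     # mean 0, sd 1 per column
d_euclid <- dist(x)                             # Euclidean distance
```

Either distance object can be passed to hclust() or another
method that accepts dissimilarities. With Gower's measure the
income field no longer swamps the categorical field, because
each variable contributes a value between 0 and 1.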

-------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352

-----Original Message-----
From: r-help-bounces at r-project.org
[mailto:r-help-bounces at r-project.org] On Behalf Of Li, Yan
Sent: Thursday, August 1, 2013 11:00 AM
To: r-help at r-project.org
Subject: [R] algorithm for clustering categorical data

Hi All,

Does anyone know of an algorithm for clustering categorical
variables, and R packages that implement it? Which is the best?

If a data set has both numeric and categorical variables, what
is the best clustering algorithm and R package to use?

I tried transforming all categorical fields to numeric and
clustering afterwards. But the transformed fields have values
from 1 to 10, and my other fields are on a much larger scale
(10000 and up). This makes the categorical fields have less
effect on the distance calculation.

Thank you!
Yan
