[R] Cluster prediction from factor/numeric datasets

Prof Brian Ripley ripley at stats.ox.ac.uk
Mon Jul 23 23:25:51 CEST 2007


You can't do Discriminant Analysis without a quadratic metric in a 
Euclidean space.  'Scott Bearer' explicitly does not want to assume that 
sort of distance measure.

I am not sure how he used Agnes to form 20 clusters: it forms a 
hierarchical clustering, so it really is not possible to predict from the 
results of such a clustering (you probably would not even predict the 
current cluster membership).

With a method such as kmeans or PAM, there is a chance to predict: you 
allocate new units to the nearest cluster centre.  With PAM you can do 
this easily by computing a matrix of dissimilarities from new points to 
cluster centres and using which.min.
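
A minimal sketch of that allocation step, for illustration only. The data
frame, its column names and the helper gower_to_medoids() are hypothetical
and not from the original message. The helper assumes complete data with
plain numeric columns and unordered factors, and it scales numeric columns
by their range in the training data so that new points are measured on the
same scale as the dissimilarities used for clustering.

```r
library(cluster)

set.seed(1)
# Hypothetical training data: two numeric columns and one factor
train <- data.frame(
  height = c(rnorm(20, 10), rnorm(20, 20)),
  cover  = c(runif(20, 0, 40), runif(20, 60, 100)),
  soil   = factor(rep(c("clay", "sand"), each = 20))
)

# Gower dissimilarities on the training data, then PAM with k clusters
d   <- daisy(train, metric = "gower")
fit <- pam(d, k = 2)

# Rows of the training data that serve as cluster centres (medoids)
medoids <- train[fit$id.med, , drop = FALSE]

# Hypothetical helper: Gower-type dissimilarity from each new row to each
# medoid. Numeric columns: absolute difference divided by the training
# range (assumed non-zero). Factors: 0 if equal, 1 otherwise.
gower_to_medoids <- function(new, medoids, train) {
  stopifnot(identical(names(new), names(medoids)))
  out <- matrix(0, nrow(new), nrow(medoids))
  for (v in names(train)) {
    if (is.numeric(train[[v]])) {
      rng  <- diff(range(train[[v]]))
      part <- abs(outer(new[[v]], medoids[[v]], "-")) / rng
    } else {
      part <- outer(as.character(new[[v]]),
                    as.character(medoids[[v]]), "!=") * 1
    }
    out <- out + part
  }
  out / ncol(train)
}

# Hypothetical new units to allocate
new <- data.frame(
  height = c(9.5, 21),
  cover  = c(20, 80),
  soil   = factor(c("clay", "sand"), levels = levels(train$soil))
)

dnew <- gower_to_medoids(new, medoids, train)  # new points x medoids
pred <- apply(dnew, 1, which.min)              # index of nearest medoid
pred
```

Each entry of pred is the index of the nearest medoid, which should
correspond to the cluster numbering in fit$clustering. Ordered factors,
missing values and variable weights are not handled by this sketch.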

On Mon, 23 Jul 2007, ngottlieb at marinercapital.com wrote:

> Scott:
>
> Suggest you look at using Discriminant Analysis (don't know which R
> package has it).
> Take the clusters created and, using Discriminant Analysis, get Fisher
> scores for the clusters.

If you mean linear discriminant analysis, package MASS.  But there are 
many other classification techniques, many preferable to LDA and which 
allow non-Euclidean spaces of observations.
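
As one illustration of a classifier that accepts factor and numeric
predictors directly, here is a sketch using a classification tree from
package rpart. The choice of rpart is an assumption made for this example
and is not a technique named in the message; the data and column names are
hypothetical. The cluster labels come from cutting an agnes tree, and the
fitted classifier is then used to allocate new units.

```r
library(cluster)
library(rpart)

set.seed(1)
# Hypothetical training data: two numeric columns and one factor
train <- data.frame(
  height = c(rnorm(20, 10), rnorm(20, 20)),
  cover  = c(runif(20, 0, 40), runif(20, 60, 100)),
  soil   = factor(rep(c("clay", "sand"), each = 20))
)

# Hierarchical clustering on Gower dissimilarities, cut into k groups
hc <- agnes(daisy(train, metric = "gower"), method = "average")
train$cluster <- factor(cutree(as.hclust(hc), k = 2))

# Treat the cluster labels as classes and fit a classification tree
tree <- rpart(cluster ~ height + cover + soil, data = train,
              method = "class")

# Hypothetical new units to classify
new <- data.frame(
  height = c(9.5, 21),
  cover  = c(20, 80),
  soil   = factor(c("clay", "sand"), levels = levels(train$soil))
)

pred_tree <- predict(tree, newdata = new, type = "class")
pred_tree
```

The tree only approximates the clustering: it predicts the label a unit
would most plausibly have had, not the result of re-running the
hierarchical clustering with the new units included.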

> Then you can apply the Fisher scores to a new dataset to see which
> defined cluster it will be classified into.
>
> Neil
>
> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Scott Bearer
> Sent: Monday, July 23, 2007 1:39 PM
> To: r-help at stat.math.ethz.ch
> Subject: [R] Cluster prediction from factor/numeric datasets
>
> Hi all,
>
> I have a dataset with numeric and factor columns of data which I
> developed a Gower Dissimilarity Matrix for (Daisy) and used
> Agglomerative Nesting
> (Agnes) to develop 20 clusters.
>
> I would like to use the 20 clusters to determine cluster membership for
> a new dataset (using predict) but cannot find a way to do this (no way
> to "predict" in the cluster package).
>
> I know I can use "predict" in cclust, kcca, and flexclust- but these
> algorithms do not permit factor data or use a Gower dissimilarity
> matrix, so are unusable to me.
>
> Any suggestions?
>
> Thanks in advance,
>
> Scott
>
> Scott Bearer, Ph.D.
> Forest Ecologist
> The Nature Conservancy
>  in Pennsylvania
> Community Arts Center
> 220 West Fourth Street, 3rd Floor
> Williamsport, PA  17701
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


