[R] Ward clustering problem

Mike White mikewhite.diu at tiscali.co.uk
Fri Jun 4 14:36:05 CEST 2004


I have a training set of data for known classes, with 5 observations of 12
variables for each class.  I want to use this information to group new
data into classes.  The new classes are known to be different from those in
the training set, and each new class may contain one or more observations.
The distribution of within-class distances is expected to be similar for all
classes, and this is found to be the case for the training data.  I have
tried using the maximum within-class distance from the training data as
the h argument of cutree for the clustered new data.  This appears to work
fine for the "average" and "complete" clustering methods but not for the Ward
clustering method, as the distance axis of the dendrogram does not directly
relate to the distances between observations.
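
To make the procedure concrete, here is a minimal sketch of it.  The data are
simulated and every object name (train, train_class, new_data, h_max) is
hypothetical, Euclidean distance is assumed, and "ward.D2" is the method name
used by hclust in current versions of R (older versions only offer "ward").
It is an illustration of the idea, not a recommended solution.

```r
set.seed(1)

# Hypothetical training data: 4 known classes, 5 observations each, 12 variables.
n_class <- 4; n_obs <- 5; n_var <- 12
centres <- matrix(rnorm(n_class * n_var, sd = 10), n_class, n_var)
train <- centres[rep(seq_len(n_class), each = n_obs), ] +
  matrix(rnorm(n_class * n_obs * n_var), ncol = n_var)
train_class <- rep(seq_len(n_class), each = n_obs)

# Largest Euclidean distance between two observations of the same class.
d_train <- as.matrix(dist(train))
same_class <- outer(train_class, train_class, "==")
h_max <- max(d_train[same_class])

# Hypothetical new data: 3 unseen classes with 1, 3 and 4 observations.
new_sizes <- c(1, 3, 4)
new_centres <- matrix(rnorm(length(new_sizes) * n_var, sd = 10), ncol = n_var)
new_data <- new_centres[rep(seq_along(new_sizes), new_sizes), ] +
  matrix(rnorm(sum(new_sizes) * n_var), ncol = n_var)
d_new <- dist(new_data)

# Complete linkage: each merge height is the largest pairwise distance inside
# the merged cluster, so h_max is on the same scale as the dendrogram axis.
hc_complete <- hclust(d_new, method = "complete")
groups_complete <- cutree(hc_complete, h = h_max)
print(table(groups_complete))

# Ward linkage: merge heights also depend on cluster sizes and centroids, so
# they are not pairwise distances and h_max is not directly comparable.
hc_ward <- hclust(d_new, method = "ward.D2")
print(c(max_pairwise = max(d_new),
        top_complete = max(hc_complete$height),
        top_ward = max(hc_ward$height)))
```

With complete linkage the top merge height equals the largest pairwise
distance in the new data, whereas the Ward heights are on a different scale,
which is the problem described above.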

Can anyone advise on how to optimise the h value of cutree when using the
Ward clustering method, or is there a better approach to this type of
classification problem?

Thanks
Mike White



