[R] Using pam, agnes or clara as prediction models?

Renald Buter buter at cwts.leidenuniv.nl
Thu Jan 15 09:46:47 CET 2004


On Thu, Jan 15, 2004 at 08:32:45AM +0000, Prof Brian Ripley wrote:
> On Thu, 15 Jan 2004, Renald Buter wrote:
> 
> > On Wed, Jan 14, 2004 at 03:18:10PM -0500, Liaw, Andy wrote:
> > > If pam produces the cluster medoids, you should be able to use the
> > > 1-nearest-neighbor classifier for prediction of future data, using the
> > > medoids as the `training' data.  1-NN is available in the `class' package,
> > > part of the `VR' bundle.
> > > 
> > 
> > Thanks very much for your quick answer! I've tried your suggestion in
> > the following way:
> > 
> >  # separate the ruspini data into train and test set
> >  > train<-ruspini[1:50,]
> >  > test<-ruspini[51:75,]
> >  > pamx<-pam(train,4)
> >  > knnx<-knn(pamx$medoids,test,factor(c("a","b","c","d")),k=3)
> >  > knnx
> >  [1] d d b b d c b c c d c a a d c c a a c a a d c d a
> >  Levels: a b c d
> > 
> > But the result of applying the knn to the test set should only contain 2
> > clusters, since the upper half of the ruspini data contains only 2
> > clusters.
> > 
> > Could you tell me what I am missing here?
> 
> You asked that the upper half be divided into 4 clusters.  Did you look at 
> the object pamx?  It contains 4 clusters covering only the first part of 
> the dataset.

Yes, that is what I understood. My objective was to apply this division
to the test set: for each point in the test set, predict which cluster
it would fall into.

> Given that when you apply pam to the whole dataset there is a cluster that
> only occurs for objects 61:75, there is no way you can find that cluster
> when no member of it is in your training set.
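Ripley's point can be checked with a short sketch. It assumes the `cluster` package, which provides both `pam()` and the `ruspini` data. The sketch clusters all 75 rows, then cross-tabulates the resulting cluster labels against the train/test split used above. A cluster whose members all fall in rows 51:75 has no representative among medoids computed from rows 1:50.

```r
library(cluster)  # provides pam() and the ruspini data set

# Cluster the whole data set, not just the training rows
pam_all <- pam(ruspini, 4)

# Tag each row with the split used in the experiment above
part <- ifelse(seq_len(nrow(ruspini)) <= 50, "train (1:50)", "test (51:75)")

# Rows = split, columns = cluster found on the full data
tab <- table(part, cluster = pam_all$clustering)
print(tab)
```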

But isn't that what knn does: locate the nearest neighbour of a point
and assign that neighbour's label to the point being classified?
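When the medoids are the only training points, 1-NN reduces to "assign each new point to its closest medoid". The sketch below does this by hand with Euclidean distance. `nearest_medoid()` is a hypothetical helper written for illustration; it is not a function from `cluster` or `class`. Whatever the test data look like, the helper can only return one of the four medoid indices. This is why a cluster absent from the training rows cannot appear in the predictions.

```r
library(cluster)  # provides pam() and the ruspini data set

train <- ruspini[1:50, ]
test  <- ruspini[51:75, ]
pamx  <- pam(train, 4)

# Hypothetical helper: for each row of `newdata`, return the index of the
# closest medoid (Euclidean distance), i.e. a 1-NN rule whose only
# training points are the medoids. Ties go to the lowest index.
nearest_medoid <- function(medoids, newdata) {
  newdata <- as.matrix(newdata)
  apply(newdata, 1, function(p) {
    d2 <- colSums((t(medoids) - p)^2)  # squared distance to each medoid
    which.min(d2)                      # index of the closest medoid
  })
}

pred <- nearest_medoid(pamx$medoids, test)
table(pred)  # only the labels 1 to 4 can ever occur
```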

I thought that I was doing:
 1. cluster the training data using PAM
 2. use the medoids of the PAM clustering as the knn training set
 3. apply knn to the test set
 4. look at the result
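The four steps can be sketched in R as follows, assuming the `cluster` and `class` packages. Two changes from the code quoted above are deliberate. The labels are the cluster numbers 1 to 4 rather than "a" to "d", so they line up with `pamx$clustering` (medoid i defines cluster i). The call also uses `k = 1`, as in Andy Liaw's suggestion. With `k = 3` and only one training point per label, each of the three nearest medoids casts one vote for a different label, and `knn()` breaks such ties at random.

```r
library(cluster)  # pam() and the ruspini data
library(class)    # knn()

train <- ruspini[1:50, ]
test  <- ruspini[51:75, ]

# Step 1: cluster the training data into 4 groups with PAM
pamx <- pam(train, 4)

# Step 2: the medoids are the knn training set, one label per medoid;
# row i of pamx$medoids is the medoid of cluster i
medoid_labels <- factor(seq_len(nrow(pamx$medoids)))

# Step 3: classify each test point by its single nearest medoid (k = 1)
pred <- knn(pamx$medoids, test, medoid_labels, k = 1)

# Step 4: look at the result
table(pred)
```

Comparing `pred` with `pam(ruspini, 4)$clustering[51:75]` is only meaningful up to a relabelling, since the two clusterings number their clusters independently.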

Could you tell me what I'm not getting here?

Regards,

Renald



