[R] cluster analysis and supervised classification: an alternative to knn1?

Ulrich Bodenhofer bodenhofer at bioinf.jku.at
Thu May 27 12:46:08 CEST 2010


abanero wrote:
>
> Do you know something like “knn1” that works with categorical variables
> too?
> Do you have any suggestion? 
>
There are surely plenty of clustering algorithms around that, unlike KNN, do
not require a vector space structure on the inputs. I think agglomerative
clustering would solve the problem, as would kernel-based clustering
(assuming that you have a positive semi-definite measure of the similarity
of two samples). Probably the simplest way is Affinity Propagation
(http://www.psi.toronto.edu/index.php?q=affinity%20propagation; see the CRAN
package "apcluster", which I have co-developed). All you need is a way of
measuring the similarity of samples, which is straightforward for numerical
and categorical variables as well as for mixtures of both (the choice of
similarity measures and how to aggregate the different variables is left to
you, of course). Your final "classification" task can be accomplished simply
by assigning each new sample to the cluster whose exemplar is most similar.
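
To make the last two points concrete, here is a minimal sketch in Python
(chosen only for illustration; it does not use the apcluster package). It
assumes a Gower-style similarity, i.e. simple matching for categorical
variables and a range-scaled difference for numerical ones, averaged with
equal weights. The names mixed_similarity, assign_to_exemplar and the sample
data are hypothetical, and the exemplars are assumed to come from a
clustering run that was done beforehand.

```python
def mixed_similarity(a, b, ranges):
    """Gower-style similarity in [0, 1] between two mixed-type records.

    ranges[i] is the observed range of variable i if it is numerical,
    or None if it is categorical (an assumption of this sketch).
    """
    scores = []
    for x, y, r in zip(a, b, ranges):
        if r is None:
            # Categorical variable: 1 if the values match, 0 otherwise.
            scores.append(1.0 if x == y else 0.0)
        elif r > 0:
            # Numerical variable: 1 minus the range-scaled absolute
            # difference, clamped at 0 for values outside the range.
            scores.append(max(0.0, 1.0 - abs(x - y) / r))
        else:
            # Constant numerical variable: treat all values as identical.
            scores.append(1.0)
    # Equal-weight average; other aggregations are equally valid.
    return sum(scores) / len(scores)


def assign_to_exemplar(sample, exemplars, ranges):
    """Return the index of the exemplar most similar to the sample."""
    sims = [mixed_similarity(sample, e, ranges) for e in exemplars]
    return max(range(len(exemplars)), key=sims.__getitem__)


# Hypothetical data: one numerical variable (range 10) and two categorical.
ranges = [10.0, None, None]
exemplars = [(1.0, "red", "small"), (9.0, "blue", "large")]
new_sample = (2.0, "red", "large")

print(assign_to_exemplar(new_sample, exemplars, ranges))
```

The weights and the per-variable measures are the part you would adapt to
your own data; the assignment step stays the same whatever similarity you
plug in.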

Joris Meys wrote:
>
> Not a direct answer, but from your description it looks like you are
> better off with supervised classification algorithms instead of unsupervised
> clustering.
>
If you say that this is a purely supervised task that can be solved without
clustering, I disagree. abanero does not mention any class labels. So it
seems to me that it is indeed necessary to do unsupervised clustering first.
However, I agree that the second task of assigning new samples to
clusters/classes/whatever can also be solved by almost any supervised
technique if samples are labeled according to their cluster membership
first.
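
A rough sketch of that second step, again in Python purely for illustration:
the cluster memberships are used as class labels, and a new sample is
classified by a majority vote among its k most similar labeled samples. The
simple-matching similarity, the function names and the toy data are all
assumptions of this sketch; cluster_of stands in for the output of whatever
clustering was run first, and any other supervised method could replace the
nearest-neighbour vote.

```python
from collections import Counter


def matching_similarity(a, b):
    """Fraction of positions at which two categorical records agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def knn_predict(sample, train, labels, k=3):
    """Majority vote among the k training samples most similar to sample."""
    ranked = sorted(
        range(len(train)),
        key=lambda i: matching_similarity(sample, train[i]),
        reverse=True,
    )
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical categorical training data.
train = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("red", "medium", "round"),
    ("blue", "large", "square"),
    ("blue", "large", "oval"),
    ("green", "large", "square"),
]
# Cluster membership of each training sample, as produced by a prior
# unsupervised clustering step (assumed here, not computed).
cluster_of = [0, 0, 0, 1, 1, 1]

print(knn_predict(("red", "small", "square"), train, cluster_of))
```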

Cheers, Ulrich