[R] Document clustering for R

Christian Hennig chrish at stats.ucl.ac.uk
Tue Sep 13 15:50:08 CEST 2005


On Tue, 13 Sep 2005, Jari Oksanen wrote:

> On Mon, 2005-09-12 at 12:47 -0700, Raymond K Pon wrote:
> > I'm working on a project related to document clustering. I know that R
> > has clustering algorithms such as clara, but they only support two distance
> > metrics, Euclidean and Manhattan, which are not very useful for
> > clustering documents. I was wondering how easy it would be to extend the
> > clustering package in R to support other distance metrics, such as
> > cosine distance, or whether there is an API for custom distance metrics.
> >
> You don't have to extend the "clustering package in R to support other
> distance metrics", but you should take care that you produce your
> dissimilarities (or distances) in the standard format so that they can
> be used in "clustering package" or in cmdscale or in isoMDS or any other
> function accepting a "dist" object. "Clustering package" will support
> new dissimilarities if they are written in a standard-conforming way.
> There are several packages that offer alternative dissimilarities (and
> some even distances) that can be used in clustering functions. Look for
> "distances" or "dissimilarities" in the R site search. Some of these could
> be the one for you... I would be surprised if the cosine index were missing
> (and if needed, I could write it for you in C, but I don't think that is
> necessary).
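As an illustration of the point above, here is a minimal sketch in base R of building a cosine dissimilarity (1 minus the cosine similarity) and handing it to functions that accept a "dist" object, such as hclust() and pam() from the cluster package (note that clara() itself works on the raw data matrix rather than on a "dist" object). The toy term-frequency matrix `tf` and the helper name `cosine_dist` are hypothetical, and the sketch assumes every document has at least one nonzero term count (an all-zero row would give a zero norm and NaN results).

```r
library(cluster)  # recommended package shipped with R; provides pam()

# Hypothetical toy term-frequency matrix: rows = documents, columns = terms
tf <- matrix(c(2, 0, 1, 0,
               1, 0, 1, 0,
               0, 3, 0, 1,
               0, 2, 0, 2),
             nrow = 4, byrow = TRUE,
             dimnames = list(paste0("doc", 1:4), paste0("term", 1:4)))

# Hypothetical helper: cosine dissimilarity 1 - (x . y) / (|x| |y|)
# Assumes no all-zero rows (empty documents).
cosine_dist <- function(x) {
  norms <- sqrt(rowSums(x^2))
  unit <- x / norms          # divides row i by norms[i] (column-wise recycling)
  sim <- tcrossprod(unit)    # unit %*% t(unit): pairwise cosine similarities
  as.dist(1 - sim)           # standard "dist" object, row names become labels
}

d <- cosine_dist(tf)

# Any function that accepts a "dist" object can now use it
hc <- hclust(d, method = "average")
cl <- cutree(hc, k = 2)
fit <- pam(d, k = 2)
print(cl)
print(fit$clustering)
```

With these toy counts, doc1/doc2 and doc3/doc4 share no terms across the pairs, so both methods should separate them into two groups.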

A full distance matrix m can be converted into the standard "dist"
format simply with as.dist(m).
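For example, a small sketch assuming a hypothetical symmetric 3 x 3 dissimilarity matrix with a zero diagonal: as.dist() keeps only the lower triangle (the diagonal and upper triangle are ignored), and the result can go straight into hclust() or cmdscale().

```r
# Hypothetical symmetric dissimilarity matrix with a zero diagonal
m <- matrix(c(0.0, 0.2, 0.6,
              0.2, 0.0, 0.7,
              0.6, 0.7, 0.0),
            nrow = 3, dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

d <- as.dist(m)             # lower triangle only; row names become labels
print(class(d))             # "dist"
hc <- hclust(d)             # hierarchical clustering on the custom dissimilarity
mds <- cmdscale(d, k = 2)   # classical multidimensional scaling
```

Because only the lower triangle is used, as.dist() will silently ignore any asymmetry in m, so it is worth checking isSymmetric(m) first if the matrix was built by hand.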

Christian


*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
chrish at stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche
