# [R] kmeans and incomplete distance matrix concern

Gabor Grothendieck ggrothendieck at gmail.com
Mon Aug 7 17:43:33 CEST 2006

```There are many clustering functions in R and R packages, and some
take a distance matrix as input while others, such as kmeans, do not.
What you have read may refer to hclust or some different clustering
function.  See ?kmeans for the kmeans function and also look at the
CRAN Task View on clustering for other clustering functions:

http://cran.r-project.org/src/contrib/Views/
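
# Illustrative sketch (not part of the original reply) of the difference
# described above. The matrix `x` is made-up example data with
# observations in rows; the choice of 2 clusters is also an assumption.
set.seed(1)
x <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
           matrix(rnorm(20, mean = 5), ncol = 2))

# kmeans() is given the data matrix itself, not the output of dist()
km <- kmeans(x, centers = 2)
km$cluster

# hclust() is one of the functions that does take a distance matrix
hc <- hclust(dist(x))
cutree(hc, k = 2)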

On 8/7/06, Ffenics <ffenics2002 at yahoo.co.uk> wrote:
> Well then, I don't understand, because everything I have read so far suggests that you use the dist() function to create a matrix based on the Euclidean distance and then the kmeans() function.
>
> If this is incorrect, then any suggestions as to how to do this properly would be much appreciated.
>
> Christian Hennig <chrish at stats.ucl.ac.uk> wrote:
>
> First of all, kmeans doesn't work on distance matrices.
>
> On Mon, 7 Aug 2006, Ffenics wrote:
>
> > Hi there
> > I have been using R to perform kmeans on a dataset. The data is fed in using read.table and then a matrix (x) is created
> >
> > i.e.:
> >
> > mat <- matrix(0, nlevels(DF$V1), nlevels(DF$V2),
> >               dimnames = list(levels(DF$V1), levels(DF$V2)))
> > mat[cbind(DF$V1, DF$V2)] <- DF$V3
> >
> > This matrix is then taken and a distance matrix (y) created using dist() before performing the kmeans clustering.
> >
> > My query is this: not all the data for the initial matrix (x) exists and therefore the matrix is not fully populated - empty cells are populated with '0's.
> >
> > Could someone please tell me how this may affect the result from the dist() command? A '0' in a distance matrix means that the two variables are identical, doesn't it? I don't want things clustered together simply because there was no information.
> >
> > Is this a problem, and are there ways to circumvent it? Thanks
> >
> >  [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help at stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
> *** --- ***
> Christian Hennig
> University College London, Department of Statistical Science
> Gower St., London WC1E 6BT, phone +44 207 679 1698
> chrish at stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche