[R] kmeans and incomplete distance matrix concern

Christian Hennig chrish at stats.ucl.ac.uk
Mon Aug 7 16:46:40 CEST 2006


First of all, kmeans doesn't work on distance matrices.
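
To make the distinction concrete, here is a minimal sketch on made-up data (the matrix x, the group means and the choice of two clusters are illustrative assumptions, not taken from the original post). kmeans() is given the raw observations-by-variables matrix, whereas a dissimilarity-based method such as hclust() is the kind of function that takes a "dist" object.

```r
set.seed(1)
# Hypothetical data: 10 observations, 3 variables, two well-separated groups
x <- rbind(matrix(rnorm(15, mean = 0),  ncol = 3),
           matrix(rnorm(15, mean = 10), ncol = 3))

# kmeans() clusters the rows of the data matrix directly
km <- kmeans(x, centers = 2, nstart = 10)

# A dissimilarity-based method takes the "dist" object instead
d  <- dist(x)
hc <- cutree(hclust(d, method = "average"), k = 2)

# Cross-tabulate the two partitions to compare them
table(kmeans = km$cluster, hclust = hc)
```

As far as I can tell, if a "dist" object is handed to kmeans() it is simply converted with as.matrix() and its rows are treated as ordinary coordinates, which is not the same thing as clustering on the dissimilarities.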

On Mon, 7 Aug 2006, Ffenics wrote:

> Hi there
> I have been using R to perform kmeans on a dataset. The data is fed in using read.table and then a matrix (x) is created
>
> i.e:
>
> mat <- matrix(0, nlevels(DF$V1), nlevels(DF$V2),
>               dimnames = list(levels(DF$V1), levels(DF$V2)))
> mat[cbind(DF$V1, DF$V2)] <- DF$V3
>
> This matrix is then taken and a distance matrix (y) created using dist() before performing the kmeans clustering.
>
> My query is this: not all the data for the initial matrix (x) exists and therefore the matrix is not fully populated - empty cells are populated with '0's.
>
> Could someone please tell me how this may affect the result of the dist() command? A '0' in a distance matrix means that the two variables are identical, doesn't it? I don't want things clustered together simply because there was no information.
>
> Is this a problem, and are there ways to circumvent it? Thanks
>
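
On the zeros: a 0 in the data matrix is not the same as a 0 in the distance matrix, but dist() will treat it as a real measurement, so rows with many filled-in zeros can end up looking close to each other. One possible way around this, sketched below on made-up data, is to fill the matrix with NA instead of 0. According to ?dist, missing values are then excluded pair by pair and the sum is scaled up in proportion to the number of columns actually used. The data frame DF here is a hypothetical stand-in for the one in the question.

```r
# Hypothetical long-format data: row label, column label, value
DF <- data.frame(V1 = factor(c("a", "a", "b", "b", "c")),
                 V2 = factor(c("p", "q", "p", "r", "q")),
                 V3 = c(1, 2, 3, 4, 2))

# Build the matrix with NA (not 0) for cells that have no data
mat <- matrix(NA_real_, nlevels(DF$V1), nlevels(DF$V2),
              dimnames = list(levels(DF$V1), levels(DF$V2)))
mat[cbind(as.integer(DF$V1), as.integer(DF$V2))] <- DF$V3

# Same matrix with the missing cells filled by 0, for comparison
mat0 <- mat
mat0[is.na(mat0)] <- 0

d_na   <- dist(mat)    # missing cells are skipped pair by pair
d_zero <- dist(mat0)   # zeros are treated as observed values

d_na
d_zero
```

Note that a pair of rows with no column in common (b and c here) gets NA as its distance, and most clustering functions will not accept that, so very sparse data may need a different dissimilarity or some imputation. This is only a sketch of one option, not a recommendation for the data in the question.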

*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
chrish at stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche


