[R] About clustering techniques

Christian Hennig chrish at stats.ucl.ac.uk
Tue Jul 29 16:21:00 CEST 2008


Dear Paco,

To use the methods in the cluster package (including pam), look up the help
page of daisy, which computes dissimilarity matrices and handles missing
values appropriately (in most situations).
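
As a rough sketch of that workflow, the following builds a dissimilarity
matrix with daisy and passes it to pam. The toy data, the column names
(lat, lon, t1..t12), the choice of k = 2 and the decision to cluster on the
temperature columns only are all assumptions made for illustration; they are
not taken from the original data set.

```r
library(cluster)

# Hypothetical toy data: 40 points, two groups with different mean temperatures
set.seed(42)
n <- 40
group <- rep(1:2, each = n / 2)
monthly <- matrix(rnorm(n * 12, mean = ifelse(group == 1, 10, 20), sd = 2),
                  nrow = n, ncol = 12)
colnames(monthly) <- paste0("t", 1:12)

# Knock out one monthly value in each of 15 randomly chosen rows
monthly[cbind(sample(n, 15), sample(1:12, 15, replace = TRUE))] <- NA

dat <- data.frame(lat = runif(n, 38, 42), lon = runif(n, -2, 2), monthly)

# daisy computes pairwise dissimilarities from the variables that are
# available for both rows of a pair, so rows with some NAs are kept
temp_cols <- paste0("t", 1:12)
d <- daisy(dat[, temp_cols], metric = "euclidean")

# pam accepts the dissimilarity object directly
fit <- pam(d, k = 2)
table(fit$clustering)
```

Whether the coordinates should enter the dissimilarity as well, and how many
clusters to ask for, depends on the question being asked of the data.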

A good reference is the Kaufman and Rousseeuw book cited on that help page.

Christian

On Tue, 29 Jul 2008, pacomet wrote:

> Hello R users
>
> I have been working with a dataset for some time, trying to do some cluster
> analysis. The data set consists of 14 columns: geographical coordinates and
> monthly temperatures, in annual files
>
> latitude - longitude - temperature 1 - ... - temperature 12
>
> I have some missing values in some cases; at some points there may be 8 valid
> monthly values and 4 invalid ones. I don't want to drop the whole row with
> 8 good/4 bad values, as I want to try annual and monthly analysis.
>
> I first tried kmeans but found a problem with missing values. When trying
> without omitting missing values kmeans gives an error and when excluding
> invalid data too many values are excluded in some years of the data series.
>
> Now I have been reading about pam, pamk and clara, and I think they can handle
> missing values, but I can't find out how to perform the analysis with
> these functions. As I'm neither a statistics nor an R expert, the fpc and
> cluster package documentation is not enough for me. If you know of a website
> or a tutorial explaining how to use those functions, with examples to check if
> possible, please post them.
>
> Any other help or suggestion is greatly appreciated.
>
> Thanks in advance
>
> Paco
>
> -- 
> _________________________
> El ponent la mou, el llevant la plou
> Usuari Linux registrat: 363952
> -------
> Fotos: http://picasaweb.google.es/pacomet

*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
chrish at stats.ucl.ac.uk, www.homepages.ucl.ac.uk/~ucakche


