[R] PAM clustering: using my own dissimilarity matrix

Wolski wolski at molgen.mpg.de
Tue Jun 29 18:35:43 CEST 2004


Hi!

If x is your symmetric matrix containing the distances, then convert it to a dist object using as.dist, and pass that object to pam.
See ?as.dist.
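
A minimal sketch of that conversion, assuming the file holds the full symmetric matrix with a header row and no row-name column. The toy values and the temporary file are made up for illustration and only stand in for your own data:

```r
library(cluster)

# Toy symmetric dissimilarity matrix for 5 objects (made-up values),
# written to a temporary CSV to stand in for the real file.
m <- matrix(c(0, 1, 6, 7, 8,
              1, 0, 5, 6, 7,
              6, 5, 0, 1, 2,
              7, 6, 1, 0, 1,
              8, 7, 2, 1, 0), nrow = 5, byrow = TRUE,
            dimnames = list(NULL, paste0("obj", 1:5)))
f <- tempfile(fileext = ".csv")
write.csv(m, f, row.names = FALSE)

# read.csv returns a data.frame, so convert it to a matrix first.
d <- as.matrix(read.csv(f))
stopifnot(isSymmetric(unname(d)))   # as.dist only keeps the lower triangle

dd <- as.dist(d)                    # object of class "dist"
p  <- pam(dd, k = 2, diss = TRUE)
p$clustering                        # cluster number for each object
```

If the file stores only the lower triangle, the matrix would have to be filled in (or the upper part left as anything) before calling as.dist, since only the entries below the diagonal are used.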

Sincerely
Eryk

*********** REPLY SEPARATOR  ***********

On 29.06.2004 at 18:28 Hans Körber wrote:

>Hello,
>
>I would like to use my own dissimilarity matrix in a PAM clustering with 
>method "pam" (cluster package) instead of a dissimilarity matrix created 
>by daisy.
>
>I read data from a file containing the dissimilarity values using 
>"read.csv". This creates a matrix (alternatively: an array or vector) 
>which is not accepted by "pam": A call
>
>    p<-pam(d,k=2,diss=TRUE)
>
>yields an error message "Error in pam(d, k = 2, diss = TRUE) : x is not 
>of class dissimilarity and can not be converted to this class." How can 
>I convert the matrix d into a dissimilarity matrix suitable for "pam"?
>
>I'm aware of a response by Friedrich Leisch to a similar question posed 
>by Jose Quesada (quoted below). But as I understood the answer, the 
>dissimilarity matrix there is calculated on the basis of (random) data.
>
>Thank you in advance.
>Hans
>
>__________________________________
>
>>>>>> On Tue, 09 Jan 2001 15:42:30 -0700,
>>>>>> Jose Quesada (JQ) wrote:
>
> > Hi,
> > I'm trying to use a similarity matrix (triangular) as input for the
> > pam() or fanny() clustering algorithms.
> > The problem is that these algorithms only accept a dissimilarity
> > matrix, normally generated by daisy().
>
> > However, daisy only accepts a 'data matrix or dataframe. Dissimilarities
> > will be computed between the rows of x'.
> > Is there any way to tell it that your data are already a similarity
> > matrix (triangular)?
> > In Kaufman and Rousseeuw's FORTRAN implementation (1990), they showed
> > an option like this one:
>
> > "Maybe you already have correlation coefficients between variables.
> > Your input data consist of a lower triangular matrix of pairwise
> > correlations. You wish to calculate dissimilarities between the
> > variables."
>
> > But I couldn't find this alternative in the R implementation.
>
> > I cannot use foo <- as.dist(foo), nor daisy(foo...), because
> > "Dissimilarities will be computed between the rows of x", and this is
> > not what I mean.
>
> > You can easily transform your similarities into dissimilarities like
> > this (also recommended in Kaufman and Rousseeuw, 1990):
>
> > foo <- (1 - abs(foo)) # where foo are similarities
>
> > But then pam() will complain like this:
>
> > "x is not of class dissimilarity and can not be converted to this
> > class."
>
> > Can anyone help me? I'd also appreciate any advice about other
> > clustering algorithms that can accept this type of input.
>
>Hmm, I don't understand your problem, because proceeding as the docs
>describe it works for me ...
>
>If foo is a similarity matrix (with 1 meaning identical objects), then
>
>bar <- as.dist(1 - abs(foo))
>fanny(bar, ...)
>
>works for me:
>
>## create a random 12x12 similarity matrix, make it symmetric and set the
>## diagonal to 1
>> x <- matrix(runif(144), nc=12)
>> x <- x+t(x)
>> diag(x) <- 1
>
>## now proceed as described in the docs
>> y <- as.dist(1-x)
>> fanny(y, 3)
>iterations objective
> 42.000000 3.303235
>Membership coefficients:
>        [,1] [,2] [,3]
>1 0.3333333 0.3333333 0.3333333
>2 0.3333333 0.3333333 0.3333333
>3 0.3333334 0.3333333 0.3333333
>4 0.3333333 0.3333333 0.3333333
>...
>
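
The same route should work for pam(). A small sketch, assuming the similarities lie in [0, 1]. Here the symmetrised matrix is halved so that 1 - x cannot become negative; that halving and the seed are added assumptions, not part of the quoted example:

```r
library(cluster)

set.seed(1)                          # arbitrary seed, only for reproducibility
x <- matrix(runif(144), ncol = 12)
x <- (x + t(x)) / 2                  # symmetric, entries stay within [0, 1]
diag(x) <- 1                         # each object is identical to itself

y <- as.dist(1 - x)                  # dissimilarities in [0, 1]
p <- pam(y, k = 3)                   # diss defaults to TRUE for a "dist" object
p$medoids                            # indices of the three medoid objects
```

With random similarities the clusters carry no meaning; the point is only that a "dist" object built this way is accepted by pam() without the class error.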