[R] Similarity matrix

Jari Oksanen jarioksa at cc.oulu.fi
Wed Apr 11 12:32:35 CEST 2001

ripley at stats.ox.ac.uk said:
> The usual way to do this is to scale similarities to [0, 1] and take
> D = sqrt(1-S), I believe, but I don't know why.

I think it depends on (i) the index used, (ii) whether you want the
resulting dissimilarity to be a `metric', and (iii) whether you care
about (ii).

For most indices based on sums of squares, D = sqrt(1-S) preserves the
`metric' or even `Euclidean' property, whereas D = 1-S may not.
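As an illustration only, here is a small Python sketch that uses cosine similarity as an example of a sum-of-squares based index (my choice, not one named above). With three unit directions, D = 1 - S violates the triangle inequality, while D = sqrt(1 - S) satisfies it. All function names are hypothetical.

```python
import math

def cosine_similarity(x, y):
    """Cosine similarity; lies in [0, 1] for non-negative data."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny)

def dist_linear(s):
    """D = 1 - S."""
    return 1.0 - s

def dist_sqrt(s):
    """D = sqrt(1 - S); max() guards against tiny negative rounding error."""
    return math.sqrt(max(0.0, 1.0 - s))

def triangle_ok(d_ab, d_bc, d_ac, tol=1e-12):
    """True if d(a, c) <= d(a, b) + d(b, c)."""
    return d_ac <= d_ab + d_bc + tol

# Three directions 45 degrees apart: a -> b -> c spans a right angle.
a, b, c = (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)
s_ab = cosine_similarity(a, b)
s_bc = cosine_similarity(b, c)
s_ac = cosine_similarity(a, c)

for name, f in [("1 - S", dist_linear), ("sqrt(1 - S)", dist_sqrt)]:
    ok = triangle_ok(f(s_ab), f(s_bc), f(s_ac))
    print(f"{name:12s} triangle inequality holds: {ok}")
```

For cosine similarity, sqrt(1 - S) is the chord distance between the normalized vectors divided by sqrt(2), which is why it stays Euclidean.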

There may be a slight problem with R's distance indices, since most of
them are not scaled to [0,1] (Canberra distance could be defined that
way, but I think it is not, and intentionally so). In fact, most of them
have no upper bound at all, so it may be difficult to decide what zero
similarity should mean. I think the best way is to transform the
observations before calculating distances, choosing the standardization
to match the index. For Manhattan distance, scale each observation so
that its values sum to one; for Euclidean distance, scale it so that its
squared values sum to one. For non-negative data, Manhattan distance is
then bounded to [0,2] (so half of it lies in [0,1]) and Euclidean
distance to [0,sqrt(2)]. For Euclidean distance, this means projecting
the observations onto the unit sphere and calculating the chord distance.
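A minimal Python sketch of the two standardizations, assuming non-negative data with no all-zero observations. The two example observations are hypothetical and share no non-zero variables, which gives the largest possible distance after standardization (2 for Manhattan, sqrt(2) for chord). Dividing by that upper bound then gives a similarity in which zero has a clear meaning.

```python
import math

def to_proportions(x):
    """Scale a non-negative observation so that its values sum to one."""
    total = sum(x)
    return [v / total for v in x]

def to_unit_length(x):
    """Scale an observation so that its squared values sum to one."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x]

def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Hypothetical observations with no non-zero variables in common.
x = [3.0, 1.0, 0.0, 0.0]
y = [0.0, 0.0, 2.0, 5.0]

d_man = manhattan(to_proportions(x), to_proportions(y))    # upper bound 2
d_chord = euclidean(to_unit_length(x), to_unit_length(y))  # upper bound sqrt(2)

# With a known upper bound, zero similarity is well defined.
s_man = 1.0 - d_man / 2.0
s_chord = 1.0 - d_chord ** 2 / 2.0  # equals the cosine similarity
print(d_man, d_chord, s_man, s_chord)
```

For proportions, d_man / 2 equals 1 minus the sum of the pairwise minima, so the rescaled Manhattan distance is easy to interpret.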

Then you can look at the distance.c file in the R sources and write your
own functions. That is what I did: I now have Sørensen/Bray-Curtis/
Steinhaus/Czekanowski (pick your favourite name), Kulczynski and another
version of Canberra, all scaled to [0,1], and at the moment even Gower,
which is nonsensically scaled to keep it `metric' [more sensible scaling
would make it a `semi-metric' or not a metric at all, I am not sure].
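For illustration, here is a Python sketch of three indices scaled to [0,1], written from the standard forms in the ecological literature. It is not necessarily what the functions described above compute. It assumes non-negative data, no all-zero observations, and at least one variable that is non-zero in either observation. The function names are hypothetical.

```python
def bray_curtis(x, y):
    """Bray-Curtis (quantitative Sorensen): sum|x - y| / sum(x + y)."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den

def kulczynski(x, y):
    """Quantitative Kulczynski: 1 - (W/A + W/B) / 2, where W is the sum of
    pairwise minima and A, B are the totals of x and y."""
    w = sum(min(a, b) for a, b in zip(x, y))
    return 1.0 - 0.5 * (w / sum(x) + w / sum(y))

def canberra_scaled(x, y):
    """Canberra divided by the number of variables that are not zero in
    both observations, so the result lies in [0, 1]."""
    terms = [abs(a - b) / (a + b) for a, b in zip(x, y) if a + b > 0]
    return sum(terms) / len(terms)

# Hypothetical non-negative observations.
x = [3.0, 1.0, 0.0, 0.0]
y = [0.0, 1.0, 2.0, 5.0]
for f in (bray_curtis, kulczynski, canberra_scaled):
    print(f.__name__, round(f(x, y), 4))
```

All three give 0 for identical observations and 1 for observations with no non-zero variables in common. Excluding double zeros from the Canberra divisor is one common convention. Dividing by the total number of variables would also give [0,1], but the result would then depend on how many absent variables are included.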

cheers, jari oksanen
Jari Oksanen -- Dept Biology, Univ Oulu, 90014 Oulu, Finland
Ph. +358 8 5531526, cell +358 40 5136529, fax +358 8 5531061
email jari.oksanen at oulu.fi, homepage http://cc.oulu.fi/~jarioksa/

r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html