[R] Similarity matrix
jarioksa at cc.oulu.fi
Wed Apr 11 12:32:35 CEST 2001
ripley at stats.ox.ac.uk said:
> The usual way to do this is to scale similarities to [0, 1] and take D
> = sqrt(1-S) I believe, but I don't know why.
I think it depends on (i) the index used, (ii) whether you want the
dissimilarity to be a `metric', and (iii) whether you care about (ii).
For most sum-of-squares based indices D = sqrt(1-S) preserves the
`metric' or even `Euclidean' properties.
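A minimal sketch of that transformation in R, assuming the similarities are already scaled to [0,1]. The helper name sim2dist and the small matrix S are made up for illustration only:

```r
# Hypothetical helper: turn a similarity matrix scaled to [0, 1]
# into a dissimilarity object via D = sqrt(1 - S).
sim2dist <- function(S) {
  stopifnot(is.matrix(S), all(S >= 0 & S <= 1))
  as.dist(sqrt(1 - S))
}

# Made-up similarities among three objects (diagonal = 1)
S <- matrix(c(1.00, 0.75, 0.00,
              0.75, 1.00, 0.19,
              0.00, 0.19, 1.00), nrow = 3,
            dimnames = list(letters[1:3], letters[1:3]))
D <- sim2dist(S)
print(D)
```

The result is an ordinary "dist" object, so it can be passed on to functions such as hclust() or cmdscale().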
There may be a slight problem with the R distance indices, since most of
them are not originally scaled to [0,1] (Canberra distance could be
defined so, but I think it is not, and that may even be intentional). In
fact most of them have no finite upper limit, so it may be difficult to
decide what you mean by zero similarity. I think the best way is to
transform the observations before calculating distances. The
standardization should be selected in accordance with the index: for
Manhattan, scale each observation so that its values sum to unity, and
for Euclidean, so that its squared values sum to unity. For non-negative
data, Manhattan is then scaled to [0,2] (halve it to get [0,1]) and
Euclidean to [0,sqrt(2)]. For Euclidean distance this means moving the
observations onto the unit sphere and calculating the chord distance.
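The two standardizations could be sketched with dist() as follows. The data matrix x is made up, and rows are assumed to be non-negative observations. With unit row sums the raw Manhattan distance between two observations that share nothing is 2, so the sketch halves it to obtain an index on [0,1]:

```r
# Made-up non-negative data: rows are observations
x <- rbind(a = c(1, 0, 0),
           b = c(0, 2, 0),
           c = c(1, 1, 2))

# Manhattan: scale each row to unit sum, then halve the distance
# so that observations with nothing in common get the value 1
x1 <- sweep(x, 1, rowSums(x), "/")
d.man <- dist(x1, method = "manhattan") / 2

# Euclidean: scale each row to unit sum of squares (unit sphere);
# the Euclidean distance is then the chord distance, at most sqrt(2)
x2 <- sweep(x, 1, sqrt(rowSums(x^2)), "/")
d.chord <- dist(x2, method = "euclidean")

print(d.man)
print(d.chord)
```

Rows a and b share no non-zero variables, so they reach the upper limit of both scaled indices.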
Then you can have a look at distance.c in the R source and write
your own functions (like I did: I now have
Sørensen/Bray-Curtis/Steinhaus/Czekanowski (pick your favourite name),
Kulczynski and another Canberra, which are all scaled to [0,1], and at
the moment even Gower, which is nonsensically scaled to keep it `metric'
[a more sensible scaling would make it a `semi-metric' or not a metric,
I am not sure]).
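As an illustration of an index that is scaled to [0,1] by construction, here is a pure-R sketch of the Bray-Curtis dissimilarity. This is not the C code referred to above: the function name bray and the test data are hypothetical, and a plain double loop is used instead of modifying distance.c:

```r
# Hypothetical pure-R Bray-Curtis dissimilarity for non-negative data;
# rows are observations. Two all-zero rows give NaN (0/0).
bray <- function(x) {
  x <- as.matrix(x)
  stopifnot(nrow(x) >= 2, all(x >= 0))
  n <- nrow(x)
  d <- matrix(0, n, n, dimnames = list(rownames(x), rownames(x)))
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      # sum of absolute differences divided by the sum of all values
      d[j, i] <- sum(abs(x[i, ] - x[j, ])) / sum(x[i, ] + x[j, ])
    }
  }
  as.dist(d)  # as.dist() reads the lower triangle filled above
}

# Made-up data
x <- rbind(a = c(1, 0, 0),
           b = c(0, 2, 0),
           c = c(1, 1, 2))
d.bray <- bray(x)
print(d.bray)
```

The value is 0 for identical observations and 1 for observations with no variables in common.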
cheers, jari oksanen
Jari Oksanen -- Dept Biology, Univ Oulu, 90014 Oulu, Finland
Ph. +358 8 5531526, cell +358 40 5136529, fax +358 8 5531061
email jari.oksanen at oulu.fi, homepage http://cc.oulu.fi/~jarioksa/