[R] Multidimensional scaling and distance matrices

Thu Feb 26 16:08:23 CET 2004

On Thu, 2004-02-26 at 17:05, Federico Calboli wrote:
> On Thu, 2004-02-26 at 12:35, Christian Hennig wrote:

> I am happy with the function "dist" in {mva}, and I know there are other
> functions in {cluster}, but that's beside the point. The question that is
> nagging me is: is it justified to do a form of MDS on a matrix other
> than a distance matrix? The references I pointed out do say to use a
> distance matrix, but do not explicitly say "all else is wrong", so I
> could call it a day.
> 
No. You can write a program for NMDS that accepts either similarities or
dissimilarities. This was already an option in KYST (Kruskal - Young -
Shepard - Torgerson), one of the first available pieces of software for
running NMDS (from the early 1970s or late 1960s).
Technically this means that your Shepard plot shows either a monotone
decrease or a monotone increase. It doesn't matter. Of course, you do
not need that option, since you can change your similarities into
dissimilarities. Typically this is easy, and you can use something like
1 - similarity or sqrt(1 - similarity) (the latter is metric in some
cases where the former is only semimetric). This translation is
trickier in cases like yours, where the diagonal values vary. However, I
guess that Statistica (or KYST) would not look at the diagonal: if the
translation into dissimilarities is tricky, the handling of similarities
in the software is probably wrong. That is, the software uses only the
off-diagonal values at their face value, implicitly assuming that the
diagonal values are all equal. So it is better that you change your
similarities into dissimilarities yourself, since then you know how that
was done; NMDS may not know or even care.
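A small sketch may make the conversion issues concrete. It is an illustration only, not code from any MDS package: the Sorensen index, the three presence/absence "sites", the inner-product matrix and all helper names are hypothetical choices. It shows (1) that 1 - s can break the triangle inequality where sqrt(1 - s) does not, (2) that both conversions keep the same rank order, which is all non-metric MDS uses, and (3) one standard diagonal-aware conversion (Gower's d_ij = sqrt(s_ii + s_jj - 2 s_ij), swapped in here; it is not mentioned in the post), which gives Euclidean distances when the similarities are inner products.

```python
import math
from itertools import permutations

# Hypothetical presence/absence data: three sites and the species they contain.
sites = {"A": {1}, "B": {1, 2}, "C": {2}}

def sorensen(a, b):
    """Sorensen similarity 2|A & B| / (|A| + |B|), in [0, 1]."""
    return 2 * len(a & b) / (len(a) + len(b))

def dissimilarities(transform):
    """Map every ordered pair of distinct sites to transform(similarity)."""
    return {
        (i, j): transform(sorensen(sites[i], sites[j]))
        for i, j in permutations(sites, 2)
    }

def is_metric(d):
    """Check the triangle inequality d(i,k) <= d(i,j) + d(j,k) for all triples."""
    return all(
        d[(i, k)] <= d[(i, j)] + d[(j, k)] + 1e-12
        for i, j, k in permutations(sites, 3)
    )

def rank_order(d):
    """Pairs sorted by dissimilarity: the only information non-metric MDS uses."""
    return sorted(d, key=d.get)

d_lin = dissimilarities(lambda s: 1.0 - s)              # 1 - s
d_sqrt = dissimilarities(lambda s: math.sqrt(1.0 - s))  # sqrt(1 - s)

# A similarity matrix whose diagonal varies: inner products S = X X^T
# of hypothetical row vectors. Plain 1 - s_ij would ignore the diagonal.
X = [(1, 0), (0, 2), (3, 4)]
S = [[sum(a * b for a, b in zip(u, v)) for v in X] for u in X]

def gower_distance(S, i, j):
    """sqrt(s_ii + s_jj - 2 s_ij); equals the Euclidean distance when S = X X^T."""
    return math.sqrt(S[i][i] + S[j][j] - 2 * S[i][j])

if __name__ == "__main__":
    print("1 - s is metric:", is_metric(d_lin))          # False: 1/3 + 1/3 < 1
    print("sqrt(1 - s) is metric:", is_metric(d_sqrt))   # True
    print("same rank order:", rank_order(d_lin) == rank_order(d_sqrt))
    print("Gower d(0, 1):", gower_distance(S, 0, 1), "vs", math.dist(X[0], X[1]))
```

Because the rank order is unchanged, an NMDS fit would not distinguish the two conversions, whereas a metric method would. In R the simple conversions are one-liners on a similarity matrix, e.g. as.dist(sqrt(1 - S)), which can then be passed to isoMDS.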

Actually, I think you could port KYST to R if you want a function that
accepts similarities, too. However, I think this is not worthwhile:
Ripley's isoMDS (in MASS) is a better alternative. (It might be
worthwhile, and even easier, to port SINDSCAL, but I am not sure whether
its licence allows this.)

Finally, Sammon and Kruskal scaling do not exhaust the alternatives for
NMDS. Actually, I think that the method used in Statistica may be ALSCAL
(it is many years since I used Statistica), as in the software that
Statistica was modelled on. I am pretty sure it wasn't Sammon.

cheers, jari oksanen 
-- 
J.Oksanen, Oulu, Finland.
"Object-oriented programming is an exceptionally bad idea which could
only have originated in California." E. Dijkstra