[R] Multidimensional scaling and distance matrices

Prof Brian Ripley ripley at stats.ox.ac.uk
Thu Feb 26 15:10:07 CET 2004


A few comments:

MDS is normally done on a dissimilarity matrix, not necessarily a distance 
matrix (no need for the triangle inequality to be enforced).
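The triangle inequality point can be checked with a small R sketch. The matrix `m` and the helper `violates_triangle()` are made up for illustration. The matrix is symmetric with a zero diagonal but is not a metric, and isoMDS() from package MASS still runs on it. Kruskal-type non-metric MDS fits only the rank order of the dissimilarities. Its default starting configuration comes from cmdscale(), which happens to work for this particular matrix.

```r
library(MASS)  # provides isoMDS()

# Hypothetical 4 x 4 dissimilarity matrix: symmetric, zero diagonal,
# but d(A, C) = 5 > d(A, B) + d(B, C) = 2, so it is not a metric.
m <- matrix(c(0, 1, 5, 2,
              1, 0, 1, 3,
              5, 1, 0, 2,
              2, 3, 2, 0), nrow = 4, byrow = TRUE,
            dimnames = list(LETTERS[1:4], LETTERS[1:4]))

# Hypothetical helper: TRUE if any triple (i, j, k) has d[i, k] > d[i, j] + d[j, k]
violates_triangle <- function(m) {
  n <- nrow(m)
  for (i in seq_len(n)) for (j in seq_len(n)) for (k in seq_len(n))
    if (m[i, k] > m[i, j] + m[j, k]) return(TRUE)
  FALSE
}
violates_triangle(m)  # TRUE

# Non-metric MDS uses only the ordering of the dissimilarities,
# so it still produces a 2-d configuration for this matrix.
fit <- isoMDS(as.dist(m), k = 2, trace = FALSE)
fit$points
```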

Some MDS software will automatically map similarity matrices to
corresponding dissimilarity matrices if told to do so (but not all by the
same mapping: usually D = 1 - S or D = sqrt(1 - S)).  It looks like a
`kinship' matrix is a cousin of a similarity matrix; similarity matrices
usually have entries between 0 and 1, with 1 on the diagonal.
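Both mappings are one line each in R. The following sketch uses a made-up 3 x 3 similarity matrix `S`, not data from the thread. Both maps are decreasing in S, so they put the pairs in the same order. They differ in the actual sizes of the dissimilarities, which matters for metric (classical) MDS.

```r
# Hypothetical 3 x 3 similarity matrix: entries in [0, 1], 1 on the diagonal
S <- matrix(c(1.0, 0.8, 0.3,
              0.8, 1.0, 0.5,
              0.3, 0.5, 1.0), nrow = 3,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

D1 <- as.dist(1 - S)        # D = 1 - S: a-b 0.2, a-c 0.7, b-c 0.5
D2 <- as.dist(sqrt(1 - S))  # D = sqrt(1 - S): square roots of the same values

# Same rank order of pairs under either mapping
identical(order(D1), order(D2))  # TRUE
```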

The description of MDS in Statistica at

http://www.statsoftinc.com/textbook/stmulsca.html

is entirely in terms of `observed distances' and Kruskal-type MDS.

Note that non-metric MDS is almost impossible to reproduce due to local 
minima, although hopefully one could get a similar solution in a different 
implementation of the same method.
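One common way to reduce, though not eliminate, the local-minimum problem is to restart the optimiser from several configurations and keep the fit with the lowest stress. Nothing here says Statistica does this. The helper `best_isoMDS()` and its arguments are hypothetical, and the data are simulated.

```r
library(MASS)  # isoMDS()

# Hypothetical helper: run isoMDS() from the default cmdscale() start and
# from n_starts random configurations; keep the fit with the lowest stress.
best_isoMDS <- function(d, k = 2, n_starts = 20, seed = 1) {
  set.seed(seed)                           # make the random starts repeatable
  n <- attr(d, "Size")                     # number of objects in the "dist" object
  best <- isoMDS(d, k = k, trace = FALSE)  # default start: cmdscale(d, k)
  for (s in seq_len(n_starts)) {
    y0 <- matrix(rnorm(n * k), n, k)       # random n x k starting configuration
    fit <- isoMDS(d, y = y0, k = k, trace = FALSE)
    if (fit$stress < best$stress) best <- fit
  }
  best
}

# Simulated example: 20 points in 5 dimensions, scaled down to 2
set.seed(42)
x <- matrix(rnorm(100), nrow = 20, ncol = 5)
d <- dist(x)
fit <- best_isoMDS(d, k = 2)
fit$stress
```

Different implementations may still settle on different minima. This only makes a poor one less likely.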

Faced with your example, I would treat it as a covariance matrix, turn it 
into a correlation matrix and take the distances as 1 - correlations, and 
cross my fingers.
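That suggestion can be sketched in R as follows, using the 5 x 5 matrix from the question typed in by hand. cov2cor() (package stats) rescales a covariance matrix to a correlation matrix. Whether this matches what Statistica actually did is unknown.

```r
library(MASS)  # isoMDS(), sammon()

# The kinship matrix from the question, entered row by row
vals <- c(0.198716340, 0.003612042, 0.011926851, 0.019737349, 0.015021053,
          0.003612042, 0.066742885, 0.013809924, 0.005121996, 0.011175845,
          0.011926851, 0.013809924, 0.197337389, 0.013893087, 0.006405424,
          0.019737349, 0.005121996, 0.013893087, 0.216047450, 0.006218477,
          0.015021053, 0.011175845, 0.006405424, 0.006218477, 0.118812936)
nm <- paste0("V", 1:5)
test <- matrix(vals, 5, 5, byrow = TRUE, dimnames = list(nm, nm))

r <- cov2cor(test)   # treat as a covariance matrix, rescale to correlations
d <- as.dist(1 - r)  # dissimilarity = 1 - correlation

cmd <- cmdscale(d, k = 2)                  # classical (metric) MDS
iso <- isoMDS(d, k = 2, trace = FALSE)     # Kruskal non-metric MDS
sam <- sammon(d, k = 2, trace = FALSE)     # Sammon mapping
```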

On 26 Feb 2004, Federico Calboli wrote:

> Dear All,
> 
> I am in the somewhat unfortunate position of having to reproduce the
> results previously obtained from (non-metric?) MDS on a "kinship" matrix
> using Statistica. A kinship matrix measures affinity between groups, and
> has its maximum values on the diagonal. 
> 
> Apparently, starting with an n x n kinship matrix, all that was needed
> was to feed it to Statistica, flagging that the matrix was NOT a distance
> matrix but a kinship one. Whether Statistica transformed the kinship matrix
> into a distance matrix (and how) is anybody's guess.
> 
> A quick search immediately showed that multidimensional scaling is
> done on a distance matrix. See for instance:
> MASS4, p. 304
> "Elements of Computational Statistics", Gentle, p. 122
> Edwards and Oman's article, pp. 2-7, R News 3/3
> 
> The fact that Statistica happily performs MDS on a "kinship" matrix is
> puzzling. Indeed, I would expect errors without first transforming the
> kinship matrix to distances, as in the following toy example:
> 
> > test
>            V1          V2          V3          V4          V5
> 1 0.198716340 0.003612042 0.011926851 0.019737349 0.015021053
> 2 0.003612042 0.066742885 0.013809924 0.005121996 0.011175845
> 3 0.011926851 0.013809924 0.197337389 0.013893087 0.006405424
> 4 0.019737349 0.005121996 0.013893087 0.216047450 0.006218477
> 5 0.015021053 0.011175845 0.006405424 0.006218477 0.118812936
> 
> > cmdscale(test)
>    [,1] [,2]
> V1  NaN  NaN
> V2  NaN  NaN
> V3  NaN  NaN
> V4  NaN  NaN
> V5  NaN  NaN
> Warning messages:
> 1: some of the first 2 eigenvalues are < 0 in: cmdscale(test)
> 2: NaNs produced in: sqrt(ev)
> > isoMDS(test)
> Error in isoMDS(test) : NAs/Infs not allowed in d
> > sammon(test)
> Error in sammon(test) : initial configuration must be complete
> In addition: Warning messages:
> 1: some of the first 2 eigenvalues are < 0 in: cmdscale(d, k)
> 2: NaNs produced in: sqrt(ev)
> 
> 
> The colleagues who used the above routine are unable to tell me with
> certainty whether Statistica used metric or non-metric scaling, and if
> non-metric, whether Kruskal or Sammon scaling.
> 
> In any case, I would simply like to ask the members of the list if I am
> correct in thinking that MDS can ONLY be performed on a distance matrix,
> so that I can reasonably expect that some form of transformation
> to a distance matrix was performed by Statistica prior to the MDS.
> It would at least be a first step towards understanding what exactly
> Statistica did with the data.
> 
> Regards,
> 
> Federico Calboli
> 

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595