[Rd] cmdscale problem

Christian Kleiber kleiber at statistik.uni-dortmund.de
Sat Jan 8 16:00:28 CET 2005


Dear R developers,

there appears to be a small problem with function cmdscale: for 
non-Euclidean distance matrices, using option add=FALSE (the default), 
cmdscale misses the smallest eigenvalue. This affects GOF statistic g.1 
(See Mardia, Kent + Bibby (1979): Multivariate Analysis, eq. (14.4.7). 
The corresponding formula in Cox + Cox (2001): Multidimensional Scaling, 
2nd ed., p 38, would seem to contain a misprint, it should be n instead 
of n-1.)

Example:

R> cmdscale(eurodist, eig=TRUE)$GOF
[1] 0.7968344 0.8679134

The full set of eigenvalues can be obtained via

R> H <- function(n) { diag(n) - matrix(1, n, n)/n }
R> eigen( -.5 * H(21) %*% (as.matrix(eurodist))^2 %*% H(21) ,
+ only.values=TRUE)$values
  [1]  1.953838e+07  1.185656e+07  1.528844e+06
  [4]  1.118742e+06  7.893472e+05  5.816552e+05
  [7]  2.623192e+05  1.925976e+05  1.450845e+05
[10]  1.079673e+05  5.139484e+04  3.538963e-10
[13] -9.496124e+03 -5.305820e+04 -1.322166e+05
[16] -2.573360e+05 -3.326719e+05 -5.162523e+05
[19] -9.191491e+05 -1.006504e+06 -2.251844e+06


whereas cmdscale allows access of the first 20 of these via


R> cmdscale(eurodist, k=20, eig=TRUE)$eig
  [1]  1.953838e+07  1.185656e+07  1.528844e+06
  [4]  1.118742e+06  7.893472e+05  5.816552e+05
  [7]  2.623192e+05  1.925976e+05  1.450845e+05
[10]  1.079673e+05  5.139484e+04  4.656613e-10
[13] -9.496124e+03 -5.305820e+04 -1.322166e+05
[16] -2.573360e+05 -3.326719e+05 -5.162523e+05
[19] -9.191491e+05 -1.006504e+06
Warning messages:
...


Suggestion: replace

     evalus <- e$values[-n]

with

     evalus <- e$values

This yields

R> cmdscale(eurodist, eig=TRUE)$GOF
[1] 0.7537543 0.8679134


Minor points:

May I suggest to allow access of all eigenvalues, instead of just the 
first k as delivered by cmdscale(... , eig=TRUE)$eig, in order to permit 
inspection of the severity of the problem? (I am aware that a comparison 
of g.1 and g.2 indicates the presence of negative eigenvalues.) It might 
also be useful to deliver a warning whenever some eigenvalues are < 0, 
using something like

if (any(e$values < 0))
       warning("some eigenvalues are < 0")

Also, the documentation says that cmdscale(...)$eig returns "the n-1 
eigenvalues computed during the scaling process if 'eig' is true", 
whereas in fact it returns the first k.

Incidentally, there is an undocumented feature: depending on the value 
of if(eig || x.ret || add) the value is a list including a component 
'ac' (add. constant) in addition to those mentioned in the documentation.

Best regards,
Christian Kleiber


Version: R 2.0.1
OS: Win XP Pro


-- 

*****************************************

Dr. Christian Kleiber
FB Statistik
Universitaet Dortmund
Vogelpothsweg 78
D-44227 DORTMUND
Germany

Tel: ++49-(0)231-755 5419
Fax: ++49-(0)231-755 5284
E-mail: kleiber at statistik.uni-dortmund.de



More information about the R-devel mailing list