[R] Fwd: MDS problems [ajtee@ajtee.uklinux.net]

Jari Oksanen jarioksa at sun3.oulu.fi
Sat Mar 27 18:07:39 CET 2004


Quoting Adam Tee <adam at ajtee.uklinux.net>:

> On 03/26/04 14:07:30, Jari Oksanen wrote:
> > The error message is clear: You have some identical sites so that
> > their distance is zero. I think the canonical solution is to remove
> > the duplicate cases from the data before calculating the  
> > dissimilarities. If your data frame is called X:
> > 
> > Xuniq <- unique(X)
> > x.dist <- dist(Xuniq)
> > 
> > (or, as a one-liner: x.dist <- dist(unique(X)))
> >
> 
> One of the difficulties is that it is significant that some of the data  
> is identical as I am comparing musical sequences. The above does not  
> work as I am not dealing with the raw data. I am using my own  
> dissimilarity measure output by a separate program.  The reason for  
> using MDS is to visualise the data based on the developed dissimilarity  
> measure.
> 
> Would it be better to use PCA ??

If your own dissimilarity measure really is Euclidean distance, then using
cmdscale (metric scaling) is an inefficient way of doing PCA. Otherwise it is
difficult to see how you could use PCA without access to raw data.
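
To illustrate the Euclidean case, here is a small sketch with made-up data (the
matrix X and the seed are assumptions for illustration only, not your data). It
should show that cmdscale on dist() output gives the same configuration as the
prcomp scores, up to the sign of each axis:

```r
# Toy data: 10 observations on 3 variables (invented for this example)
set.seed(1)
X <- matrix(rnorm(30), nrow = 10)

# Metric scaling of the Euclidean distances, two dimensions
mds <- cmdscale(dist(X), k = 2)

# PCA scores on the first two components (prcomp centres the data by default)
pca <- prcomp(X)$x[, 1:2]

# Axes may be reflected, so compare absolute values;
# the largest discrepancy should be numerically negligible
max(abs(abs(mds) - abs(pca)))
```

With a non-Euclidean dissimilarity there is no such raw data matrix behind the
distances, so only the cmdscale route remains.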

PCA and metric MDS (cmdscale) are adequate if you think you have a linear
mapping from musical similarities to your visual presentation. That's a brave
assumption, but a common and standard one ("canonical", I would say).

Identical observations should have identical locations in *MDS. Removing them is
just removing duplicates. In principle, Kruskal's NMDS (isoMDS) should be able
to deal with them, but Sammon scaling (sammon) would break down with zero
dissimilarities, as B.D. Ripley wrote. In a way, it is just a weighting
problem in isoMDS: if you have identical observations and you really want to
keep them, you should give larger weight to those dissimilarities in assessing
stress. I think you cannot do it now in isoMDS (I think KYST accepted them
silently). In most cases such a weighting would have no observable effect, and
you can just remove the duplicates and have equal weights.

It is not too hard to write a function that prunes identical observations from
dissimilarity matrices. You just remove columns & rows so that no off-diagonal
zeros are left. I don't have the time and patience to write it up just now (you
need as.matrix on the dist object, then go down the items and delete a row and
the corresponding column if you find an off-diagonal zero).
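
A minimal, untested-on-your-data sketch of that pruning, assuming the
dissimilarities are a "dist" object (or anything as.matrix can turn into a
square symmetric matrix). The function name prune_zero_dist and the tol
argument are made up for this illustration; they are not part of any package:

```r
# Hypothetical helper: keep the first of each set of identical items and
# drop the later ones, so that no off-diagonal zeros remain.
prune_zero_dist <- function(d, tol = 0) {
  m <- as.matrix(d)
  keep <- rep(TRUE, nrow(m))
  for (i in seq_len(nrow(m))) {
    if (!keep[i]) next                # item already dropped as a duplicate
    dup <- which(m[i, ] <= tol)       # items at zero distance from item i
    dup <- dup[dup > i]               # only later items; skips the diagonal
    keep[dup] <- FALSE
  }
  as.dist(m[keep, keep, drop = FALSE])
}

# Toy example: rows "a" and "c" are identical
X <- rbind(a = c(0, 0), b = c(1, 0), c = c(0, 0), d = c(0, 2))
d2 <- prune_zero_dist(dist(X))
labels(d2)   # "a" "b" "d"
```

The pruned object can then be passed on to isoMDS or sammon. Dropped items
share their coordinates with the retained duplicate, so they can be put back
into the plot afterwards if needed.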

cheers, jari oksanen



