[R] isoMDS and 0 distances

Wed Apr 19 09:23:20 CEST 2006

On Tue, 2006-04-18 at 22:06 -0400, Tyler Smith wrote:

> I'm trying to do a non-metric multidimensional scaling using isoMDS. 
> However, I have some '0' distances in my data, and I'm not sure how to 
> deal with them. I'd rather not drop rows from the original data, as I am 
> comparing several datasets (morphology and molecular data) for the same 
> individuals, and it's interesting to see how much morphological 
> variation can be associated with an identical genotype.
> 
> I've tried replacing the 0's with NA, but the isoMDS appears to stop on 
> the first iteration and the stress does not improve:
> 
> distA # A dist object with 13695 elements, 4 of which == 0
> cmdsA <- cmdscale(distA, k=2)
> 
> distB <- distA
> distB[which(distB==0)] <- NA
> 
> isoA <- isoMDS(distB, cmdsA)
> initial  value 21.835691
> final  value 21.835691
> converged
> 
> The other approach I've tried is replacing the 0's with small numbers. 
> In this case isoMDS does reduce the stress values.
> 
> min(distA[which(distA>0)])
> [1] 0.02325581
> 
> distC <- distA
> distC[which(distC==0)] <- 0.001
> isoC <- isoMDS(distC)
> initial  value 21.682854
> iter   5 value 16.862093
> iter  10 value 16.451800
> final  value 16.339224
> converged
> 
> So my questions are: what am I doing wrong in the first example? Why 
> does isoMDS converge without doing anything? Is replacing the 0's with 
> small numbers an appropriate alternative?
> 
Tyler,

My experience is that isoMDS *may* fail to move away from the starting
configuration if the initial configuration contains identical points,
and this will happen if you use cmdscale() to get the initial
configuration. You *may* get over this by shifting duplicates a bit:

> con <- cmdscale(dis)
> dups <- duplicated(con)
> sum(dups)
[1] 2
> con[dups, ] <- con[dups,] + runif(2*sum(dups), -0.01, 0.01)

Then isoMDS may go further.
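
The same shifting trick can be wrapped up so that it does not depend on the number of dimensions. This is only a sketch: 'jitter_duplicates' and its 'amount' argument are names made up here (they are not in MASS), and the toy configuration is invented for illustration.

```r
# Hypothetical helper (not part of MASS): nudge duplicated rows of a
# configuration matrix by a small uniform random amount.
jitter_duplicates <- function(con, amount = 0.01) {
  dups <- duplicated(con)        # TRUE for rows that repeat an earlier row
  n <- sum(dups) * ncol(con)     # one random offset per coordinate
  con[dups, ] <- con[dups, ] + runif(n, -amount, amount)
  con
}

# Toy configuration with two duplicated rows (made-up numbers)
con <- rbind(c(0, 0), c(1, 2), c(0, 0), c(1, 2), c(3, 3))
con2 <- jitter_duplicates(con)
sum(duplicated(con2))            # number of duplicated rows left
```

The shifted matrix could then be given as the starting configuration, along the lines of isoMDS(dis, y = con2), where 'dis' stands for your own dissimilarities.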

Another issue is that, at a quick look, isoMDS() seems to do nothing
sensible with missing values, although it accepts them. The only effect
is that they are ordered last, that is, regarded as very long distances
(in your case they should rather be regarded as very short distances).
The key lines in isoMDS are:

    ord <- order(dis)
    nd <- sum(!is.na(ord))

Even when 'dis' has missing values, the result of order() ('ord') has
no missing values: order() returns index positions, and with the default
argument na.last=TRUE the positions of the NAs are put last in the list.
So 'nd' is always the full length of 'dis'. An obvious-looking change
would be to replace the second line with:

    nd <- sum(!is.na(dis))

but this "dumps the core" of R, at least on my machine: probably the
code needs the full length of the vectors in addition to the number of
non-missing entries. (This quick look was based on the latest release
version of MASS/VR: there may already be a newer version with the
upcoming R release, but that's not released yet.)
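
The behaviour of order() with missing values is easy to see on a small vector. The numbers below are made up for illustration and have nothing to do with your data:

```r
# Made-up dissimilarities with two missing values
dis <- c(0.3, NA, 0.1, NA, 0.2)

ord <- order(dis)     # default na.last = TRUE
ord                   # 3 5 1 2 4: positions of the NAs (2 and 4) come last
sum(!is.na(ord))      # 5: 'ord' holds positions, so it never contains NA
sum(!is.na(dis))      # 3: the actual number of non-missing values
```

So the count taken from 'ord' is simply length(dis), and the NAs are treated as if they were the largest dissimilarities.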

You may check how the NA version behaves: do the duplicate points end
up with identical coordinates in the result?
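
One way to make that check is to list every pair with zero dissimilarity and measure how far apart the two points are in the fitted configuration. This is only a sketch: 'zero_pair_gaps' is a hypothetical helper written for this purpose, and the toy data are invented.

```r
# Hypothetical helper: for each zero-distance pair in 'dis', report the
# Euclidean distance between the two points in a configuration 'pts'.
zero_pair_gaps <- function(dis, pts) {
  m <- as.matrix(dis)
  # upper.tri() keeps each pair once and skips the zero diagonal
  idx <- which(m == 0 & upper.tri(m), arr.ind = TRUE)
  a <- pts[idx[, "row"], , drop = FALSE]
  b <- pts[idx[, "col"], , drop = FALSE]
  data.frame(i = unname(idx[, "row"]),
             j = unname(idx[, "col"]),
             gap = unname(sqrt(rowSums((a - b)^2))))
}

# Toy example: points 1 and 3 coincide in the data ...
x <- rbind(c(0, 0), c(1, 1), c(0, 0))
# ... but are separated in this made-up "result" configuration
pts <- rbind(c(0, 0), c(1, 1), c(0.5, 0))
zero_pair_gaps(dist(x), pts)
```

With your objects this would be something like zero_pair_gaps(distA, isoA$points); a gap of zero means the duplicates were not separated.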

Then about replacing zero distances with a tiny number: this has been
discussed before on this list, and Ripley said "no, no!". I do it all
the time, but only in secrecy. A suggested solution was to drop
duplicates, but then there is still a weighting issue, and isoMDS does
not have a weights argument.
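
For completeness, dropping duplicates could be done roughly as below. This is a sketch under the assumption that a zero distance really means "same point"; the toy data are invented, and it does nothing about the weighting issue.

```r
# Toy data: rows 1 and 3 are identical, so their distance is zero
x <- rbind(c(0, 0), c(1, 1), c(0, 0), c(2, 0))
dis <- dist(x)

m <- as.matrix(dis)
# A point counts as a duplicate if it has zero distance to an earlier
# point (lower.tri() restricts the comparison to column < row).
dup <- apply(m == 0 & lower.tri(m), 1, any)

reduced <- as.dist(m[!dup, !dup])   # dissimilarities without duplicates
attr(reduced, "Size")               # number of points kept
```

The logical vector 'dup' also tells you which individuals were removed, so they can be given the coordinates of their twin afterwards.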

cheers, jari oksanen
-- 
Jari Oksanen -- Dept Biology, Univ Oulu, 90014 Oulu, Finland
email jari.oksanen at oulu.fi, homepage http://cc.oulu.fi/~jarioksa/