[R] Calculation time of isoMDS and the optimal number of dimensions

Gavin Simpson gavin.simpson at ucl.ac.uk
Wed May 26 15:07:01 CEST 2010


On Wed, 2010-05-26 at 14:25 +0200, Joris Meys wrote:
> Hi Gavin,
> 
> thank you for the answer. I am aware of the fact that with nMDS it's
> about the configuration, and that's exactly my problem: the
> configuration changes quite a lot when I increase the number of
> dimensions. As I am trying to go from a CAT(0) space of trees (see
> Billera et al. on geodesic distance) to a Euclidean space, the
> required number of dimensions is not easily determined. I have to
> restrict my Euclidean space for practical reasons, but I want to stay
> as close as possible to the "original" configuration of the trees.
> Hence my playing with the dimensions in the nMDS.

Hi Joris,

Yes, I see now from your reply to Michael that the 2-d fit isn't very
good, hence your trying higher dimensions. My comment here was motivated
by your saying: "Yet, I start asking myself whether this makes sense
if I'm only using the first 2 dimensions". You can't do that, i.e. fit a
10-d solution but then use only its first 2 dimensions.

Try doing a screeplot of the stress values for k = {2,3,...,10} and see
if there is a noticeable change in slope ("elbow") where stress no
longer decreases markedly as you increase the dimension.
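A minimal sketch of that scree plot, using isoMDS() from MASS directly. The simulated `d` (60 points in 12 dimensions) is only a stand-in for your own distances; with an 800x800 matrix each fit will be far slower. isoMDS() reports stress as a percentage.

```r
library(MASS)  # isoMDS()

## Simulated stand-in for your distances (assumption): 60 points in 12-d,
## so the cmdscale() start used by isoMDS() has 10 positive axes available
set.seed(42)
d <- dist(matrix(rnorm(60 * 12), nrow = 60))

ks <- 2:10
## One nMDS fit per dimensionality; trace = FALSE silences iteration output
stress <- sapply(ks, function(k) isoMDS(d, k = k, trace = FALSE)$stress)

## Look for the "elbow" where stress stops dropping markedly
plot(ks, stress, type = "b", xlab = "Number of dimensions (k)",
     ylab = "Stress (%)", main = "Scree plot of nMDS stress")
```

Each k uses a single start here; in practice you would want the best of several random starts per k (which is what metaMDS() automates) before reading the elbow.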

stressplot() in vegan will also show you the fit between the original
distances and the nMDS distances; you can compare the fits for
k = {2,3,...,10}, again to help decide how many dimensions to use.
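stressplot() draws a Shepard diagram (ordination distances against the original dissimilarities, with the fitted monotone line). The sketch below computes a comparable diagram with MASS::Shepard() instead, so it runs without vegan. The two summary numbers are our own choices (1 - S^2 from the stress, and the squared correlation between dissimilarity and configuration distance) and may not match exactly what stressplot() prints; `shepard_summary()` is a hypothetical helper and `d` is simulated.

```r
library(MASS)  # isoMDS(), Shepard()

## Simulated stand-in for your distances (assumption)
set.seed(42)
d <- dist(matrix(rnorm(60 * 12), nrow = 60))

## Hypothetical helper: summarise the Shepard diagram of a k-d nMDS fit
shepard_summary <- function(d, k) {
  fit <- isoMDS(d, k = k, trace = FALSE)
  sh  <- Shepard(d, fit$points)
  c(k            = k,
    nonmetric_R2 = 1 - (fit$stress / 100)^2,  # isoMDS stress is in percent
    linear_R2    = cor(sh$x, sh$y)^2)         # dissimilarity vs distance
}

fits <- t(sapply(2:10, function(k) shepard_summary(d, k)))
print(round(fits, 3))

## Shepard diagram for one candidate k, with the isotonic (monotone) fit
sh3 <- Shepard(d, isoMDS(d, k = 3, trace = FALSE)$points)
plot(sh3, pch = ".", xlab = "Original dissimilarity",
     ylab = "Configuration distance")
lines(sh3$x, sh3$yf, type = "S", col = "red")
```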

Do you need nMDS at all here? You say that the configuration changes as
you increase the dimensionality, but this is just how nMDS works; lower
dimensional solutions are not nested within higher dimensional ones.
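Because the solutions are not nested, the first two columns of a 10-d nMDS fit are not a 2-d nMDS solution. A sketch for checking how much worse they are on your own data, assuming Kruskal's stress-1 (in percent), which we believe is the definition isoMDS() reports; `kruskal_stress()` is a hypothetical helper and `d` is simulated.

```r
library(MASS)  # isoMDS(), Shepard()

## Simulated stand-in for your distances (assumption)
set.seed(42)
d <- dist(matrix(rnorm(60 * 12), nrow = 60))

## Hypothetical helper: Kruskal stress-1 (percent) of any configuration,
## using the isotonic regression fitted by Shepard()
kruskal_stress <- function(d, conf) {
  sh <- Shepard(d, conf)
  100 * sqrt(sum((sh$y - sh$yf)^2) / sum(sh$y^2))
}

fit2  <- isoMDS(d, k = 2,  trace = FALSE)
fit10 <- isoMDS(d, k = 10, trace = FALSE)

direct_2d    <- kruskal_stress(d, fit2$points)         # best 2-d nMDS found
truncated_2d <- kruskal_stress(d, fit10$points[, 1:2]) # first 2 of the 10-d
print(c(isoMDS_reported = fit2$stress, direct = direct_2d,
        first_two_of_10d = truncated_2d))
```

Comparing `direct` with `isoMDS_reported` is a quick check that the assumed stress formula matches isoMDS() on your installation.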

You could try to embed your distances directly into a Euclidean space
using principal coordinates analysis (PCoA, function cmdscale() in the
stats package), though if your distance isn't Euclidean you'll need to
handle the negative eigenvalues; cmdscale() has options for that. Using
all axes, that analysis will exactly represent your original distances
(plus the additive constant, if you use add = TRUE) in principal
coordinate space. Your job then is to choose how many PCoA axes to
retain, but this is simpler because the axes are nested: axis 1 never
changes regardless of whether you also include axis 2, axes 2+3,
axes 2+3+4, etc.
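A small illustration of that nesting, using the eurodist road distances that ship with R as a stand-in for your own distances. The PCoA axes from cmdscale() are identical whether 2 or 5 are kept, whereas isoMDS() solutions of different dimensionality generally differ.

```r
library(MASS)  # isoMDS(), for the nMDS contrast

## eurodist (road distances between 21 European cities, datasets package)
## stands in for your own distances here
p2 <- cmdscale(eurodist, k = 2)
p5 <- cmdscale(eurodist, k = 5)

## PCoA: the first two axes are the same whether 2 or 5 axes are kept
print(isTRUE(all.equal(p2, p5[, 1:2])))

## nMDS: the 2-d solution is generally not the first two columns of the 3-d one
n2 <- isoMDS(eurodist, k = 2, trace = FALSE)$points
n3 <- isoMDS(eurodist, k = 3, trace = FALSE)$points
print(isTRUE(all.equal(n2, n3[, 1:2])))
```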

You could do either:

cmdscale(yourdist, k = nrow(as.matrix(yourdist)) - 1, eig = TRUE, add = TRUE)

or use 

capscale(yourdist ~ 1, add = TRUE)

and then plot the resulting eigenvalues, which (up to the additive
constant; add = TRUE gets rid of the negative eigenvalues) represent the
amount of information about the original distances contained in each
axis.
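A sketch of that eigenvalue plot with cmdscale(), again using eurodist as a stand-in for `yourdist`. When eig = TRUE the returned `eig` component holds all n eigenvalues whatever k is, so a small k suffices here; clipping tiny negative rounding errors to zero is our own choice.

```r
## eurodist (road distances, datasets package) stands in for 'yourdist'

## Without an additive constant, non-Euclidean distances can produce
## negative eigenvalues
pcoa_raw <- cmdscale(eurodist, k = 2, eig = TRUE)
cat("negative eigenvalues without add:", sum(pcoa_raw$eig < 0), "\n")

## add = TRUE adds a constant to the off-diagonal dissimilarities so that
## the modified dissimilarities are Euclidean
pcoa <- cmdscale(eurodist, k = 2, eig = TRUE, add = TRUE)
cat("additive constant:", pcoa$ac, "\n")

ev   <- pmax(pcoa$eig, 0)  # clip tiny negative rounding errors to zero
prop <- ev / sum(ev)       # share of the eigenvalue total for each axis

barplot(prop[1:10], names.arg = 1:10,
        xlab = "PCoA axis", ylab = "Proportion of eigenvalue sum")
print(round(cumsum(prop)[1:5], 3))  # cumulative share of the first 5 axes
```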

> I merely described metaMDS as "not really meant for this kind
> of data" because of the object that's returned. As the "species"
> component is missing, you get warning messages when using
> procrustes() or other functions in the vegan package. But you're
> right. It might be written for community data, but it is perfectly
> valid for any kind of distance matrix.

Warnings from metaMDS should go away if you set wascores = FALSE. You
shouldn't be getting other warnings, though. Could you send me (off
list) a small example where this happens, so I can look into why the
warnings are being generated and stop them?
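Stripped of its ecological helpers, the core of metaMDS is running isoMDS from several random starts and keeping the best. A minimal sketch of just that idea using MASS; it is not metaMDS's actual algorithm (metaMDS also compares solutions to judge whether they have converged), and `best_of_random_starts()` and the simulated `d` are hypothetical.

```r
library(MASS)  # isoMDS()

## Hypothetical helper: best of 'tries' random-start isoMDS fits.
## Assumes 'd' is a "dist" object (uses its "Size" attribute).
best_of_random_starts <- function(d, k = 2, tries = 10, seed = 1) {
  set.seed(seed)
  n <- attr(d, "Size")
  best <- NULL
  for (i in seq_len(tries)) {
    ## Uniform random starting configuration, n rows by k columns
    start <- matrix(runif(n * k, -1, 1), nrow = n, ncol = k)
    fit <- isoMDS(d, y = start, k = k, trace = FALSE)
    if (is.null(best) || fit$stress < best$stress) best <- fit
  }
  best
}

## Simulated stand-in for your distances (assumption)
set.seed(42)
d <- dist(matrix(rnorm(60 * 12), nrow = 60))
fit <- best_of_random_starts(d, k = 2, tries = 5)
print(fit$stress)
```

The cost grows with both `tries` and k, which is consistent with the long run times reported for k = 10 on an 800x800 matrix.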

HTH

G

> thanks again for your insights.
> 
> Cheers
> Joris
> 
> On Wed, May 26, 2010 at 9:34 AM, Gavin Simpson
> <gavin.simpson at ucl.ac.uk> wrote:
>         On Tue, 2010-05-25 at 19:00 +0200, Joris Meys wrote:
>         > Dear all,
>         >
>         > I'm running a set of nonparametric MDS analyses, using a wrapper
>         > for isoMDS, on an 800x800 distance matrix. I noticed that setting
>         > the parameter k to larger numbers seriously increases the
>         > calculation time. Actually, with k=10 it already takes longer than
>         > k=2 and k=5 together. It has now been calculating for 6 hours, and
>         > counting...
>         
>         
>         metaMDS will try 'trymax' random starts of isoMDS in an attempt to
>         see if convergent solutions are reached. The 10-d computation is
>         clearly much more complex than fitting rank distances in 2 or even
>         5 dimensions.
>         
>         > There is quite a difference between the results using k=2 or k=5
>         > when looking at the first 2 dimensions (logically...). I suspect
>         > the same when k=10. Yet, I start asking myself whether this makes
>         > sense if I'm only using the first 2 dimensions. And I can't think
>         > of a formal method to check in an nMDS framework how many
>         > dimensions are enough. Does anybody have an idea?
>         
>         
>         In nMDS the configuration counts, not the axes (the axes are
>         themselves arbitrary directions; having only the x or only the y
>         geographical coordinate isn't much use if you want to find your way
>         to a location, you need both). It makes no sense whatsoever to
>         compute a 10-d nMDS solution if you only want a 2-d solution for
>         later computations; there is no guarantee that the first two "axes"
>         of a 10-d nMDS solution will be as good as those from the 2-d
>         solution. If you only want a 2-d solution, concentrate on finding
>         the best 2-d solution you can using metaMDS.
>         
>         > I use metaMDS from the vegan package, although it's not really
>         > meant to be used on these data.
>         
>         
>         Why do you say that? As long as you turn off a couple of the
>         "ecological" helper bits in metaMDS, all it is doing is handling
>         random starts of the isoMDS algorithm.
>         
>         >
>         > Cheers
>         > Joris
>         >
>         
>         HTH
>         
>         G
>         
>         
> 
> 
> 
> -- 
> Joris Meys
> Statistical Consultant
> 
> Ghent University
> Faculty of Bioscience Engineering 
> Department of Applied mathematics, biometrics and process control
> 
> Coupure Links 653
> B-9000 Gent
> 
> tel : +32 9 264 59 87
> Joris.Meys at Ugent.be 
> -------------------------------
> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%


