[R] "predict"-fuction for metaMDS (vegan)

Gavin Simpson gavin.simpson at ucl.ac.uk
Wed Sep 9 15:47:05 CEST 2009


On Wed, 2009-09-09 at 14:43 +0200, Kim Vanselow wrote:
> Dear r-Community,
> Step1: I would like to calculate a NMDS (package vegan, function
> metaMDS) with species data.
> Step2: Then I want to plot environmental variables over it, using
> function envfit.
> The Problem: One of these environmental variables is cos(EXPOSURE).
> But for flat releves there is no exposure. The value is missing and I
> can't call it 0 as 0 stands for east and west. Therefore I kicked all
> releves with missing environmental variables. Both, metaMDS and envfit
> then work without problems.
> Now I want to bring the releves with missing environmetal variables
> back to my ordination-plot.
> 
> Gavin Simpson gave me the advice to use the predict-function for the
> same missing value problem when I was calculating a cca. This worked
> without problem.

Hi Kim,

Apologies for not replying (yet) to your private email to me (asking
essentially the same as here) --- it was a busy, late, working day for
me yesterday.

Also note that Jari has commented on how you are coding your Exposure
variable; I glossed over that bit when providing my response and you
should probably rethink your coding along the lines Jari suggested.

There isn't a predict function for metaMDS() because there aren't rules
or relationships that would allow you to do what predict.cca does but
for a nMDS. In the CCA case we were estimating the locations in
ordination space for the left-out samples on the basis of their species
composition and computing their site score as a weighted average of the
species scores determined from the data that went into building the CCA.

In nMDS all you have is the species information (at first) which is
converted to dissimilarities and thence to nMDS configurations. In this
sense, none of the information is missing (in your case), so you don't
need to leave out the samples with missing exposures. So far so good.
(Note the point about rethinking your coding, above).

What you now want to do is superimpose upon that plot the env data where
you might have missingness. envfit doesn't allow missings and it is not
immediately clear to me how it might be modified to do so, certainly not
without a lot of changes.

Instead, ordisurf() may be more useful, but you will loose the nice
"plot all vectors on the ordination at once" feature of envfit (you gain
a lot with ordisurf though as there is no reason to assume anything is
linear across an nMDS configuration).

A cursory look at the guts of ordisurf indicates that it can handle
missing values in the (env) variable you wish to overlay onto the nMDS
plot; the data is passed to mgcv::gam usig the formula interface which
deals with the NA.

If obj contains your nMDS model and Y is a matrix of env variables
containing Exposure, then something like this (untested)

ordisurf(obj, Y$Exposure)

will plot the ordination and surface in a single step. You might want to
control things a bit more so take a look at ?ordisurf.

> 
> As my species data was recorded in Braun-Blanquet-numbers (ordinal
> scale) I would prefer to calculate a NMDS. Does anybody know a similar
> function to the predict function which works with NMDS or does anybody
> know how to modify the predict function so that it will work also for
> NMDS?

You could also try capscale() also in vegan. This is like CCA and RDA
but allows you to use any dissimilarity coefficient. It is a bit like a
constrained Principal Coordinates Analysis. This can use the rda method
for predict to do what you did with the CCA earlier.

HTH

G

> 
> Thank you very much!
> 
> Kim
>  
> -------- Original-Nachricht --------
> > Datum: Fri, 04 Sep 2009 18:11:09 +0100
> > Von: Gavin Simpson <gavin.simpson at ucl.ac.uk>
> > An: Kim Vanselow <Vanselow at gmx.de>
> > CC: r-help at r-project.org
> > Betreff: Re: [R] NA in cca (vegan)
> 
> > On Fri, 2009-09-04 at 17:15 +0200, Kim Vanselow wrote:
> > > Dear all,
> > > I would like to calculate a cca (package vegan) with species and
> > > environmental data. One of these environmental variables is
> > > cos(EXPOSURE).
> > > The problem: for flat releves there is no exposure. The value is
> > > missing and I can't call it 0 as 0 stands for east and west.
> > > The cca does not run with missing values. What can I do to make vegan
> > > cca ignoring these missing values?
> > > Thanks a lot,
> > > Kim
> >
> > Hi Kim,
> >
> > This is timely as Jari Oksanen (lead developer on vegan) has been
> > looking into making this happen automatically in vegan ordination
> > functions. The solution for something like cca is very simple but it
> > gets more complicated when you might like to allow features like
> > na.exclude etc and have all the functions that operate on objects of
> > class "cca" work nicely.
> >
> > For the moment, you should just process your data before it goes into
> > cca. Here I assume that you have two data frames; i) Y is the species
> > data, and ii) X the environmental data. Further I assume that only one
> > variable in X has missings, lets call this Exposure:
> >
> > ## dummy data
> > set.seed(1234)
> > ## 20 samples of 10 species
> > Y <- data.frame(matrix(rpois(20*10, 2), ncol = 10))
> > ## 20 samples and 5 env variables
> > X <- data.frame(matrix(rnorm(20*5), ncol = 5))
> > names(X) <- c(paste("Var", 1:4, sep = ""), "Exposure")
> > ## simulate some NAs in Exposure
> > X$Exposure[sample(1:20, 3)] <- NA
> > ## show X
> > X
> >
> > ## Now create a new variable indicating which are missing
> > miss <- with(X, is.na(Exposure))
> >
> > ## now create new X and Y omitting these rows
> > Y2 <- Y[!miss, ]
> > X2 <- X[!miss, ]
> >
> > ## Now submit to CCA
> > mod <- cca(Y2 ~ ., data = X2)
> > mod
> >
> > ## plot it
> > plot(mod, display = c("sites","bp"), scaling = 3)
> >
> > ## It'd be nice to get predictions for the 3 samples we missed out
> > pred <- predict(mod, newdata = Y[miss, ], type = "wa", scaling = 3)
> >
> > ## add these points to the ordination:
> > points(pred[, 1:2], col = "red", cex = 1.5)
> >
> > HTH
> >
> > G
> > --
> > %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> >  Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
> >  ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
> >  Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
> >  Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
> >  UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
> > %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%




More information about the R-help mailing list