[R] BIO-ENV procedure

Fri May 14 10:20:34 CEST 2004

On Fri, 2004-05-14 at 00:08, Peter Nelson wrote:
> I've been unable to find a R package that provides the means of 
> performing Clarke & Ainsworth's BIO-ENV procedure or something 
> comparable. Briefly, they describe a method for comparing two separate 
> sample ordinations, one from species data and the second from 
> environmental data. The analysis includes selection of the 'best' 
> subset of environmental variables for explaining the observed spp 
> ordination.  Is there something available or being developed?
> 
I found a photocopy of Clarke's & Ainsworth's paper (our library does
not subscribe to the Marine Ecology Progress Series, because salty seas
are too far away). It may be that you don't have a canned "BIO-ENV"
routine in R, but you can get very close to the procedure using R (and
the only missing piece looks unessential to me). However, I know that
Bill Venables was working with Clarke's PRIMER, and he may have canned
even BIO-ENV.

The following discussion is based on Clarke & Ainsworth, Mar. Ecol.
Prog. Ser. 92, 205-219; 1993. It seems that this is not an "algorithm",
because it uses brute force and we have no idea if it converges to
anywhere. Let's call this a procedure. C&A suggest analysis with
separate ordination of community data, and then selecting a subset of
environmental variables that is similar to the species data. Please note
that this is not constrained (or `canonical') ordination: species
ordination is done  independently. Further, the similarity of
environmental and biological structure is analysed apart from
ordination, so that you may have cases with good species -- environment
relationship, but not in ordination.

1. An NMDS of community data using Bray-Curtis dissimilarities. Use
isoMDS function of Ripley's & Venables's MASS library for NMDS.
Bray-Curtis dissimilarity is available at least in vegan, and probably
in ade4 and possibly in many other packages.

2. For evaluating species -- environment relationship, they suggest
using rank correlation between Bray-Curtis dissimilarities of community
data and Euclidean distances of environmental data with certain set of
environmental variables. You have Euclidean distances in stats function
dist (or in vegan and N other packages), and you can get rank
correlations with cor.test.  However, Clarke & Ainsworth suggest a new
type of rank correalation that they call ``harmonic rank correlation''
and this may not be in R (haven't searched, though). I do think that is
unessential for the method, so you can do with the existing rank
correlations.

3. Then comes the hard work. You have to try with all possible
combinations of environmental variables (and there may be several
combinations of them). This could be canned, because this is boring and
error prone. Thomas Lumley's leaps package does this for regression
analysis, and model could be taken from there. Now you can do this, but
it may be a bit hard work.

4. Now you select the best subset, or the subset giving highest rank
correlation. There is no guarantee that there is such a unique, clear
case, but you may be lucky.

5. Take that subset, get Euclidean distances, and ordinate those
distances using NMDS, and plot your two ordinations side by side. Clarke
& Ainsworth recommend using NMDS instead of metric (or classic) MDS, and
they warn against Procrustes comparison of these two solutions (but I
would suggest Procrustes comparison). 

> library(MASS)
> library(vegan)
> data(varespec)
> data(varechem)
> env <- varechem[, c("N","P","K")]
> d <- vegdist(varespec, "bray")
> env <- scale(env)
> cor.test(d, dist(env[,1]), method="spear")$est
      rho
0.1712362
> cor.test(d, dist(env[,2]), method="spear")$est
      rho
0.1803071
> cor.test(d, dist(env[,3]), method="spear")$est
      rho
0.2427814
> cor.test(d, dist(env[,c(1,2)]), method="spear")$est
      rho
0.2422454
> cor.test(d, dist(env[,c(1,3)]), method="spear")$est
      rho
0.2471631
> cor.test(d, dist(env[,c(2,3)]), method="spear")$est
      rho
0.2081135
> cor.test(d, dist(env[,c(1,2,3)]), method="spear")$est
      rho
0.2441523

Some warnings on ties were removed. This suggest that the best subset
uses variables 2 and 3 or N and K. In this case we can skip the NMDS of
community data, since it has rank 2 (only two environmental variables),
and can be exactly plotted in 2 dim without ordination. 

> mds.comm <- isoMDS(d)
> par(mfrow=c(1,2))
> plot(mds.comm$points, asp=1)
> plot(env[, c(1,3)], asp=1)

Or, possibly:

> par(mfrow=c(1,1))
> plot(procrustes(env[,c(1,3)], mds.comm))

So you can do it. The missing pieces are the harmonic rank correlation
(if you think that's essential) and automating variable selection.
Somebody could do them (not me, though).

cheers, jari oksanen

-- 
Jari Oksanen <jarioksa at sun3.oulu.fi>