[R] BIOENV

Mon May 6 10:38:08 CEST 2013

Hello Gilson,

On 06/05/2013, at 05:34 AM, Gilson Carvalho wrote:

> Dear all,
> 
> Does anyone know why the results of BIOENV (PRIMER v. 6.1.15) differ from those of bioenv() + mantel() in vegan? Not the Spearman correlation, but the pseudo-p value.
> 
> I know that the bioenv() + mantel() approach is biased. So how does BIOENV (PRIMER) end up with larger (permuted) p values?
> 

I cannot give a firm answer, because I know only half of the problem: I have never used PRIMER nor seen any version of its manual, and I can only answer for vegan. 

I take your message to mean that PRIMER has a permutation test for BIOENV. Vegan has no such test, so the two cannot be compared directly. It appears that you tried bioenv() + mantel() in vegan, and you said that you know it is biased. It certainly is biased, so you should not be surprised by biased results. I don't know PRIMER, but chances are that it does things correctly and gives unbiased results. In that case you should get exactly the kind of bias you observed: P-values that are too low (too significant) in bioenv() + mantel() in vegan. (There are also some technical points that you must take care of -- more at the bottom of the message.)

The bias in mantel() + bioenv() arises because you select variables in bioenv() to maximize the correlation, and then in mantel() you treat these selected variables as if they had been chosen a priori (not selected). The selection procedure must be a part of the process of assessing the significance. It may be so in PRIMER, but I don't know.

I tested this in vegan using the varespec and varechem data. Here bioenv() selected a five-variable model (N, P, Al, Mn, Baresoil), which I then fed into mantel(). In addition, I made a permutation test for bioenv(): I permuted the data, repeated bioenv() to select the best set of variables for each permutation, and collected the maximum correlation from each run.

In mantel() with the fixed, pre-selected set of variables, the five-number summary for 999 permutations was Min = -0.267, 1st Qu = -0.053, Median = 0.000, 3rd Qu = 0.052, Max = 0.278. With exactly the same permutations, bioenv() gave a five-number summary of -0.006 (Min), 0.129 (1st Qu), 0.179 (Median), 0.225 (3rd Qu), 0.423 (Max), or nearly 0.2 units higher. Because the variables are selected to maximize the correlation in each bioenv() run, the permuted values were much higher (median 0.18 in bioenv() versus 0.00 in mantel()). Consequently the P-values can be much higher (less significant) in a correctly performed bioenv() test.
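The comparison above can be sketched roughly as follows. This is not the script that produced the numbers above: to keep the run short it assumes a six-variable subset of varechem and only 99 permutations, and the helper names best_rho and fixed_rho are made up for illustration. It relies on the `models` and `whichbest` components of the object returned by bioenv(), and it assumes that bioenv() uses Euclidean distances of scaled variables.

```r
library(vegan)
data(varespec)
data(varechem)

set.seed(42)
# Assumption: a six-variable subset keeps the search small (2^6 - 1 = 63 models)
env <- varechem[, c("N", "P", "K", "Al", "Mn", "Baresoil")]
comdis <- vegdist(varespec, method = "bray")

# Observed bioenv() result: selected variables and their correlation
obs <- bioenv(varespec, env, method = "spearman", index = "bray")
sel <- obs$names[obs$models[[obs$whichbest]]$best]
obs_rho <- obs$models[[obs$whichbest]]$est

# Best correlation over all subsets: selection is repeated for this data set
best_rho <- function(e) {
  fit <- bioenv(varespec, e, method = "spearman", index = "bray")
  fit$models[[fit$whichbest]]$est
}

# Correlation for the pre-selected set only: what a plain Mantel test permutes
fixed_rho <- function(e) {
  envdis <- dist(scale(e[, sel, drop = FALSE]))
  cor(as.vector(comdis), as.vector(envdis), method = "spearman")
}

nperm <- 99
perm_fixed <- numeric(nperm)
perm_best <- numeric(nperm)
for (i in seq_len(nperm)) {
  # Permute the rows of the environmental data; the same permutation
  # is used for both statistics
  e <- env[sample(nrow(env)), , drop = FALSE]
  perm_fixed[i] <- fixed_rho(e)
  perm_best[i] <- best_rho(e)
}

# Permutation P-values: fixed set (biased) versus selection repeated each time
p_fixed <- (sum(perm_fixed >= obs_rho) + 1) / (nperm + 1)
p_bioenv <- (sum(perm_best >= obs_rho) + 1) / (nperm + 1)

fivenum(perm_fixed)
fivenum(perm_best)
c(fixed = p_fixed, bioenv = p_bioenv)
```

Because every permuted bioenv() run also evaluates the pre-selected subset, its best correlation can never be lower than the fixed-set correlation for the same permutation, which is why the second null distribution sits higher.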

BTW, if you do this test with bioenv(), I really hope you have a PC with a multicore CPU. I used parallel processing with eight cores and it was still really slow (it felt like 30 min, although I didn't check the timing).

This is the bias and how it works, and it alone is sufficient to explain large differences.

Then some technical details -- you must be careful when comparing the models and setting up your workflow:

(1) The bioenv paper (Clarke & Ainsworth, Mar Ecol Prog Ser 92, 205-219; 1993) also introduces a new correlation-like measure that we have not implemented in vegan::bioenv(). You must be careful to use the same correlation in both tests.

(2) vegan::bioenv() defaults to use Spearman correlation, but vegan::mantel() defaults to Pearson. You must be careful to use the same in both.

(3) vegan::mantel() only uses one-sided tests, whereas some implementations use two-sided tests. This cannot be changed in vegan, but you must be careful here. The P-values can be higher (less significant) with two-sided tests.
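Points (2) and (3) can be illustrated with a short sketch. It assumes the five variables named earlier in this message, scaled before computing Euclidean distances, and it uses the `statistic`, `signif` and `perm` components of the mantel() result. The two-sided P-value below follows one common convention (comparing absolute values); it is an assumption for illustration and not necessarily what any other program does.

```r
library(vegan)
data(varespec)
data(varechem)

set.seed(1)
comdis <- vegdist(varespec, method = "bray")
# The five variables reported above, scaled before taking Euclidean distances
envdis <- dist(scale(varechem[, c("N", "P", "Al", "Mn", "Baresoil")]))

# Point (2): request Spearman explicitly, because mantel() defaults to Pearson
m <- mantel(comdis, envdis, method = "spearman", permutations = 999)

# Point (3): mantel() reports a one-sided P-value
p_one <- m$signif

# Assumption: a two-sided P-value computed from the stored permutation values
p_two <- (sum(abs(m$perm) >= abs(m$statistic)) + 1) / (length(m$perm) + 1)

c(one_sided = p_one, two_sided = p_two)
```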

> Actually, how is the permutation test in BIOENV (PRIMER) conducted? The user guide does not make it clear.
> 

I think it is best to look at the source code to see how things are really done. This is easy in vegan which is open-source. I don't know about PRIMER.
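For the vegan side, the source can be printed directly from an R session. A small sketch, assuming vegan is installed; the variable name bioenv_src is only for illustration:

```r
library(vegan)

# bioenv() is an S3 generic, so fetch the default method that does the search
bioenv_src <- getS3method("bioenv", "default")
print(bioenv_src)

# mantel() is an ordinary function, so printing it shows its source
print(mantel)
```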

Cheers, Jari Oksanen
-- 
Jari Oksanen, Dept Biology, Univ Oulu, 90014 Finland