[R] bootstrapped eigenvector method following prcomp

Axel Strauß a.strauss at tu-bs.de
Mon Jan 19 09:53:03 CET 2009


G'Day R users!

Following an ordination using prcomp, I'd like to test which variables 
singnificantly contribute to a principal component. There is a method 
suggested by Peres-Neto and al. 2003. Ecology 84:2347-2363  called 
"bootstrapped eigenvector". It was asked for that in this forum in 
January 2005 by Jérôme Lemaître:
"1) Resample 1000 times with replacement entire raws from the original 
data sets []
2) Conduct a PCA on each bootstrapped sample
3) To prevent axis reflexion and/or axis reordering in the bootstrap, 
here are two more steps for each bootstrapped sample
3a) calculate correlation matrix between the PCA scores of the original 
and those of the bootstrapped sample
3b) Examine whether the highest absolute correlation is between the 
corresponding axis for the original and bootstrapped samples. When it is 
not the case, reorder the eigenvectors. This means that if the highest 
correlation is between the first original axis and the second 
bootstrapped axis, the loadings for the second bootstrapped axis and use 
to estimate the confidence interval for the original first PC axis.
4) Determine the p value for each loading. Obtained as follow: number of 
loadings >=0 for loadings that were positive in the original matrix 
divided by the number of boostrap samples (1000) and/or number of 
loadings =<0 for loadings that were negative in the original matrix 
divided by the number of boostrap samples (1000)."

(see https://stat.ethz.ch/pipermail/r-help/2005-January/065139.html ).

The suggested solution (by Jari Oksanen) was


function (x, permutations=1000, ...)
{
    pcnull <- princomp(x, ...)
    res <- pcnull$loadings
    out <- matrix(0, nrow=nrow(res), ncol=ncol(res))
    N <- nrow(x)
    for (i in 1:permutations) {
        pc <- princomp(x[sample(N, replace=TRUE), ], ...)
        pred <- predict(pc, newdata = x)
        r <-  cor(pcnull$scores, pred)
        k <- apply(abs(r), 2, which.max)
        reve <- sign(diag(r[k,]))
        sol <- pc$loadings[ ,k]
        sol <- sweep(sol, 2, reve, "*")
        out <- out + ifelse(res > 0, sol <=  0, sol >= 0)
    }
    out/permutations
}

However, in a post from March 2005 ( http://r-help.com/msg/6125.html ) 
Jari himself mentioned that there is a bug in this method.

I was wondering whether someone could tell me where the bug is or 
whether there is a better method in R to test for significance of 
loadings (not the significance of the PCs).
Maybe it is not a good idea to do it at all, but I would prefer to have 
some guidline for interpretation rather than making decisions 
arbitrarily. I tried to look everywhere before posting here.

I would be very thankful for any help,

Axel

-- 
Gravity is a habit that is hard to shake off.
Terry Pratchett




More information about the R-help mailing list