[R] efficient code - yet another question

Kevin Wright kw.statr at gmail.com
Mon Sep 29 19:12:04 CEST 2008


You should also have a look at the pcaMethods package (on
Bioconductor).  I did some code optimization for the NIPALS algorithm
in that package, which sped it up by at least a factor of two.
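
Something along these lines should get you started (untested here, and
the argument names are from memory, so please check the pcaMethods
documentation for the exact interface):

library(pcaMethods)   # Bioconductor package

## X is the data matrix; missing values are handled when method = "nipals"
res <- pca(X, method = "nipals", nPcs = 10, center = TRUE)

scores(res)    # score matrix (observations x components)
loadings(res)  # loading matrix (variables x components)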

Kevin Wright



On Wed, Apr 30, 2008 at 10:56 PM, steven wilson <swpt07 at gmail.com> wrote:
> Dear list members,
>
> The code given below corresponds to the PCA-NIPALS (principal
> component analysis) algorithm, adapted from the nipals function in the
> chemometrics package. The reason for using NIPALS instead of SVD is
> that this algorithm can handle missing values, but that's a different
> story. I've been trying to find a way to improve (if possible) the
> efficiency of the code, especially when the number of components to
> calculate is higher than 100. I've been working with a data set of
> 500 rows x 700 variables, and the code gets really slow when the
> number of PCs to calculate is, for example, 600 (why that many
> components? Because I need them to detect outliers using another
> algorithm). On my old laptop running Win XP and R 2.7.0 (1 GB RAM) it
> takes around 6 or 7 minutes. That shouldn't be a problem for a single
> analysis, but when cross-validation is added the time increases
> severely. Although there are several examples on the R-help list
> giving hints on how to improve efficiency, I can't see which part of
> the code can be improved (though it is clear that the bottleneck is
> the for and while loops). I would really appreciate any
> ideas/comments/etc.
>
> Thanks
>
> #################################################################
>
> pca.nipals <- function(X, ncomp, iter = 50, toler = sqrt(.Machine$double.eps))
> # X...data matrix, ncomp...number of components,
> # iter...maximal number of iterations per component,
> # toler...precision tolerance for calculation of components
> {
>
>     Xh <- scale(X, center = TRUE, scale = FALSE)
>     nr <- 0
>     T <- NULL; P <- NULL # matrices of scores and loadings
>     for (h in 1:ncomp)
>              {
>                     th <- Xh[, 1]
>                     ende <- FALSE
>                     while (!ende)
>                       {
>                           nr <- nr + 1
>                           ph <- t(crossprod(th, Xh) * as.vector(1 / crossprod(th)))
>                           ph <- ph * as.vector(1 / sqrt(crossprod(ph)))
>                           thnew <- t(tcrossprod(t(ph), Xh) * as.vector(1 / crossprod(ph)))
>                           prec <- crossprod(th-thnew)
>                           th <- thnew
>                           if (prec <= toler^2) ende <- TRUE # converged
>                           if (iter <= nr) ende <- TRUE # didn't converge
>                       }
>
>                     Xh <- Xh - tcrossprod(th, ph) # deflate: remove extracted component
>                     T <- cbind(T, th); P <- cbind(P, ph)
>                     nr <- 0
>               }
>     list(T = T, P = P)
> }
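>
> For reference, this is roughly how I call it; here with simulated data
> standing in for my real 500 x 700 data set:
>
> set.seed(1)
> X <- matrix(rnorm(500 * 700), nrow = 500, ncol = 700) # stand-in data
> system.time(res <- pca.nipals(X, ncomp = 600)) # the slow case described above
> str(res$T) # scores (500 x 600)
> str(res$P) # loadings (700 x 600)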
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>


