[R] SAMPLS R implementation: problem with algorithm application

Mon Sep 11 19:25:24 CEST 2000

Hello R people,

I'm trying to implement the Partial Least Squares algorithm called
SAMPLS, from the Journal of Computer-Aided Molecular Design, 7 (1993),
587-619. It is faster than the classical PLS algorithm for fat matrices
(m>>n).

Here is the algorithm from the article by Bush B. L. and Nachbar R. B.:
    X is the matrix of explanatory properties (m*n), y the matrix of
responses, h the number of latent variables extracted
    XT denotes the transpose of the matrix X
    x* denotes the quantities for one sample (y* is the response
predicted from the derived model; I used one sample to test my R
translation against the R pls module)

    Calculate the covariance matrix C = X XT and, for prediction,
c* = X x*
    y is centered and becomes y_1
    y*_1 = 0

    For h = 1, 2, 3, ..., hmax
        s = C y_h
        center s
        working scalar for the prediction sample: s* = (c*)T y_h
        orthogonalize s to previous t: for g = 1, ..., (h-1),
            s = s - ((t_g)T s / (t_g)T t_g) t_g
        orthogonalize s* to previous t*: for g = 1, ..., (h-1),
            s* = s* - ((t_g)T s / (t_g)T t_g) t*_g

        t*_h = s*
        t_h = s
        th2 = (t_h)T t_h
        beta_h = ((t_h)T y_h) / th2

        update y_(h+1) = y_h - beta_h t_h
        build up prediction y*_(h+1) = y*_h + beta_h t*_h

    end of cycle
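
The cycle above could be written in R along the following lines for any hmax. This is only a sketch, not the article's code: the function name sampls_sketch and its arguments are made up for illustration, and it assumes that the coefficient (t_g)T s / (t_g)T t_g is evaluated once per g and then applied to both s and s*.

```r
# Hypothetical helper (name and signature are assumptions, not from the article):
# SAMPLS-style prediction of the response for one sample xnew.
sampls_sketch <- function(X, y, xnew, hmax) {
  mu <- colMeans(X)
  Xc <- sweep(X, 2, mu)              # centre the explanatory variables
  xs <- as.numeric(xnew) - mu        # centre the prediction sample the same way
  C  <- tcrossprod(Xc)               # C = X XT, an m x m matrix
  cs <- as.numeric(Xc %*% xs)        # c* = X x*
  yh <- as.numeric(y) - mean(y)      # y_1: centred response
  Tm <- matrix(0, nrow(X), hmax)     # columns hold the previous t_g
  ts <- numeric(hmax)                # holds the previous t*_g
  ypred <- 0                         # y*_1 = 0

  for (h in seq_len(hmax)) {
    s  <- as.numeric(C %*% yh)
    s  <- s - mean(s)                # "center s" (a no-op here, kept to mirror the steps)
    ss <- sum(cs * yh)               # s* = (c*)T y_h
    for (g in seq_len(h - 1)) {
      # assumption: one coefficient per g, computed before s is changed
      coef <- sum(Tm[, g] * s) / sum(Tm[, g]^2)
      s  <- s  - coef * Tm[, g]
      ss <- ss - coef * ts[g]
    }
    Tm[, h] <- s
    ts[h]   <- ss
    beta  <- sum(s * yh) / sum(s^2)
    yh    <- yh - beta * s           # y_(h+1)
    ypred <- ypred + beta * ss       # y*_(h+1)
  }
  ypred + mean(y)                    # add the mean of y back
}
```

With the variables used in the R code further down, a call such as sampls_sketch(xe, ye, xe[1, ], 2) would give the prediction for the first sample with two latent variables. hmax should stay below the rank of the centered X, otherwise s becomes numerically zero and the division fails.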
----------------------------------- R-code
## xe and ye are the explanatory and response matrices; xtest and
## ytestsampls are the variables for one sample

x2<-scale(xe,scale=FALSE)
y2<-scale(ye,scale=FALSE)

lv<-1
xtest<-as.matrix(x2[1,])
t<-matrix(0,nrow(ye),1)
c<-xe%*%t(xe)
yh<-y2
ytestsampls<-0
ctest<-xe%*%xtest

for (h in 1:lv) {
  s <- c %*% yh
  s <- scale(s, scale = FALSE)
  stest <- t(ctest) %*% yh

  ## what follows works only for h=1 and 2, I know

  if (h > 1) {
    s <- s - (as.numeric((t(t) %*% s) / (t(t) %*% t)) * t)
    stest <- stest - (as.numeric((t(t) %*% s) / (t(t) %*% t)) * ttest)
  }
  ttest <- stest
  t <- s
  t2 <- t(t) %*% t
  beta <- t(t) %*% yh
  beta <- as.numeric(beta / t2)

  ytestsampls <- ytestsampls + as.numeric(beta) * ttest
  yh <- yh - (beta * t)
}

ytestsampls2<-ytestsampls+mean(ye)

-------------------

When lv (the number of latent variables extracted) is 1, there is no
problem: the predicted y (ytestsampls2) is the same as the one given by
the R pls module (library(pls)). But with lv=2 there is a difference,
so there is an error in my code, which must come from the update steps.
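
For a comparison that does not depend on a particular package, the classical PLS fit for a single response can also be computed by hand. The sketch below assumes the standard NIPALS recursion for one response (PLS1); pls1_fitted is a hypothetical name, and the function returns fitted values for the training samples only.

```r
# Hypothetical reference (not from the article or from any package):
# fitted values of classical NIPALS PLS1 with ncomp latent variables.
pls1_fitted <- function(X, y, ncomp) {
  E   <- scale(X, scale = FALSE)       # centred X, deflated at each step
  f   <- as.numeric(y) - mean(y)       # centred y, deflated at each step
  fit <- rep(mean(y), nrow(X))
  for (h in seq_len(ncomp)) {
    w  <- crossprod(E, f)              # weight vector, n x 1
    th <- E %*% w                      # score vector, m x 1
    tt <- sum(th^2)
    b  <- sum(th * f) / tt             # regression of y on the score
    fit <- fit + b * as.numeric(th)
    p  <- crossprod(E, th) / tt        # loading vector
    E  <- E - th %*% t(p)              # deflate X
    f  <- f - b * as.numeric(th)       # deflate y
  }
  fit
}
```

Since xtest is the first row of the training data, pls1_fitted(xe, ye, 2)[1] is the value that ytestsampls2 would be expected to match for lv=2.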

Does it come from the original algorithm or from my translation?
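
One place worth checking, offered as a guess rather than a confirmed diagnosis: in the R code the correction of stest recomputes t(t)%*%s after s has already been orthogonalized to t. At that point the coefficient is zero up to rounding, so stest would be left uncorrected. The toy vectors below (random stand-ins, not the real data) illustrate the effect.

```r
set.seed(42)
t_prev <- rnorm(8)   # stands in for the previous score vector t
s      <- rnorm(8)   # stands in for the new s before orthogonalization

# Coefficient computed before s is modified
coef_before <- sum(t_prev * s) / sum(t_prev^2)

# Orthogonalize s to t_prev, then recompute the same coefficient
s_orth     <- s - coef_before * t_prev
coef_after <- sum(t_prev * s_orth) / sum(t_prev^2)

coef_before   # generally non-zero
coef_after    # zero up to rounding error
```

If this is indeed the cause, storing the coefficient in a variable before s is updated and using that stored value for both s and stest should be enough.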

Thanks in advance,

Sorry for the length of this e-mail, and thanks for reading it to the end,

--
Nicolas Baurin

PhD student
Institut de Chimie Organique et Analytique, UPRES-A 6005
Université d'Orléans, BP 6759
45067 ORLEANS Cedex 2, France
Tel: (33+) 2 38 49 45 77
