[R] scores for a new observation from PCAgrid() in pcaPP

Kari Ruohonen kari.ruohonen at utu.fi
Fri Oct 15 14:30:28 CEST 2010


Hi,
I am trying to compute scores for a new observation based on a PCA
previously computed with the PCAgrid() function in the pcaPP package. My
data has more variables than observations (p > n).

Here is a simulated data set that illustrates the case:
> library(MASS)   # for mvrnorm()
> n.samples<-30
> n.bins<-1000
> x.sim<-rep(0,n.bins)
> V.sim<-diag(n.bins)
> mtx<-array(dim=c(n.samples,n.bins))
> for(i in 1:n.samples) mtx[i,]<-mvrnorm(1,x.sim,V.sim)

With prcomp() I can do the following:

> pc.pr2<-prcomp(mtx,scale=TRUE)
> newscr.pr2<-scale(t(mtx[1,]),pc.pr2$center,pc.pr2$scale)%*%pc.pr2$rotation

The second line computes the scores for the first row of mtx. I can verify
that they are the same as the scores computed originally by comparing with:

> pc.pr2$x[1,] # prints the scores for the first observation
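The projection in the prcomp() example can be wrapped in a small function and checked numerically. This is only a sketch: project_scores() is a hypothetical helper, and rnorm() stands in for MASS::mvrnorm() (with a zero mean vector and identity covariance both give independent standard normal draws, though not the same numbers).

```r
# Simulated p > n data as in the post; rnorm() replaces MASS::mvrnorm()
# because mean 0 and identity covariance mean independent N(0, 1) draws.
set.seed(1)
n.samples <- 30
n.bins <- 1000
mtx <- matrix(rnorm(n.samples * n.bins), nrow = n.samples)

# Hypothetical helper: centre and scale new observations with the values
# used at fit time, then project them onto the stored principal axes.
project_scores <- function(newdata, center, scaling, rotation) {
  # Treat a bare vector as a single observation (one row).
  if (is.null(dim(newdata))) newdata <- matrix(newdata, nrow = 1)
  scale(newdata, center = center, scale = scaling) %*% rotation
}

pc.pr2 <- prcomp(mtx, scale. = TRUE)
newscr <- project_scores(mtx[1, ], pc.pr2$center, pc.pr2$scale,
                         pc.pr2$rotation)

# Largest absolute difference from the scores stored by prcomp()
max.diff <- max(abs(newscr[1, ] - pc.pr2$x[1, ]))
max.diff
```

With p > n, prcomp() keeps min(n, p) = 30 components, so newscr is a 1 x 30 matrix and max.diff should be at the level of floating-point rounding.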

Now, if I try the same with PCAgrid():

> library(pcaPP)
> pc.pp2<-PCAgrid(mtx,k=min(dim(mtx)),scale=mad)
> newscr.pp2<-scale(t(mtx[1,]),pc.pp2$center,pc.pp2$scale)%*%pc.pp2$loadings

The values in newscr.pp2 do not match the scores stored in the pc.pp2
object, as can be verified by comparing with:
> pc.pp2$x[1,]

What am I missing? Or is it simply not possible, with the grid method, to
compute scores from the loadings and the original observations in this
way?

For p < n, i.e. when there are more observations than variables, the
scores computed from the loadings do match the scores in the model object
for PCAgrid() as well, so the behaviour described above seems to be
specific to the case p > n.
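For the p < n case, a princomp()-style object (center, scale, loadings and scores) can be checked the same way. This sketch uses base R's princomp() on simulated data instead of PCAgrid(), since princomp() only accepts more observations than variables. score_mismatch() is a hypothetical helper, not part of pcaPP, and it assumes the object keeps its scores in a scores component as princomp() objects do; whether pc.pp2 does the same is not shown above.

```r
# p < n data: base R's princomp() refuses data with more variables than
# observations.
set.seed(2)
n <- 200
p <- 10
x <- matrix(rnorm(n * p), nrow = n)

# cor = TRUE scales each variable, mirroring the scaling used in the post.
pc <- princomp(x, cor = TRUE)

# Hypothetical helper: rebuild the scores from the stored centre, scale and
# loadings, and return the largest deviation from the stored scores.
score_mismatch <- function(obj, newdata) {
  rebuilt <- scale(newdata, center = obj$center, scale = obj$scale) %*%
    unclass(obj$loadings)
  max(abs(rebuilt - obj$scores))
}

score_mismatch(pc, x)
```

For princomp() the rebuilt and stored scores agree up to rounding, which matches the behaviour described for p < n.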

Many thanks for any help,
Kari


