[R] Scaling in predict.prcomp

Mon Apr 21 02:27:32 CEST 2008

Prof Brian Ripley wrote:
> On Sun, 20 Apr 2008, Gad Abraham wrote:
> 
>> Hi,
>>
>> Say x.train is a matrix of covariates that I want to do PCA on, so I can
>> do regression on its principal components, and x.test is a test set of
>> the same covariates on which I want to evaluate the regression fit. I
>> would like the covariates to be centred and scaled:
>>
>> p <- prcomp(x.train, center=TRUE, scale=TRUE)
>> x.train.pc <- predict(p)
>>
>> Now I want to get the PCs from the test set.
> 
> The way to do that is to call prcomp() on the test set.
> 
> If you want to project new data onto the PCs of the training set (as a 
> set of axes in the data space), you just use predict(p, newdata=).
> 
>> Should I use the same center and scale vectors from the training set:
>>
>> x.test.pc <- predict(p, newdata=x.test, center=p$center, scale=p$scale)
>>
>> or use the test set's own centers and scales:
>>
>> x.test.pc <- predict(p, newdata=x.test, center=TRUE, scale=TRUE)
> 
> I see no evidence that those additional arguments are used.
> 
> predict.prcomp uses the origin of the training set's PCs, since it is 
> that coordinate system which you are projecting onto.
> 
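
Ripley's points can be checked with a short R sketch: refitting versus projecting, the unused extra arguments, and the training set's origin. The simulated data, the names y.train and k, and the final regression step are invented for illustration and do not come from the thread. Newer R releases may warn that the extra center/scale arguments are disregarded, which is why that call is wrapped in suppressWarnings().

```r
## Hypothetical simulated data (invented for illustration only);
## the test set is deliberately shifted and rescaled.
set.seed(1)
n.train <- 50; n.test <- 20; n.var <- 4
vars <- paste0("v", seq_len(n.var))
x.train <- matrix(rnorm(n.train * n.var), n.train, n.var,
                  dimnames = list(NULL, vars))
x.test <- matrix(rnorm(n.test * n.var, mean = 2, sd = 3), n.test, n.var,
                 dimnames = list(NULL, vars))
y.train <- drop(x.train %*% c(1, -1, 0.5, 0)) + rnorm(n.train)

## PCA on the training set only ("scale." is the full argument name)
p <- prcomp(x.train, center = TRUE, scale. = TRUE)

## Projecting the training data reproduces the stored scores p$x:
## origin and axes are those of the training set
isTRUE(all.equal(unname(predict(p, newdata = x.train)), unname(p$x)))  # TRUE

## Project the test set onto the training PCs
x.test.pc <- predict(p, newdata = x.test)

## Extra arguments are not used by predict.prcomp; the scores are unchanged
x.test.pc2 <- suppressWarnings(
  predict(p, newdata = x.test, center = TRUE, scale = TRUE))
identical(x.test.pc, x.test.pc2)  # TRUE

## Refitting prcomp() on the test set builds a new coordinate system,
## so its scores are not comparable with the training scores
p.test <- prcomp(x.test, center = TRUE, scale. = TRUE)
isTRUE(all.equal(p.test$x, x.test.pc))  # FALSE

## Principal components regression on the first k training scores
## (k = 2 is an arbitrary choice), evaluated on the projected test set
k <- 2
fit <- lm(y ~ ., data = data.frame(y = y.train, p$x[, 1:k, drop = FALSE]))
y.test.hat <- predict(fit, newdata = data.frame(x.test.pc[, 1:k, drop = FALSE]))
```

The regression is fitted on p$x and applied to predict(p, newdata = x.test) so that both sets of scores live in the same training-set coordinate system.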

I should have looked more carefully. I now see that in the code for
predict.prcomp the test data are indeed centred and scaled using the
training data's center and scale vectors:

getAnywhere(predict.prcomp)
...
scale(newdata, object$center, object$scale) %*% object$rotation
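
The excerpted line can be verified by writing it out by hand and comparing it with predict(). This is a small sketch on invented data. It also shows that centring and scaling the test set with its own statistics, which was the second option in the original question, gives different scores.

```r
## Hypothetical data, simulated only for this check
set.seed(2)
x.train <- matrix(rnorm(60), 15, 4, dimnames = list(NULL, letters[1:4]))
x.test  <- matrix(rnorm(20, mean = 5), 5, 4, dimnames = list(NULL, letters[1:4]))

p <- prcomp(x.train, center = TRUE, scale. = TRUE)

## The core line of predict.prcomp written out by hand:
## centre and scale with the training vectors, then rotate
manual <- scale(x.test, center = p$center, scale = p$scale) %*% p$rotation

## Using the test set's own centres and scales instead
own <- scale(x.test, center = TRUE, scale = TRUE) %*% p$rotation

isTRUE(all.equal(manual, predict(p, newdata = x.test)))  # TRUE
isTRUE(all.equal(own, predict(p, newdata = x.test)))     # FALSE
```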

Thanks,
Gad

-- 
Gad Abraham
Dept. CSSE and NICTA
The University of Melbourne
Parkville 3010, Victoria, Australia
email: gabraham at csse.unimelb.edu.au
web: http://www.csse.unimelb.edu.au/~gabraham