[R] A strange problem using pls package

Bjørn-Helge Mevik b.h.mevik at usit.uio.no
Mon Jul 27 13:35:12 CEST 2015


"PO SU" <rhelpmaillist at 163.com> writes:

>  suppose data has 20 columns
>   traindata <- data[ 1:10, 1:10]
>  testdata <- data[11:15,1:10]
>   pls.fit <- plsr(y~x, ncomp = 5, data = traindata, method= "simpls", scale = FALSE, model = TRUE, validation = "CV")
> ok, I get some result. The strange thing happens when I redo the plsr, I mean, I use
>
>  traindata <- data[ 1:10, 1:20]
>  testdata <- data[11:15,1:20]
>  pls.fit <- plsr(y~x, ncomp = 5, data = traindata, method= "simpls", scale = FALSE, model = TRUE, validation = "CV")
>
>
> I get the same result as the first one!!!

The reason is probably that you ask plsr() to use the column of
traindata called "x" as the predictor.  Then it will only use that
column, no matter how many columns traindata contains.
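
To illustrate the point, here is a small sketch using base R only.  The
data frame "dat" and its column names are made up for the example.
model.frame() is the standard R mechanism for picking the variables
named in a formula out of "data", and as far as I know plsr() relies on
it as well, so the same behaviour should carry over.

```r
# Hypothetical data: a response y, one predictor x, and two extra columns.
set.seed(1)
dat <- data.frame(y = rnorm(10), x = rnorm(10),
                  z1 = rnorm(10), z2 = rnorm(10))

# Build the model frame from a narrow and from a wide data frame.
mf_small <- model.frame(y ~ x, data = dat[, c("y", "x")])
mf_large <- model.frame(y ~ x, data = dat)

# Only the variables named in the formula are kept; z1 and z2 are ignored.
print(names(mf_large))

# The predictor is the same in both cases, so any fit would be the same too.
print(identical(mf_small$x, mf_large$x))
```

Adding columns to the data frame changes nothing unless the formula
refers to them.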

The usual way of using plsr() is to have a data.frame with a _matrix_ as
the predictor "column", for instance like this:

mydata <- data.frame(y = some_vector, X = I(some_matrix))
mymodel <- plsr(y ~ X, ..., data = mydata)
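
A fuller, runnable version of this might look as follows.  It is only a
sketch: the data are random numbers, and the sizes (15 rows, 20
predictors) and the names "train" and "test" are assumptions chosen to
resemble the question, not your actual data.

```r
library(pls)

# Simulated stand-ins for some_vector and some_matrix.
set.seed(1)
X <- matrix(rnorm(15 * 20), nrow = 15, ncol = 20)  # 15 samples, 20 predictors
y <- rnorm(15)

# I() keeps X as a single matrix-valued column instead of 20 separate ones.
mydata <- data.frame(y = y, X = I(X))

# Row subsetting keeps the whole matrix column together.
train <- mydata[1:10, ]
test  <- mydata[11:15, ]

fit <- plsr(y ~ X, ncomp = 5, data = train,
            method = "simpls", validation = "CV")

# One regression coefficient per column of X.
print(nrow(coef(fit)))

# Predictions for the held-out rows, using all 5 components.
pred <- predict(fit, ncomp = 5, newdata = test)
print(dim(pred))
```

To fit on fewer predictors, store a narrower matrix in the data frame,
for example X = I(X[, 1:10]).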

If you want to have the predictors as separate vectors, you must name
all of them in the formula (y ~ x1 + x2 + x3 + ...), or you can use the
following shortcut to regress y on all the remaining columns:
plsr(y ~ ., ..., data = mydata)
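
As a sketch of how that shortcut reacts to the number of columns (again
with simulated data; the column names V1 ... V20 and the object names
are assumptions made for the example):

```r
library(pls)

# Simulated data frame with 20 separate predictor columns V1..V20 and y.
set.seed(1)
dat <- as.data.frame(matrix(rnorm(15 * 20), nrow = 15))
dat$y <- rnorm(15)

# Training sets with all 20 predictors and with only the first 10.
train20 <- dat[1:10, ]
train10 <- dat[1:10, c("y", paste0("V", 1:10))]

# "y ~ ." regresses y on every other column present in 'data'.
fit20 <- plsr(y ~ ., ncomp = 5, data = train20,
              method = "simpls", validation = "CV")
fit10 <- plsr(y ~ ., ncomp = 5, data = train10,
              method = "simpls", validation = "CV")

# The number of coefficients now follows the number of predictor columns.
print(nrow(coef(fit10)))
print(nrow(coef(fit20)))
```

With this formula the two fits differ, because the set of predictors is
taken from the data frame rather than from a single named column.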

-- 
Regards,
Bjørn-Helge Mevik
