[R] Problems with data structure when using plsr() from package pls

Bjørn-Helge Mevik b.h.mevik at usit.uio.no
Mon Jan 18 10:26:10 CET 2016

S Ellison <S.Ellison at lgcgroup.com> writes:

> Reading ?plsr examples and inspecting the data they use, you need to arrange
> frame1 so that it has the data from n96 included as columns with names of the
> form "n96.xxx" where xxx can be numbers, names etc.

No, you do not. :)  plsr() is happy with a data frame where n96 is a
single variable consisting of a matrix.  And this is the recommended way
for matrices with a lot of columns.  Which is what you get with

frame1 <- data.frame(gushVM, n96 = I(n96))

if n96 is a matrix, or

frame1 <- data.frame(gushVM, n96 = I(as.matrix(n96)))

if it is a data.frame.
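
As a minimal sketch of the data.frame case (toy data only; the name
n96_df and the sizes are made up for illustration), one can check what
the resulting data frame looks like using base R alone:

```r
# Toy data; names and sizes are illustrative assumptions.
gushVM <- 1:5
n96_df <- data.frame(a = 1:5, b = 2:6)   # predictors stored as a data.frame

# I() keeps the matrix as one variable instead of splitting it into columns.
frame1 <- data.frame(gushVM, n96 = I(as.matrix(n96_df)))

names(frame1)                 # variable names a formula can refer to
dim(frame1)                   # rows, and number of variables in the frame
dim(frame1$n96)               # dimensions of the matrix stored in 'n96'
inherits(frame1$n96, "AsIs")  # TRUE if I() protected the matrix
```

The point to look for is that the frame has a single variable named
'n96' which is itself a matrix, so a formula like gushVM ~ n96 can be
resolved from the data frame alone.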

> If n96 is a data frame, try something like
> names(n96) <- paste("n96", 1:96) 
> frame1 <- cbind(gushVM, n96)
> pls1 <- plsr(gushVM ~ n96, data = frame1)

Have you actually tried this?  It doesn't work.  For instance:

> gushVM <- 1:5
> n96 <- data.frame(a=1:5, b=2:6)
> names(n96) <- paste("n96", 1:2)
> n96
  n96 1 n96 2
1     1     2
2     2     3
3     3     4
4     4     5
5     5     6
> frame1 <- cbind(gushVM, n96)
> frame1
  gushVM n96 1 n96 2
1      1     1     2
2      2     2     3
3      3     3     4
4      4     4     5
5      5     5     6
> dim(frame1)
[1] 5 3
> pls1 <- plsr(gushVM ~ n96, data = frame1)
Error in model.frame.default(formula = gushVM ~ n96, data = frame1) : 
  invalid type (list) for variable 'n96'

The reason is that frame1 does _not_ contain a variable called 'n96', so
plsr() (or actually model.frame.default()) searches in the global
environment, where it finds a _data.frame_ n96.  A data.frame is a list.
Hence the error message.
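
The lookup can be reproduced without the pls package.  The following is
a sketch that calls model.frame() from stats directly, on the
assumption that this is close enough to what plsr() does internally to
illustrate the point; the name 'msg' is made up for illustration:

```r
gushVM <- 1:5
n96 <- data.frame(a = 1:5, b = 2:6)
names(n96) <- paste("n96", 1:2)
frame1 <- cbind(gushVM, n96)

"n96" %in% names(frame1)   # is there a variable called 'n96' in frame1?
is.list(n96)               # a data.frame is a list

# Capture the error message instead of stopping.
msg <- tryCatch({
    model.frame(gushVM ~ n96, data = frame1)
    "no error"
}, error = function(e) conditionMessage(e))
msg
```

Since 'n96' is not found in frame1, it is taken from the environment of
the formula, and the object found there has the wrong type.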

> If n96 is a matrix, 
> frame1 <- data.frame(gushVM, n96=n96)
> should also give you a data frame with names of the right format.

It does not:

> n96 <- as.matrix(n96)
> frame1 <- data.frame(gushVM, n96=n96)
> frame1
  gushVM n96.n96.1 n96.n96.2
1      1         1         2
2      2         2         3
3      3         3         4
4      4         4         5
5      5         5         6
> dim(frame1)
[1] 5 3
> names(frame1)
[1] "gushVM"    "n96.n96.1" "n96.n96.2"

So the data frame still does not have any variable named 'n96'.  The
only reason

> pls1 <- plsr(gushVM ~ n96, data = frame1)

seems to work is that the 'n96' variable it now finds in the global
environment happens to be a matrix:

> class(n96)
[1] "matrix"

If that wasn't there, you would get an error:

> rm(n96)
> pls1 <- plsr(gushVM ~ n96, data = frame1)
Error in eval(expr, envir, enclos) : object 'n96' not found

> I() wrapped round a matrix or data frame does nothing like what is needed if
> you include it in a data frame construction, so either things have changed
> since the tutorial was written, or the authors were not handling a matrix or
> data frame with I().

Yes it does. :)  Nothing (substantial) has changed, and we did/do handle
matrices with I():

> n96 <- matrix(1:10, ncol=2)
> n96
     [,1] [,2]
[1,]    1    6
[2,]    2    7
[3,]    3    8
[4,]    4    9
[5,]    5   10
> frame1 <- data.frame(gushVM, I(n96))
> frame1
  gushVM n96.1 n96.2
1      1     1     6
2      2     2     7
3      3     3     8
4      4     4     9
5      5     5    10
> dim(frame1)
[1] 5 2
> names(frame1)
[1] "gushVM" "n96"   
> rm(n96)
> pls1 <- plsr(gushVM ~ n96, data = frame1)
> pls1
Partial least squares regression , fitted with the kernel algorithm.
plsr(formula = gushVM ~ n96, data = frame1)
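
The two constructions can also be compared side by side in base R.
This is only a sketch: it uses model.frame() from stats as a stand-in
for plsr(), and the names frame_flat, frame_asis and can_build are
hypothetical helpers made up for the comparison:

```r
gushVM <- 1:5
n96 <- matrix(1:10, ncol = 2)

frame_flat <- data.frame(gushVM, n96 = n96)      # matrix split into columns
frame_asis <- data.frame(gushVM, n96 = I(n96))   # matrix kept as one variable

rm(n96)   # make sure nothing outside the data frames can be picked up

# TRUE if the model frame can be built from 'data', FALSE on error.
can_build <- function(data) {
    tryCatch({
        model.frame(gushVM ~ n96, data = data)
        TRUE
    }, error = function(e) FALSE)
}

names(frame_flat)
names(frame_asis)
can_build(frame_flat)
can_build(frame_asis)
```

Only the frame built with I() carries a variable named 'n96', so only
that one is self-contained once the matrix is removed from the
workspace.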

Bjørn-Helge Mevik
