[R] "mvr" function

Bjørn-Helge Mevik bhx2 at mevik.net
Fri Jun 3 09:55:13 CEST 2005


Jim BRINDLE writes:

> volumes <- read.table("THA_vol.txt", header = TRUE)
>
> and then created a data.frame called "vol".  My response variable is
> in the last column of the "vol" data frame and my dependent variables
> are in columns 1 through 11.

[...]

> y <- vol[,12]
> X <- vol[,1:11]
> ans.pcr <- pcr(y ~ X,6,data=vol,validation="CV")

There are two problems here:

1) X is a data frame, not a matrix.  This is what causes the error message.

2) You specify in the call that pcr should look in the data frame
   `vol' for variables called 'y' and 'X'.  (Presumably) they don't
   exist there, but in the global environment (because of the
   assignments `y <- vol[,12]', etc).  (This will not lead to an
   error, because pcr will find the variables anyway, but might lead
   to confusion or errors if you later modify those variables.)
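
The lookup described in point 2 can be seen with base R alone.  This is
only a sketch: model.frame() stands in for what pcr does with its
formula (an assumption about pcr's internals), and the data frame and
values are made up.

```r
# Made-up data: 'vol' has no column called y.
vol <- data.frame(a = 1:5)
y <- c(2, 4, 6, 8, 10)                  # y lives in the workspace instead

mf1 <- model.frame(y ~ a, data = vol)   # y is not in vol, so it is found outside it

y <- rev(y)                             # a later change to the workspace copy
mf2 <- model.frame(y ~ a, data = vol)   # same call, now a different response

identical(mf1$y, mf2$y)                 # FALSE: the two calls saw different y
```

The call itself never changed; only the workspace variable did, which is
the kind of confusion meant above.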

The first problem can be overcome by doing

 X <- as.matrix(vol[,1:11])

and the second one by

 ans.pcr <- pcr(y ~ X, 6, validation = "CV")
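
To check the first fix before calling pcr, one can compare the two
objects directly.  A small sketch on random data (the 20 rows and the
default names V1..V12 are assumptions, not your real data):

```r
# Random stand-in for 'vol': 11 predictors plus a response in column 12.
set.seed(1)
vol <- as.data.frame(matrix(rnorm(20 * 12), nrow = 20))

X_df  <- vol[, 1:11]              # column subset of a data frame: still a data frame
X_mat <- as.matrix(vol[, 1:11])   # converted to a numeric matrix

is.matrix(X_df)                   # FALSE
is.matrix(X_mat)                  # TRUE
dim(X_mat)                        # 20 rows, 11 columns
```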

However, there are (as always in R :) several ways of accomplishing
the same thing.  One solution is simply

 ans.pcr <- pcr(V12 ~ ., 6, data = vol, validation = "CV")

(where V12 must be substituted with the name of the 12th variable of
vol; see names(vol)).  This formula tells pcr to use V12 as the
response, and the remaining variables (in vol) as predictors.
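
To see what the `.' expands to, one can ask base R for the terms of the
formula.  A sketch on random data, where the names V1..V12 are just the
defaults as.data.frame() gives an unnamed matrix (your names will
differ):

```r
# Random stand-in for 'vol' with default column names V1..V12.
set.seed(1)
vol <- as.data.frame(matrix(rnorm(20 * 12), nrow = 20))

# Expand the dot against the columns of vol.
tt <- terms(V12 ~ ., data = vol)
predictors <- attr(tt, "term.labels")

predictors   # every column of vol except the response V12
```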

A more general solution is to say

 vol2 <- data.frame(y = vol[,12], X = I(as.matrix(vol[,1:11])))
 ans.pcr <- pcr(y ~ X, 6, data = vol2, validation = "CV")

The I() makes R store X as a matrix in vol2, instead of as 11 separate
variables.  This is handy for cases where you have several matrices.
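
The effect of I() is easy to inspect.  A sketch on random data (the
sizes are assumptions); the pcr call at the end only runs if the pls
package happens to be installed:

```r
# Random stand-in for 'vol': 11 predictors plus a response in column 12.
set.seed(1)
vol <- as.data.frame(matrix(rnorm(20 * 12), nrow = 20))

# With I(): X is kept as a single matrix-valued column.
vol2 <- data.frame(y = vol[, 12], X = I(as.matrix(vol[, 1:11])))
names(vol2)      # "y" "X"
dim(vol2$X)      # 20 rows, 11 columns

# Without I(): the matrix is split into 11 separate columns.
vol_flat <- data.frame(y = vol[, 12], X = as.matrix(vol[, 1:11]))
ncol(vol_flat)   # 12

# Fit only when pls is available; the data frame supplies both y and X.
if (requireNamespace("pls", quietly = TRUE)) {
  ans.pcr <- pls::pcr(y ~ X, ncomp = 6, data = vol2, validation = "CV")
}
```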

The manual page for `lm' and the R manual `An Introduction to R'
(chapter 11) are good references for the formula handling in R.


-- 
HTH,
Bjørn-Helge Mevik



