[R] problem with predict()

Czerminski, Ryszard ryszard at arqule.com
Thu Jun 27 21:29:23 CEST 2002


# Yes. You are *still* using a matrix in a data frame.  Please do read more
# carefully.

I have read more of the R documentation, trying to understand the
difference between matrices and data frames, and I repeat my example below.
This time I execute EXACTLY the same code in both cases; the only difference
is that the first case uses the smaller data sets ({train,test}-small.csv)
and the second uses the larger data sets ({train,test}.csv). The two cases
behave differently.

The small case (10*4) runs fine; the larger case (164*119) fails.

Any ideas why this happens?

R

> rm(list=ls())
> train.data <- read.csv("train-small.csv", header = TRUE, row.names = "mol", comment.char="")
> test.data <- read.csv("test-small.csv", header = TRUE, row.names = "mol", comment.char="")
> yr <- train.data[,1]; xr <- train.data[,-1]
> xr <- scale(xr)
> x.center <- attr(xr, "scaled:center"); x.scale <- attr(xr, "scaled:scale")
> mask <- apply(xr, 2, function(x) any(is.na(x)))
> xr <- xr[,!mask] # rm NA's
> ys <- test.data[,1]; xs <- test.data[,-1]
> xs <- scale(xs, center = x.center, scale = x.scale)
> xs <- xs[,!mask]
> train <- data.frame(y = yr, x = xr)
> test <- data.frame(y = ys, x = xs)
> model <- lm(y~., train)
> cat("dim(train) =", dim(train), "; dim(test) =", dim(test), "\n")
dim(train) = 10 4 ; dim(test) = 10 4 
> length(predict(model, test))
[1] 10
> train.data <- read.csv("train.csv", header = TRUE, row.names = "mol", comment.char="")
> test.data <- read.csv("test.csv", header = TRUE, row.names = "mol", comment.char="")
[snip...]
> cat("dim(train) =", dim(train), "; dim(test) =", dim(test), "\n")
dim(train) = 164 119 ; dim(test) = 35 119 
> length(predict(model, test))
Error in drop(X[, piv, drop = FALSE] %*% beta[piv]) : 
        subscript out of bounds
>
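
For anyone trying to reproduce this without the CSV files, a couple of
checks may help narrow things down. The sketch below uses simulated data,
and the duplicated column `d` is an assumption made only to force a
rank-deficient fit; it is not known whether the real data behave this way.
It compares the column names of the two data frames, then compares the rank
of the fit with the number of coefficients.

```r
# Simulated stand-ins for the real data; the duplicated column is deliberate.
set.seed(1)
xr <- matrix(rnorm(40), nrow = 10, dimnames = list(NULL, c("a", "b", "c", "d")))
xr[, "d"] <- xr[, "a"]   # make column d an exact copy of column a
train <- data.frame(y = rnorm(10), x = xr)
test  <- data.frame(y = rnorm(10), x = xr)

# 1. Do both data frames carry the same column names?
missing.in.test <- setdiff(names(train), names(test))
print(missing.in.test)

# 2. Is the fit rank-deficient?  lm() reports NA for aliased coefficients.
model <- lm(y ~ ., data = train)
n.coef <- length(coef(model))
print(c(rank = model$rank, coefficients = n.coef))
print(names(coef(model))[is.na(coef(model))])
```

If the rank is smaller than the number of coefficients, some predictors are
linear combinations of others, which may be worth ruling out before looking
elsewhere.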

Ryszard Czerminski   phone: (781)994-0479
ArQule, Inc.         email:ryszard at arqule.com
19 Presidential Way  http://www.arqule.com
Woburn, MA 01801     fax: (781)994-0679


-----Original Message-----
From: ripley at stats.ox.ac.uk [mailto:ripley at stats.ox.ac.uk]
Sent: Friday, June 21, 2002 3:41 PM
To: Czerminski, Ryszard
Cc: r-help at stat.math.ethz.ch
Subject: RE: [R] problem with predict()


On Fri, 21 Jun 2002, Czerminski, Ryszard wrote:

> --- first problem
>
> If I store 'simulated' data in data frames:
> # train.data <- data.frame(matrix(rnorm(164*119), nrow = 164))
> # test.data <- data.frame(matrix(rnorm(35*119), nrow = 35))
> it still works the same way i.e. the code below works fine
> for simulated data and fails for 'real' data the only
> difference being in actual numeric values stored in data
> structures of the same shape and type.
>
> Any suggestions why this happens ?

Yes. You are *still* using a matrix in a data frame.  Please do read more
carefully.
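
To see the distinction between a matrix and separate data frame columns, it
can help to look at what data.frame() does with a matrix argument. This is a
minimal sketch with made-up data; the names `m`, `expanded` and `kept` are
illustrative only. By default the matrix is split into one column per matrix
column, whereas wrapping it in I() keeps it as a single matrix-valued column.

```r
m <- matrix(1:6, nrow = 3, dimnames = list(NULL, c("a", "b")))

expanded <- data.frame(y = c(10, 20, 30), x = m)     # matrix split into columns
kept     <- data.frame(y = c(10, 20, 30), x = I(m))  # matrix kept as one column

print(names(expanded))          # "y" "x.a" "x.b"
print(dim(expanded))            # 3 3
print(names(kept))              # "y" "x"
print(dim(kept))                # 3 2
print(sapply(kept, is.matrix))  # y: FALSE, x: TRUE
```

Checking names() and dim() in this way shows which of the two layouts a
given data frame actually has.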

> --- second problem
>
> > As Andy Liaw pointed out, xr is a matrix.  Take a look at the names of
> > train.  Hint: they do not contain `x'.
>
> Following your hint I am guessing that the fact that names do not contain
> 'x' explains why lm(y~., train) form works and lm(y~x, train) fails
> and "lm(y~., train)" means roughly: correlate column "y" to all other
> columns

No, it means regress y on all the remaining columns in the data argument.
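
As a small illustration of that reading of `.`, the sketch below fits the
same model twice, once with `y ~ .` and once with the remaining columns
written out. The data are simulated and the names `fit.dot` and
`fit.explicit` are illustrative only. The use of `.` together with a data
argument is also described in help("formula").

```r
set.seed(42)
xr <- matrix(rnorm(30), nrow = 10, dimnames = list(NULL, c("a", "b", "c")))
train <- data.frame(y = rnorm(10), x = xr)

print(names(train))   # "y" "x.a" "x.b" "x.c"; note there is no column "x"

fit.dot      <- lm(y ~ ., data = train)                # all remaining columns
fit.explicit <- lm(y ~ x.a + x.b + x.c, data = train)  # the same, spelled out

print(names(coef(fit.dot)))
print(all.equal(coef(fit.dot), coef(fit.explicit)))
```

Both fits should give the same coefficients, since the two formulas describe
the same set of predictors for this data frame.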

>
> Where can I find a more detailed specification of this syntax ?
> In help(lm) I find this paragraph:
>
>      Models for `lm' are specified symbolically.  A typical model has
>      the form `response ~ terms' where `response' is the (numeric)...
>
> which does not quite cover this case.

In any good book on the subject.

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.
-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._.
_._