[R] problem with predict()

ripley at stats.ox.ac.uk
Fri Jun 21 20:30:56 CEST 2002


As Andy Liaw pointed out, xr is a matrix.  Take a look at the names of
train.  Hint: they do not contain `x'.
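
A minimal sketch of what happens, using made-up data (the column names
"a" and "b" are invented here, not taken from your files).  When xr is a
matrix, data.frame(y = yr, x = xr) creates one column per matrix column,
each prefixed with "x.", so there is no column called x for the formula
to find:

```r
set.seed(1)
yr <- rnorm(10)
# Made-up predictor matrix with two named columns
xr <- matrix(rnorm(20), nrow = 10, dimnames = list(NULL, c("a", "b")))

train <- data.frame(y = yr, x = xr)
train_names <- names(train)
print(train_names)   # "y" "x.a" "x.b"

# y ~ . expands to all columns other than y, so it works
fit_dot <- lm(y ~ ., data = train)

# y ~ x looks for a variable named x and fails; keep the error message
fit_x <- tryCatch(lm(y ~ x, data = train),
                  error = function(e) conditionMessage(e))
print(fit_x)
```

This assumes no object called x exists in the workspace; if one did,
lm() would pick it up from there instead of failing.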

Similarly, in your `simulated' example you have matrices and not data
frames.
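
The simulated input from the quoted script can be checked in the same
way.  A small sketch; train.df is a name made up for illustration:

```r
# Simulated input as in the quoted script: matrix() returns a matrix
train.data <- matrix(rnorm(164 * 119), nrow = 164)

is_mat <- is.matrix(train.data)       # TRUE
is_df  <- is.data.frame(train.data)   # FALSE

# as.data.frame() gives a data frame with default column names V1, V2, ...
train.df <- as.data.frame(train.data)   # train.df is a hypothetical name
first_names <- names(train.df)[1:3]     # "V1" "V2" "V3"
```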

*Store your data in data frames*  and you may be less confused.
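
One possible way to do that, sketched with made-up data standing in for
train.csv and test.csv (the columns "a" and "b" are assumptions, and the
NA-column masking step from your script is left out for brevity).
scale() returns a matrix, so the result is converted back with
as.data.frame() before building train and test, which keeps the column
names identical in both:

```r
set.seed(1)
# Hypothetical stand-ins for the CSV files: response in column 1
train.data <- data.frame(y = rnorm(20), a = rnorm(20), b = rnorm(20))
test.data  <- data.frame(y = rnorm(5),  a = rnorm(5),  b = rnorm(5))

# Scale the training predictors, then apply the same centring and
# scaling to the test predictors
xr <- scale(train.data[, -1])
xs <- scale(test.data[, -1],
            center = attr(xr, "scaled:center"),
            scale  = attr(xr, "scaled:scale"))

# Convert back to data frames so the columns stay "a" and "b"
train <- data.frame(y = train.data$y, as.data.frame(xr))
test  <- data.frame(y = test.data$y,  as.data.frame(xs))

model <- lm(y ~ ., data = train)
pred  <- predict(model, newdata = test)
length(pred)   # one prediction per row of test
```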

On Fri, 21 Jun 2002, Czerminski, Ryszard wrote:

> Thank you for the great support so far!
> I think I am getting closer, but I still do not quite get it...
>
> Two questions:
>
> (1) What is the difference between lm(y ~ ., ...) and lm(y ~ x, ...),
>     and why does the second form fail?
>
> > train <- data.frame(y = yr, x = xr)
> > test <- data.frame(y = ys, x = xs)
> > model <- lm(y~., train)
> > model <- lm(y~x, train)
> Error in eval(expr, envir, enclos) : Object "x" not found
>
> (2) The other problem seems to be data related...
>
> Consider the following code:
>
> :::
> rm(list=ls())
>
> train.data <- read.csv("train.csv", header = TRUE, row.names = "mol",
> comment.char="")
> test.data <- read.csv("test.csv", header = TRUE, row.names = "mol",
> comment.char="")
>
> #train.data <- matrix(rnorm(164*119), nrow = 164)
> #test.data <- matrix(rnorm(35*119), nrow = 35)
>
> yr <- train.data[,1]; xr <- train.data[,-1]
> xr <- scale(xr)     # matrix <- scale(data.frame)
> x.center <- attr(xr, "scaled:center"); x.scale <- attr(xr, "scaled:scale")
> mask <- apply(xr, 2, function(x) any(is.na(x)))
> xr <- xr[,!mask] # rm NA's
> ys <- test.data[,1]; xs <- test.data[,-1]
> xs <- scale(xs, center = x.center, scale = x.scale)
> xs <- xs[,!mask]
> train <- data.frame(y = yr, x = xr)
> test <- data.frame(y = ys, x = xs)
> model <- lm(y~., train)
> length(predict(model, test))
> ::::
>
> When I execute it twice, once with (S) simulated data and once with
> (R) "real" data, I get:
>
> ::: for simulated data :::
> dim(train) = 164 119 ; dim(test) = 35 119
> > length(predict(model, test))
> [1] 35
>
> ::: for real data :::
> dim(train) = 164 119 ; dim(test) = 35 119
> > length(predict(model, test))
> Error in drop(X[, piv, drop = FALSE] %*% beta[piv]) :
>         subscript out of bounds
>
> The shape of the data seems to be the same in both cases, and
> the only difference (as far as I can tell) is in the actual values.
>
> R
>
> Ryszard Czerminski   phone: (781)994-0479
> ArQule, Inc.         email:ryszard at arqule.com
> 19 Presidential Way  http://www.arqule.com
> Woburn, MA 01801     fax: (781)994-0679
>
>
> -----Original Message-----
> From: Liaw, Andy [mailto:andy_liaw at merck.com]
> Sent: Friday, June 21, 2002 1:06 PM
> To: 'Peter Dalgaard BSA'
> Cc: 'Czerminski, Ryszard'; r-help at stat.math.ethz.ch
> Subject: RE: [R] problem with predict()
>
>
> The problem is that xr and xs are both matrices in his example, not vectors.
>
> Andy
>
> > -----Original Message-----
> > From: Peter Dalgaard BSA [mailto:p.dalgaard at biostat.ku.dk]
> > Sent: Friday, June 21, 2002 1:03 PM
> > To: Liaw, Andy
> > Cc: 'Czerminski, Ryszard'; r-help at stat.math.ethz.ch
> > Subject: Re: [R] problem with predict()
> >
> >
> > "Liaw, Andy" <andy_liaw at merck.com> writes:
> >
> > > You still don't get the point.  Please read Peter Dalgaard's reply
> > > and the help page for predict.lm carefully, and try to understand
> > > the `Details' section.  See the example below:
> > [snip]
> >
> > > > This looks promising; however I get an error:
> > > >
> > > > > train <- data.frame(y=yr, x=xr)
> > > > > test <- data.frame(y=ys, x=xs)
> > > > > myfit <- lm(y ~ x, train)
> > > > Error in eval(expr, envir, enclos) : Object "x" not found
> >
> > But there's nothing wrong with that code as far as I can see?? I don't
> > get an error from it:
> >
> > > xr <- rnorm(10)
> > > yr <- rnorm(10)
> > > ys <- rnorm(5)
> > > xs <- rnorm(5)
> > > train <- data.frame(y=yr, x=xr)
> > >  test <- data.frame(y=ys, x=xs)
> > > myfit <- lm(y ~ x, train)
> > > predict(myfit,test)
> >           1           2           3           4           5
> > -0.03809295  0.11422384  0.35570765  0.55436954  0.22979523
> >
> >
> > Something must have gone wrong with the creation of "train".
> >
> > --
> >    O__  ---- Peter Dalgaard             Blegdamsvej 3
> >   c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
> >  (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
> > ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
> >
>
> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> Send "info", "help", or "[un]subscribe"
> (in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
> _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
