[R] problem with predict()

Czerminski, Ryszard ryszard at arqule.com
Fri Jun 21 20:58:15 CEST 2002


--- first problem 

If I store 'simulated' data in data frames:
# train.data <- data.frame(matrix(rnorm(164*119), nrow = 164))
# test.data <- data.frame(matrix(rnorm(35*119), nrow = 35))
nothing changes, i.e. the code below works fine for simulated data
and fails for 'real' data, the only difference being the actual
numeric values stored in data structures of the same shape and type.

Any suggestions why this happens ?
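
One difference between random and real data that may be worth ruling out is
exact collinearity among the predictors: rnorm() columns are practically never
linearly dependent, while real descriptor tables often are. The real data are
not shown in this thread, so this is only an assumption and not a diagnosis.
A minimal sketch of how to check a fit for it (the data and the names d, fit
and aliased are made up for illustration):

```r
set.seed(1)
x <- matrix(rnorm(20 * 3), nrow = 20)
x <- cbind(x, x[, 1] + x[, 2])          # 4th column is an exact linear combination
d <- data.frame(y = rnorm(20), x = x)   # columns: y, x.1, x.2, x.3, x.4

fit <- lm(y ~ ., d)

# lm() cannot estimate a coefficient for a redundant column and reports NA
aliased <- names(coef(fit))[is.na(coef(fit))]
aliased

# a rank below the number of coefficients signals a rank-deficient fit
fit$rank
length(coef(fit))
```

If aliased is empty and the rank equals the number of coefficients for the
real data, collinearity is not the explanation and the cause lies elsewhere.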

--- second problem

> As Andy Liaw pointed out, xr is a matrix.  Take a look at the names of
> train.  Hint: they do not contain `x'.

Following your hint, I am guessing that the fact that the names do not contain
'x' explains why the lm(y~., train) form works and lm(y~x, train) fails,
and that "lm(y~., train)" means roughly: regress column "y" on all other columns.

Where can I find a more detailed specification of this syntax?
In help(lm) I find this paragraph:

     Models for `lm' are specified symbolically.  A typical model has
     the form `response ~ terms' where `response' is the (numeric)...

which does not quite cover this case.
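
To make the guess above concrete, here is a small sketch with random data.
The names xr, yr and train follow the thread; expanded, fit.dot and
fit.explicit are made up for illustration. It shows which column names
data.frame() produces when x is a matrix, and what the "." in the formula
expands to. In current R the "." shorthand is described in help(formula).

```r
set.seed(1)
xr <- matrix(rnorm(10 * 2), nrow = 10)   # a matrix, as in the thread
yr <- rnorm(10)

train <- data.frame(y = yr, x = xr)
names(train)                             # "y" "x.1" "x.2": no column named "x"

# with a data argument, "." stands for every column other than the response
expanded <- attr(terms(y ~ ., data = train), "term.labels")
expanded

# so the two fits below are the same model
fit.dot      <- lm(y ~ ., train)
fit.explicit <- lm(y ~ x.1 + x.2, train)
same <- all.equal(coef(fit.dot), coef(fit.explicit))
same
```

Because train has no column called x, lm(y ~ x, train) has to look for x
outside the data frame, which is where the "not found" error comes from.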

Ryszard Czerminski   phone: (781)994-0479
ArQule, Inc.         email:ryszard at arqule.com
19 Presidential Way  http://www.arqule.com
Woburn, MA 01801     fax: (781)994-0679


-----Original Message-----
From: ripley at stats.ox.ac.uk [mailto:ripley at stats.ox.ac.uk]
Sent: Friday, June 21, 2002 2:31 PM
To: Czerminski, Ryszard
Cc: r-help at stat.math.ethz.ch
Subject: RE: [R] problem with predict()


As Andy Liaw pointed out, xr is a matrix.  Take a look at the names of
train.  Hint: they do not contain `x'.

Similarly, in your `simulated' example you have matrices and not data
frames.

*Store your data in data frames*  and you may be less confused.
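
A minimal sketch of what that could look like for the scaling workflow quoted
below, using small random data in place of the CSV files (sizes and values
are made up; the NA-masking step is left out for brevity). scale() returns a
matrix, so the result is converted back to a data frame before fitting and
predicting:

```r
set.seed(42)
train.data <- data.frame(matrix(rnorm(30 * 4), nrow = 30))
test.data  <- data.frame(matrix(rnorm(10 * 4), nrow = 10))

xr <- scale(train.data[, -1])            # a matrix, even though the input is a data frame
is.matrix(xr)

# scale the test predictors with the training centers and scales
xs <- scale(test.data[, -1],
            center = attr(xr, "scaled:center"),
            scale  = attr(xr, "scaled:scale"))

# convert back to data frames; column names are kept, so train and test agree
train <- data.frame(y = train.data[, 1], as.data.frame(xr))
test  <- data.frame(y = test.data[, 1],  as.data.frame(xs))

model <- lm(y ~ ., train)
pred  <- predict(model, test)
length(pred)
```

The point of the conversion is that predict() matches variables by name, so
train and test need identical column names for the predictors.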

On Fri, 21 Jun 2002, Czerminski, Ryszard wrote:

> Thank you for the great support so far!
> I think I am getting closer, but I still do not quite get it...
>
> Two questions:
>
> (1) What is the difference between lm(y~., ...) and lm(y~x, ...),
>     with the second form failing?
>
> > train <- data.frame(y = yr, x = xr)
> > test <- data.frame(y = ys, x = xs)
> > model <- lm(y~., train)
> > model <- lm(y~x, train)
> Error in eval(expr, envir, enclos) : Object "x" not found
>
> (2) The other problem seems to be data related...
>
> Consider the following code:
>
> :::
> rm(list=ls())
>
> train.data <- read.csv("train.csv", header = TRUE, row.names = "mol",
> comment.char="")
> test.data <- read.csv("test.csv", header = TRUE, row.names = "mol",
> comment.char="")
>
> #train.data <- matrix(rnorm(164*119), nrow = 164)
> #test.data <- matrix(rnorm(35*119), nrow = 35)
>
> yr <- train.data[,1]; xr <- train.data[,-1]
> xr <- scale(xr)     # matrix <- scale(data.frame)
> x.center <- attr(xr, "scaled:center"); x.scale <- attr(xr, "scaled:scale")
> mask <- apply(xr, 2, function(x) any(is.na(x)))
> xr <- xr[,!mask] # rm NA's
> ys <- test.data[,1]; xs <- test.data[,-1]
> xs <- scale(xs, center = x.center, scale = x.scale)
> xs <- xs[,!mask]
> train <- data.frame(y = yr, x = xr)
> test <- data.frame(y = ys, x = xs)
> model <- lm(y~., train)
> length(predict(model, test))
> ::::
>
> Executing it twice, with (S) simulated data and (R) "real" data, I get:
>
> ::: for simulated data :::
> dim(train) = 164 119 ; dim(test) = 35 119
> > length(predict(model, test))
> [1] 35
>
> ::: for real data :::
> dim(train) = 164 119 ; dim(test) = 35 119
> > length(predict(model, test))
> Error in drop(X[, piv, drop = FALSE] %*% beta[piv]) :
>         subscript out of bounds
>
> The shape of the data seems to be the same in both cases, and
> the only difference (as far as I can tell) is in the actual values.
>
> R
>
> Ryszard Czerminski   phone: (781)994-0479
> ArQule, Inc.         email:ryszard at arqule.com
> 19 Presidential Way  http://www.arqule.com
> Woburn, MA 01801     fax: (781)994-0679
>
>
> -----Original Message-----
> From: Liaw, Andy [mailto:andy_liaw at merck.com]
> Sent: Friday, June 21, 2002 1:06 PM
> To: 'Peter Dalgaard BSA'
> Cc: 'Czerminski, Ryszard'; r-help at stat.math.ethz.ch
> Subject: RE: [R] problem with predict()
>
>
> The problem is that xr and xs are both matrices in his example, not
> vectors.
>
> Andy
>
> > -----Original Message-----
> > From: Peter Dalgaard BSA [mailto:p.dalgaard at biostat.ku.dk]
> > Sent: Friday, June 21, 2002 1:03 PM
> > To: Liaw, Andy
> > Cc: 'Czerminski, Ryszard'; r-help at stat.math.ethz.ch
> > Subject: Re: [R] problem with predict()
> >
> >
> > "Liaw, Andy" <andy_liaw at merck.com> writes:
> >
> > > You still don't get the point.  Please read Peter Dalgaard's reply and the
> > > help page for predict.lm carefully, and try to understand the `Details'
> > > section.  See the example below:
> > [snip]
> >
> > > > This looks promising; however I get an error:
> > > >
> > > > > train <- data.frame(y=yr, x=xr)
> > > > > test <- data.frame(y=ys, x=xs)
> > > > > myfit <- lm(y ~ x, train)
> > > > Error in eval(expr, envir, enclos) : Object "x" not found
> >
> > But there's nothing wrong with that code as far as I can see?? I don't
> > get an error from it:
> >
> > > xr <- rnorm(10)
> > > yr <- rnorm(10)
> > > ys <- rnorm(5)
> > > xs <- rnorm(5)
> > > train <- data.frame(y=yr, x=xr)
> > >  test <- data.frame(y=ys, x=xs)
> > > myfit <- lm(y ~ x, train)
> > > predict(myfit,test)
> >           1           2           3           4           5
> > -0.03809295  0.11422384  0.35570765  0.55436954  0.22979523
> >
> >
> > Something must have gone wrong with the creation of "train".
> >
> > --
> >    O__  ---- Peter Dalgaard             Blegdamsvej 3
> >   c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
> >  (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
> > ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
> >
>
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._


