Czerminski, Ryszard
ryszard at arqule.com
Fri Jun 21 20:58:15 CEST 2002
--- first problem
If I store 'simulated' data in data frames:
# train.data <- data.frame(matrix(rnorm(164*119), nrow = 164))
# test.data <- data.frame(matrix(rnorm(35*119), nrow = 35))
it still behaves the same way, i.e. the code below works fine
for simulated data and fails for 'real' data, the only
difference being the actual numeric values stored in data
structures of the same shape and type.
Any suggestions as to why this happens?
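Nothing in this thread establishes why the real data fail, so the following is only a sketch of checks that could be run on both fits to see where they differ. It assumes, purely for illustration, a toy data set in which one predictor (the hypothetical column x.3) is an exact linear combination of the others, so that lm() cannot estimate its coefficient. Random data almost never has this property, while real descriptors may.

```r
# Toy data: x.3 is collinear with x.1 and x.2 by construction (an assumption
# made for illustration, not a diagnosis of the real data set).
set.seed(42)
n  <- 30
x1 <- rnorm(n)
x2 <- rnorm(n)
train <- data.frame(y = rnorm(n), x.1 = x1, x.2 = x2, x.3 = x1 + x2)
test  <- data.frame(y = rnorm(5), x.1 = rnorm(5), x.2 = rnorm(5), x.3 = rnorm(5))

model <- lm(y ~ ., data = train)

# Check 1: do the training and test frames carry the same column names?
same.names <- identical(names(train), names(test))

# Check 2: is the fit rank-deficient (rank smaller than the number of coefficients)?
fit.rank <- model$rank
n.coef   <- length(coef(model))

# Check 3: which coefficients could not be estimated (reported as NA)?
dropped <- names(coef(model))[is.na(coef(model))]

print(list(same.names = same.names, rank = fit.rank,
           n.coef = n.coef, dropped = dropped))
```

Running the same three checks on the fit from the real data and on the fit from the simulated data would show whether they differ in rank or in column names.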
--- second problem
> As Andy Liaw pointed out, xr is a matrix. Take a look at the names of
> train. Hint: they do not contain `x'.
Following your hint, I am guessing that the fact that the names do not
contain 'x' explains why the lm(y~., train) form works and lm(y~x, train)
fails, and that "lm(y~., train)" means roughly: correlate column "y" to all
other columns.
Where can I find a more detailed specification of this syntax?
In help(lm) I find this paragraph:
Models for `lm' are specified symbolically. A typical model has
the form `response ~ terms' where `response' is the (numeric)...
which does not quite cover this case.
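A small sketch of the guess above, using made-up data (the sizes and the seed are arbitrary): when a matrix is passed to data.frame() under the name x, its columns are stored under derived names, so the frame has no column called "x", while the dot in the formula stands for every column other than the response. The dot is described in help(formula) rather than help(lm). The example assumes no other object named x exists in the workspace.

```r
set.seed(1)
xr <- matrix(rnorm(20 * 3), nrow = 20)   # a matrix, as scale() would return
yr <- rnorm(20)

train <- data.frame(y = yr, x = xr)
print(names(train))                      # the matrix columns get derived names

# y ~ . : the dot is expanded to all columns of 'train' except the response
fit.dot <- lm(y ~ ., data = train)
dot.terms <- attr(terms(fit.dot), "term.labels")
print(dot.terms)

# y ~ x : there is no column "x" in 'train', so the lookup fails
# (assuming no object called x exists elsewhere in the workspace)
res <- tryCatch(lm(y ~ x, data = train),
                error = function(e) conditionMessage(e))
print(res)
```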
-----Original Message-----
From: ripley at stats.ox.ac.uk [mailto:ripley at stats.ox.ac.uk]
Sent: Friday, June 21, 2002 2:31 PM
To: Czerminski, Ryszard
Cc: r-help at stat.math.ethz.ch
Subject: RE: [R] problem with predict()
As Andy Liaw pointed out, xr is a matrix. Take a look at the names of
train. Hint: they do not contain `x'.
Similarly, in your `simulated' example you have matrices and not data
frames.
*Store your data in data frames* and you may be less confused.
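As a minimal sketch of that advice, with simulated data standing in for the CSV files (the sizes and the seed are arbitrary assumptions): scale() returns a matrix, and converting it back with as.data.frame() keeps the original column names, so the training and test frames carry matching names for predict().

```r
set.seed(7)
train.data <- data.frame(matrix(rnorm(40 * 4), nrow = 40))
test.data  <- data.frame(matrix(rnorm(10 * 4), nrow = 10))

# scale() returns a matrix; keep the training centre and scale for the test set
xr <- scale(train.data[, -1])
xs <- scale(test.data[, -1],
            center = attr(xr, "scaled:center"),
            scale  = attr(xr, "scaled:scale"))

# Convert back to data frames so that each predictor keeps its own column name
train <- data.frame(y = train.data[, 1], as.data.frame(xr))
test  <- data.frame(y = test.data[, 1],  as.data.frame(xs))

model <- lm(y ~ ., data = train)
pred  <- predict(model, newdata = test)
print(length(pred))
```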
On Fri, 21 Jun 2002, Czerminski, Ryszard wrote:
> Thank you for great support so far !
> I think I am getting closer, but I still do not quite get it...
>
> Two questions:
>
> (1) what is the difference between lm(y~., and lm(y~x, ???
> with second form failing ?
>
> > train <- data.frame(y = yr, x = xr)
> > test <- data.frame(y = ys, x = xs)
> > model <- lm(y~., train)
> > model <- lm(y~x, train)
> Error in eval(expr, envir, enclos) : Object "x" not found
>
> (2) and the other problem seems to be data related...
>
> Consider the following code:
>
> :::
> rm(list=ls())
>
> train.data <- read.csv("train.csv", header = TRUE, row.names = "mol",
> comment.char="")
> test.data <- read.csv("test.csv", header = TRUE, row.names = "mol",
> comment.char="")
>
> #train.data <- matrix(rnorm(164*119), nrow = 164)
> #test.data <- matrix(rnorm(35*119), nrow = 35)
>
> yr <- train.data[,1]; xr <- train.data[,-1]
> xr <- scale(xr) # matrix <- scale(data.frame)
> x.center <- attr(xr, "scaled:center"); x.scale <- attr(xr, "scaled:scale")
> mask <- apply(xr, 2, function(x) any(is.na(x)))
> xr <- xr[,!mask] # rm NA's
> ys <- test.data[,1]; xs <- test.data[,-1]
> xs <- scale(xs, center = x.center, scale = x.scale)
> xs <- xs[,!mask]
> train <- data.frame(y = yr, x = xr)
> test <- data.frame(y = ys, x = xs)
> model <- lm(y~., train)
> length(predict(model, test))
> ::::
>
> When I execute it twice, with (S) simulated data and (R) "real" data, I get:
>
> ::: for simulated data :::
> dim(train) = 164 119 ; dim(test) = 35 119
> > length(predict(model, test))
> [1] 35
>
> ::: for real data :::
> dim(train) = 164 119 ; dim(test) = 35 119
> > length(predict(model, test))
> Error in drop(X[, piv, drop = FALSE] %*% beta[piv]) :
> subscript out of bounds
>
> The shape of data seems to be the same in both cases and
> the only difference (as far as I can tell) is in actual values
>
> R
>
>
>
> -----Original Message-----
> From: Liaw, Andy [mailto:andy_liaw at merck.com]
> Sent: Friday, June 21, 2002 1:06 PM
> To: 'Peter Dalgaard BSA'
> Cc: 'Czerminski, Ryszard'; r-help at stat.math.ethz.ch
> Subject: RE: [R] problem with predict()
>
>
> The problem is that xr and xs are both matrices in his example, not vectors.
>
> Andy
>
> > -----Original Message-----
> > From: Peter Dalgaard BSA [mailto:p.dalgaard at biostat.ku.dk]
> > Sent: Friday, June 21, 2002 1:03 PM
> > To: Liaw, Andy
> > Cc: 'Czerminski, Ryszard'; r-help at stat.math.ethz.ch
> > Subject: Re: [R] problem with predict()
> >
> >
> > "Liaw, Andy" <andy_liaw at merck.com> writes:
> >
> > > You still don't get the point. Please read Peter Dalgaard's reply
> > > and the help page for predict.lm carefully, and try to understand
> > > the `Detail' section. See the example below:
> > [snip]
> >
> > > > > This looks promising; however I get an error:
> > > >
> > > > > train <- data.frame(y=yr, x=xr)
> > > > > test <- data.frame(y=ys, x=xs)
> > > > > myfit <- lm(y ~ x, train)
> > > > Error in eval(expr, envir, enclos) : Object "x" not found
> >
> > But there's nothing wrong with that code as far as I can see?? I don't
> > get an error from it:
> >
> > > xr <- rnorm(10)
> > > yr <- rnorm(10)
> > > ys <- rnorm(5)
> > > xs <- rnorm(5)
> > > train <- data.frame(y=yr, x=xr)
> > > test <- data.frame(y=ys, x=xs)
> > > myfit <- lm(y ~ x, train)
> > > predict(myfit,test)
> > 1 2 3 4 5
> > -0.03809295 0.11422384 0.35570765 0.55436954 0.22979523
> >
> >
> > Something must have gone wrong with the creation of "train".
> >
> > --
> > O__ ---- Peter Dalgaard Blegdamsvej 3
> > c/ /'_ --- Dept. of Biostatistics 2200 Cph. N
> > (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
> > ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907
> >
>
>
