[R] problem with predict()

ripley@stats.ox.ac.uk ripley at stats.ox.ac.uk
Mon Jul 1 16:31:21 CEST 2002


On Mon, 1 Jul 2002, Prof Brian D Ripley wrote:

> I should point out that there is (as I thought) nothing wrong with predict.lm
> on a rank-degenerate problem, e.g.
>
> x1 <- rnorm(100)
> x3 <- rnorm(100)
> y <- rnorm(100)
> train <- data.frame(y=y, x1=x1, x2=x1, x3=x3)
> fit <- lm(y ~ ., train)
> stopifnot(all.equal(predict(fit), predict(fit, train)))
>
> although as Thomas points out a warning would be useful.
>
> The problem here is that model.matrix is (for me) adding 13 duplicate
> columns in lm and not in predict.lm.  That's a bug unrelated to predict().

Follow-up: the data file posted contains illegal variable names. These
are remapped by make.names into valid names, and that remapping creates
duplicated names.  terms.formula then creates a formula containing these
duplicated names, and model.matrix gets a column for each of the
duplicates.  However, as the formula is invalid, it gets corrected in
predict.lm by delete.response(), which is why the duplicate columns appear
in lm and not in predict.lm.
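
As a small illustration of the remapping step only (the column headers
below are made up for the example and are not those of the posted data
file), distinct illegal names can collapse to the same valid name:

```r
# Hypothetical column headers: two are illegal, one is already valid.
raw_names <- c("x 1", "x-1", "x.1")

# make.names() replaces each invalid character with "." and, by default
# (unique = FALSE), does nothing to keep the results distinct.
fixed <- make.names(raw_names)
print(fixed)

# anyDuplicated() returns the index of the first duplicated entry,
# or 0 if there is none.
print(anyDuplicated(fixed))
```

Here all three headers end up as the same name, which is the kind of
duplication described above.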

So the error is to attempt to use a data frame with invalid names, and the
bug is that R did not detect the duplicates.

read.table should call make.names(col.name, unique=TRUE) to avoid this,
and terms.formula needs to check for duplicates too.
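
Until that is done, a user-level workaround might look like the sketch
below. The helper name check_unique_names and the column headers are
assumptions made for this example; they are not part of R.

```r
# Hypothetical column headers that collapse to one name by default.
raw_names <- c("x 1", "x-1", "x.1")

# unique = TRUE passes the result through make.unique(), which appends
# numeric suffixes so that no two names coincide.
safe <- make.names(raw_names, unique = TRUE)
print(safe)

# Hypothetical helper: refuse to go on if a data frame has duplicated
# column names, so the problem is caught before any model is fitted.
check_unique_names <- function(df) {
  dup <- unique(names(df)[duplicated(names(df))])
  if (length(dup) > 0)
    stop("duplicated column names: ", paste(dup, collapse = ", "))
  invisible(df)
}

# Build a small data frame with the de-duplicated names and check it.
df <- as.data.frame(matrix(rnorm(6), nrow = 2))
names(df) <- safe
check_unique_names(df)
```

Calling check_unique_names() on a data frame straight after reading it in
gives an error at the point where the bad names arise, rather than a
puzzling mismatch later between lm and predict.lm.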

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
