Mon Jul 1 16:31:21 CEST 2002
On Mon, 1 Jul 2002, Prof Brian D Ripley wrote:
> I should point out that there is (as I thought) nothing wrong with predict.lm
> on a rank-degenerate problem, e.g.
>
> x1 <- rnorm(100)
> x3 <- rnorm(100)
> y <- rnorm(100)
> train <- data.frame(y=y, x1=x1, x2=x1, x3=x3)
> fit <- lm(y ~ ., train)
> stopifnot(all.equal(predict(fit), predict(fit, train)))
>
> although, as Thomas points out, a warning would be useful.
>
> The problem here is that model.matrix is (for me) adding 13 duplicate
> columns in lm and not in predict.lm. That's a bug unrelated to predict().
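Until lm() itself warns, a user can check for rank deficiency directly. The sketch below reuses the example above (with an added set.seed() for reproducibility). It relies on two documented properties of lm(): aliased coefficients are reported as NA, and fit$rank is the numerical rank of the model matrix. The warning message is only illustrative and is not something lm() emits.

```r
set.seed(1)  # added only to make the example reproducible
x1 <- rnorm(100)
x3 <- rnorm(100)
y <- rnorm(100)
train <- data.frame(y = y, x1 = x1, x2 = x1, x3 = x3)  # x2 is an exact copy of x1
fit <- lm(y ~ ., train)

# Aliased (linearly dependent) terms get NA coefficients in a rank-deficient fit
aliased <- names(coef(fit))[is.na(coef(fit))]

# Illustrative user-side warning: rank is smaller than the number of coefficients
if (fit$rank < length(coef(fit)))
  warning("rank-deficient fit; aliased terms: ", paste(aliased, collapse = ", "))
```

Here the fit has four coefficients but rank 3, and x2 is the aliased term.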
Follow up: the posted data file contains syntactically invalid variable
names. make.names remaps these to valid names and, in doing so, creates
duplicated names. terms.formula then builds terms that contain these
duplicated names, and model.matrix gets a column for each of the
duplicates. However, because the formula is invalid, it gets corrected in
predict.lm by delete.response().
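A minimal illustration of how make.names can map distinct names onto the same valid name. The column names here are hypothetical and were made up for the example, not taken from the posted data file.

```r
# Hypothetical column names: two different strings, neither a valid R name
raw_names <- c("x 1", "x-1", "y")

# make.names() replaces each invalid character with ".", so both become "x.1"
fixed <- make.names(raw_names)
print(fixed)                  # "x.1" "x.1" "y"

# Nothing flags the clash unless duplicates are checked for explicitly
print(anyDuplicated(fixed))   # 2, the index of the first duplicated element
```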
So the user error is attempting to use a data frame with invalid names,
and the bug is that R did not detect the resulting duplicates.
read.table should call make.names(col.name, unique=TRUE) to avoid this,
and terms.formula needs to check for duplicates too.
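A sketch of both suggested remedies, under stated assumptions. The first part uses the real `unique = TRUE` argument of make.names, which de-duplicates via make.unique. The second part is a hypothetical helper, check_unique_names(), that shows the kind of duplicate check proposed for terms.formula; it is not R's actual implementation.

```r
# Same hypothetical names as before
raw_names <- c("x 1", "x-1", "y")

# unique = TRUE makes the results distinct by appending ".1", ".2", ...
fixed <- make.names(raw_names, unique = TRUE)
print(fixed)   # "x.1" "x.1.1" "y"

# Hypothetical guard: stop with an error if any variable name is repeated
check_unique_names <- function(nms) {
  dups <- unique(nms[duplicated(nms)])
  if (length(dups) > 0)
    stop("duplicated variable names: ", paste(dups, collapse = ", "))
  invisible(nms)
}

check_unique_names(fixed)  # passes silently
```

Calling `check_unique_names(make.names(raw_names))` on the names without `unique = TRUE` stops with "duplicated variable names: x.1".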
