[R] using poly in a linear regression in the presence of NA fails (despite subsetting them out)

Markus Jäntti markus.jantti at iki.fi
Mon Feb 14 13:52:14 CET 2005


I ran into a to me surprising result on running lm with an orthogonal 
polynomial among the predictors.

The lm command resulted in

Error in qr(X) : NA/NaN/Inf in foreign function call (arg 1)
Error during wrapup:

despite my using a "subset" in the call to get rid of NA's.

poly is apparently evaluated before any NA's are subsetted out
of the data.

Example code (attached to this e-mail as as a script):
 > ## generate some data
 > n <- 50
 > x <- runif(n)
 > a0 <- 10
 > a1 <- .5
 > sigma.e <- 1
 > y <- a0 + a1*x + rnorm(n)*sigma.e
 > tmp.d <- data.frame(y, x)
 > rm(list=c("n", "x", "a0", "a1", "sigma.e", "y"))
 >
 > print(lm.1 <- lm(y ~ poly(x, 2), data = tmp.d)
+
+ ## now make a few NA's
+
+ tmp.d$x[1:2] <- rep(NA, 2)
Error: syntax error
Error during wrapup:
 >
 > ## this fails, just as it should
 > print(lm.1 <- lm(y ~ poly(x, 2), data = tmp.d))

Call:
lm(formula = y ~ poly(x, 2), data = tmp.d)

Coefficients:
(Intercept)  poly(x, 2)1  poly(x, 2)2
      10.380       -0.242       -1.441

 >
 > ## these also fail, but should not?
 >
 > print(lm.2 <- lm(y ~ poly(x, 2), data = tmp.d, subset = !is.na(x)))

Call:
lm(formula = y ~ poly(x, 2), data = tmp.d, subset = !is.na(x))

Coefficients:
(Intercept)  poly(x, 2)1  poly(x, 2)2
      10.380       -0.242       -1.441

 > print(lm.3 <- lm(y ~ poly(x, 2), data = tmp.d, na.action = na.omit))

Call:
lm(formula = y ~ poly(x, 2), data = tmp.d, na.action = na.omit)

Coefficients:
(Intercept)  poly(x, 2)1  poly(x, 2)2
      10.380       -0.242       -1.441

 >
 > ## but this works
 >
 > print(lm.3 <- lm(y ~ poly(x, 2), data = subset(tmp.d, subset = 
!is.na(x))))

Call:
lm(formula = y ~ poly(x, 2), data = subset(tmp.d, subset = !is.na(x)))

Coefficients:
(Intercept)  poly(x, 2)1  poly(x, 2)2
      10.380       -0.242       -1.441

--------------------

The documentation of lm is *not* misleading at this point, saying that

subset 	an optional vector specifying a subset of observations to be 
used in the fitting process.

which implies that data are subsetted once lm.fit is called.
All the same, this behavior is a little unexpected to me.
Is it to be considered a feature, that is, does it produce beneficial 
side effects which explain why it works as it does?

Regards,

Markus

I am running R on a Debian testing system with kernel 2.6.10 and

 > version
          _
platform i386-pc-linux-gnu
arch     i386
os       linux-gnu
system   i386, linux-gnu
status
major    2
minor    0.1
year     2004
month    11
day      15
language R
-- 
Markus Jantti
Abo Akademi University
markus.jantti at iki.fi
http://www.iki.fi/~mjantti
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: poly-example.R
Url: https://stat.ethz.ch/pipermail/r-help/attachments/20050214/68cdf3f1/poly-example.pl


More information about the R-help mailing list