[Rd] using predict() with poly(x, raw=TRUE)

John Fox jfox at mcmaster.ca
Wed Mar 14 23:45:06 CET 2012


Dear r-devel list members,

I've recently encountered the following problem using predict() with a model
that has raw-polynomial terms. (Actually, I encountered the problem using
model.frame(), but the source of the error is the same.) The problem is
technical and concerns the design of poly(), which is why I'm sending this
message to r-devel rather than r-help.

To illustrate:

------------ snip -------------

> mod.pres <- lm(prestige ~ log(income, 10) + poly(education, 3) +
poly(women, 2), 
+                data=Prestige)  # Prestige data from car package

> predict(mod.pres, newdata=data.frame(education=10, income=6000, women=30))
# works
       1 
39.66414 

> model.frame(delete.response(terms(mod.pres)), data.frame(education=10,
income=6000, women=30))
  log(income, 10) poly(education, 3).1 poly(education, 3).2 poly(education,
3).3 poly(women, 2).1 poly(women, 2).2
1        3.778151          -0.02691558          -0.08720086
0.07199804      0.003202256     -0.138777837

> mod.pres.raw <- lm(prestige ~ log(income, 10) + poly(education, 3,
raw=TRUE) + poly(women, 2, raw=TRUE), 
+                    data=Prestige)

> predict(mod.pres.raw, newdata=data.frame(education=10, income=6000,
women=30)) # doesn't work
Error in poly(education, 3, raw = TRUE) : 
  'degree' must be less than number of unique points

> model.frame(delete.response(terms(mod.pres.raw)), data.frame(education=10,
income=6000, women=30))
Error in poly(education, 3, raw = TRUE) : 
  'degree' must be less than number of unique points

------------ snip -------------

I understand the source of the error, but what's unclear to me is why it's
necessary for poly() to check the degree of the polynomial against the
number of unique supplied points when raw=TRUE. For example, if I simply
remove this check from poly(), then

------------ snip -------------

> predict(mod.pres.raw, newdata=data.frame(education=10, income=6000,
women=30))
       1 
39.66414 

> model.frame(delete.response(terms(mod.pres.raw)), data.frame(education=10,
income=6000, women=30))
  log(income, 10) poly(education, 3, raw = TRUE).1 poly(education, 3, raw =
TRUE).2 poly(education, 3, raw = TRUE).3 poly(women, 2, raw = TRUE).1
poly(women, 2, raw = TRUE).2
1        3.778151                               10
100                             1000                           30
900

------------ snip -------------

Of course, if one then used the modified poly() in a regression with fewer
unique Xs than the degree of the polynomial, the model matrix would be
singular; but why not just let the error appear at that stage? 

I could solve my problem by maintaining a local version of poly(), but why
should it not be possible to get predictions under these circumstances?

Best,
 John

--------------------------------
John Fox
Senator William McMaster
  Professor of Social Statistics
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox



More information about the R-devel mailing list