[R] extend summary.lm for hccm?

John Fox jfox at mcmaster.ca
Sat Jan 3 20:11:02 CET 2009


Dear Achim,

I suspect that the problem, involving a fifth-degree raw polynomial, is
very ill-conditioned, and that the computation in linear.hypothesis()
fails because it is not as stable as lm() and summary.lm(). (BTW, one
would not normally call summary.lm() directly, but rather use the
generic summary() function instead.) A possible solution would be to
use a fifth-degree orthogonal polynomial, with the formula energyshare
~ poly(x.log, 5). (You don't need the 1 for the constant, since it's
implied.)
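
Something along these lines should do it (a sketch only, untested; it
uses hccm() from the car package and coeftest() from the lmtest package
in place of summaryHCCM):

library(car)      # hccm()
library(lmtest)   # coeftest()

## Refit with an orthogonal fifth-degree polynomial, which is much
## better conditioned than the raw powers:
fit.poly.5 <- lm(energyshare ~ poly(x.log, 5), data = ev)

## Compare the conditioning of the two model matrices:
kappa(model.matrix(fit.lm.5))    # raw powers: enormous condition number
kappa(model.matrix(fit.poly.5))  # orthogonal polynomial: modest

## Coefficient tests with heteroskedasticity-consistent (HC3) standard errors:
coeftest(fit.poly.5, vcov = hccm(fit.poly.5, type = "hc3"))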

That said, it's hard for me to understand why it's interesting to have
standard errors for the individual coefficients of a high-degree
polynomial, and I'd also be concerned about the sensibleness of fitting
a fifth-degree polynomial in the first place.

I hope this helps,
 John

--------------------------------
John Fox, Professor
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox/


------------ original message ----------
Achim Voß achim.voss at uni-muenster.de
Tue Dec 30 14:15:37 CET 2008

Hi!


I am trying to estimate Engel curves from a large sample (>42,000
observations) with lm(), taking heteroskedasticity into account by
using the summaryHCCM function posted here by John Fox (Mon Dec 25
16:01:59 CET 2006).
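
For reference, a wrapper of this kind can be built around hccm() from
the car package. What follows is only a rough sketch of the idea, not
necessarily the exact code from the 2006 posting:

library(car)   # hccm()

summaryHCCM.sketch <- function(model, type = "hc3") {
    V    <- hccm(model, type = type)  # HC covariance matrix of the coefficients
    b    <- coef(model)
    se   <- sqrt(diag(V))
    tval <- b / se
    pval <- 2 * pt(abs(tval), df.residual(model), lower.tail = FALSE)
    ## Overall Wald F test of all slopes = 0; the solve() call here is
    ## where a near-singular L %*% V %*% t(L) makes the computation fail:
    L <- cbind(0, diag(length(b) - 1))
    Fstat <- drop(t(L %*% b) %*% solve(L %*% V %*% t(L)) %*% (L %*% b)) /
        (length(b) - 1)
    list(coefficients = cbind(Estimate = b, "Std. Error" = se,
                              "t value" = tval, "Pr(>|t|)" = pval),
         F = Fstat)
}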

Having used the SIC (with stepAIC from MASS) to determine how many
powers to include, I estimate the model:

> # =========================================
> summary.lm(fit.lm.5)

Call:
lm(formula = energyshare ~ 1 + I(x.log) + I(x.log^2) + I(x.log^3) +
    I(x.log^4) + I(x.log^5), data = ev)

Residuals:
      Min        1Q    Median        3Q       Max
-0.098819 -0.023682 -0.007043  0.013924  0.486615

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -5.452e+01  1.234e+01  -4.418 9.97e-06 ***
I(x.log)     3.177e+01  6.966e+00   4.560 5.13e-06 ***
I(x.log^2)  -7.330e+00  1.567e+00  -4.677 2.93e-06 ***
I(x.log^3)   8.395e-01  1.757e-01   4.778 1.78e-06 ***
I(x.log^4)  -4.775e-02  9.814e-03  -4.865 1.15e-06 ***
I(x.log^5)   1.079e-03  2.185e-04   4.939 7.90e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.03748 on 42738 degrees of freedom
Multiple R-squared: 0.09236,    Adjusted R-squared: 0.09226
F-statistic: 869.8 on 5 and 42738 DF,  p-value: < 2.2e-16
> # =========================================




Now I use summaryHCCM(fit.lm.5):

> # =========================================
> summaryHCCM(fit.lm.5)
Fehler in solve.default(L %*% V %*% t(L)) :
  System ist für den Rechner singulär: reziproke Konditionszahl = 6.98689e-19
> # =========================================

("Error in solve.default(L %*% V %*% t(L)) :
  System is singulary for the computer: reciprocal number of conditions
=
  6.98689e-19")



This does not happen if I omit I(x.log^5). I do not know what this
error means, and I would be grateful if anyone could help. I would also
like to add a (more or less) related question: can I still use AIC,
SIC etc. if I know there is a heteroskedasticity problem?


Thanks in advance,
Achim



