[R] R's basic lm() and summary.lm functions

ivo welch ivo.welch at anderson.ucla.edu
Sat May 11 00:46:08 CEST 2013

dear R experts:

I am contemplating the logic in R's lm() and summary(lm()) statements.
 the reason is that I want to extend the functionality of lm to give
me both standardized coefficients and newey-west standard errors and
Ts.  I have the code and can stick it at the end of the lm() function
(and for others, I have included my amateurish coding below)---it
works.  the big advantage of having the code directly in lm() is that
it frees me from having to reconstruct the X matrix, which is not
trivial given aliases, cross-terms, etc.  but I now have some more
philosophical questions.  what is the "R way" of extending linear
models?  and why is not everything from the summary.lm object in the
lm object to begin with?

I first thought that lm() is trying to store only the basic
information, summary.lm() adds to it, and print.summary.lm() formats
it for the screen.  but this is not really true.

> x=rnorm(1000); y=rnorm(1000); z=rnorm(1000)
> object.size( lm( y ~ x*z ))
271552 bytes
> object.size( summary(lm( y ~ x*z )))
73912 bytes

the summary.lm object carries less information than the lm object.
looking at the code in summary.lm(), there does not seem to be
anything that would not easily fit into lm() itself.  moreover,
summary.lm must work without x=TRUE in the lm() invocation. so, doing
(my) calculation inside summary.lm() seems harder than doing it inside

in this case, why not have the contents of summary.lm() inside lm() to
begin with?  it would eliminate one layer of complexity.  if the user
does not want standard errors because calculation could take too much
time (why? where?), then the lm() function could have a
standard.errors=TRUE argument.

so, primarily I don't understand the logic why R has both an lm and a
summary.lm object.  secondarily, I don't understand why my own
additions should not go into lm(), given that I think the the other
coefficients (standard errors, etc.) should, too.  third, should I
worry that R's built-in lm() function could change so I should not
replace it?  (without a hook in the end, I don't think lm() is easy to
wrap.  I need the X matrix.  I could write a wrapper function to
invoke lm() with x=TRUE, then do what I need to do, and then remove
'$x' from the lm object.  I would just carry more bytes along during
my calculations.)

I hope I am not imposing too much here...



## standardized coefficients, newey-west standard errors, and a hook
for further enhancements

    if (stdcoefs) z$stdcoefs <- z$coefficients*apply(x,2,sd)/sd(mf$y)

    if (newey.west>=0) {

      x.na.omitted <- x
      r.na.omitted <- residuals(z)
      diagband.matrix <- function(m, ar.terms) {
        nomit <- m - ar.terms - 1
        mm <- matrix(TRUE, nrow = m, ncol = m)
        mm[1:nomit, (ncol(mm) - nomit + 1):ncol(mm)] <-
(lower.tri(matrix(TRUE, nrow = nomit, ncol = nomit)))
        mm[(ncol(mm) - nomit + 1):ncol(mm), 1:nomit] <-
(upper.tri(matrix(TRUE, nrow = nomit, ncol = nomit)))
      invx <- chol2inv(chol(crossprod(x.na.omitted)))
      invx.x <- invx %*% t(x.na.omitted)
      if (newey.west==0)
        resid.matrix <- diag(r.na.omitted^2)
      else {
        full <- r.na.omitted %*% t(r.na.omitted)
        maskmatrix <- diagband.matrix(length(r.na.omitted), newey.west)
        resid.matrix <- full * maskmatrix
      vmat <- invx.x %*% resid.matrix %*% t(invx.x)

      z$nw <- newey.west  ## the number of AR terms
      z$nw.se <- sqrt(diag(vmat))  ## the standard errors

    if (!is.null(lm.hook)) lm.hook()  ## has access to everything that
lm() has already computed

Ivo Welch (ivo.welch at gmail.com)

More information about the R-help mailing list