[R] functions to calculate t-stats, etc. for lm.fit objects?

Wed Jul 8 16:50:56 CEST 2009

On Jul 8, 2009, at 8:51 AM, Whit Armstrong wrote:

> I'm running a huge number of regressions in a loop, so I tried lm.fit
> for a speedup.  However, I would like to be able to calculate the
> t-stats for the coefficients.
>
> Does anyone have some functions for calculating the regression summary
> stats of an lm.fit object?
>
> Thanks,
> Whit

Whit, depending upon just how much time savings you are realizing by  
using lm.fit() and not lm(), the approach to your question may vary.

Do you need all of the models, or only a subset?

If the latter, then I would narrow down your model set and re-run them  
with lm() so that you can use summary.lm() directly. That would entail  
less custom coding, which may otherwise offset any time savings from  
using lm.fit()

If the former, then there are two choices as I see them.

The first would be to restructure the object resulting from lm.fit()  
by adding the elements required to run summary.lm(). However, I would  
think that this overhead would bring you back to a point where just  
using lm() would be a better approach from a time standpoint.

The second would be to cook up a function that only provides the  
subset of results that you need from summary.lm() and then use that on  
the results of lm.fit(). Here again, there remains the question of  
just how much time are you saving using lm.fit() versus the additional  
overhead of calculating even a subset of the output.

Here is a very simple approach to a function that would get you a  
subset of the output that you would get using, for example,  
coef(summary(lm.object)). This is using a selective approach of  
copying and slightly editing code from summary.lm(). Note that there  
is other code in summary.lm() to handle weights and such, if your  
models are more complex. You would need to add that in if that is the  
case.

If you need much more summary output than this on each model, then I  
think you would be better off just using lm() and summary.lm().

# Use at your own risk...untested on more complex models  :-)

# 'x' is an lm.fit object

calc.lm.t <- function(x)
{
   Qr <- x$qr
   r <- x$residuals
   p <- x$rank
   p1 <- 1L:p
   rss <- sum(r^2)

   n <- NROW(Qr$qr)
   rdf <- n - p

   resvar <- rss/rdf
   R <- chol2inv(Qr$qr[p1, p1, drop = FALSE])
   se <- sqrt(diag(R) * resvar)

   est <- x$coefficients[Qr$pivot[p1]]
   tval <- est/se

   res <- cbind(est = est, se = se, tval = tval)
   res
}

Here is some simple example data:

set.seed(1)
y <- rnorm(100)
x <- rnorm(100)

# Get the default coefficient output using summary.lm()
 > coef(summary(lm(y ~ x)))
                  Estimate Std. Error     t value  Pr(>|t|)
(Intercept)  0.1088521158 0.09034800  1.20480938 0.2311784
x           -0.0009323697 0.09472155 -0.00984327 0.9921663

# Now use calc.lm.t

lmf <- lm.fit(model.matrix(y ~ x), y)

 > calc.lm.t(lmf)
                       est         se        tval
(Intercept)  0.1088521158 0.09034800  1.20480938
x           -0.0009323697 0.09472155 -0.00984327

I'll leave it to you to see whether this approach may or may not be  
helpful from a time perspective.

HTH,

Marc Schwartz