[R] functions to calculate t-stats, etc. for lm.fit objects?

Wed Jul 8 17:07:03 CEST 2009

Marc,

Thanks very much for your detailed reply.  I'll give your code a try
and post back the time difference.

Cheers,
Whit

On Wed, Jul 8, 2009 at 10:50 AM, Marc Schwartz<marc_schwartz at me.com> wrote:
> On Jul 8, 2009, at 8:51 AM, Whit Armstrong wrote:
>
>> I'm running a huge number of regressions in a loop, so I tried lm.fit
>> for a speedup.  However, I would like to be able to calculate the
>> t-stats for the coefficients.
>>
>> Does anyone have some functions for calculating the regression summary
>> stats of an lm.fit object?
>>
>> Thanks,
>> Whit
>
>
>
> Whit, depending upon just how much time savings you are realizing by using
> lm.fit() and not lm(), the approach to your question may vary.
>
> Do you need all of the models, or only a subset?
>
> If the latter, then I would narrow down your model set and re-run them with
> lm() so that you can use summary.lm() directly. That would entail less
> custom coding, which may otherwise offset any time savings from using
> lm.fit()
>
> If the former, then there are two choices as I see them.
>
> The first would be to restructure the object resulting from lm.fit() by
> adding the elements required to run summary.lm(). However, I would think
> that this overhead would bring you back to a point where just using lm()
> would be a better approach from a time standpoint.
>
> The second would be to cook up a function that only provides the subset of
> results that you need from summary.lm() and then use that on the results of
> lm.fit(). Here again, there remains the question of just how much time are
> you saving using lm.fit() versus the additional overhead of calculating even
> a subset of the output.
>
> Here is a very simple approach to a function that would get you a subset of
> the output that you would get using, for example, coef(summary(lm.object)).
> This is using a selective approach of copying and slightly editing code from
> summary.lm(). Note that there is other code in summary.lm() to handle
> weights and such, if your models are more complex. You would need to add
> that in if that is the case.
>
> If you need much more summary output than this on each model, then I think
> you would be better off just using lm() and summary.lm().
>
>
> # Use at your own risk...untested on more complex models  :-)
>
> # 'x' is an lm.fit object
>
> calc.lm.t <- function(x)
> {
>  Qr <- x$qr
>  r <- x$residuals
>  p <- x$rank
>  p1 <- 1L:p
>  rss <- sum(r^2)
>
>  n <- NROW(Qr$qr)
>  rdf <- n - p
>
>  resvar <- rss/rdf
>  R <- chol2inv(Qr$qr[p1, p1, drop = FALSE])
>  se <- sqrt(diag(R) * resvar)
>
>  est <- x$coefficients[Qr$pivot[p1]]
>  tval <- est/se
>
>  res <- cbind(est = est, se = se, tval = tval)
>  res
> }
>
>
>
> Here is some simple example data:
>
> set.seed(1)
> y <- rnorm(100)
> x <- rnorm(100)
>
>
> # Get the default coefficient output using summary.lm()
>> coef(summary(lm(y ~ x)))
>                 Estimate Std. Error     t value  Pr(>|t|)
> (Intercept)  0.1088521158 0.09034800  1.20480938 0.2311784
> x           -0.0009323697 0.09472155 -0.00984327 0.9921663
>
>
>
> # Now use calc.lm.t
>
> lmf <- lm.fit(model.matrix(y ~ x), y)
>
>> calc.lm.t(lmf)
>                      est         se        tval
> (Intercept)  0.1088521158 0.09034800  1.20480938
> x           -0.0009323697 0.09472155 -0.00984327
>
>
>
> I'll leave it to you to see whether this approach may or may not be helpful
> from a time perspective.
>
> HTH,
>
> Marc Schwartz
>
>