[R] refitting lm() with same x, different y

Douglas Bates bates at stat.wisc.edu
Tue Apr 19 15:51:32 CEST 2005


William Valdar wrote:
> 
>> From: Brian Ripley
>>
>> As a first shot, use lm with a matrix response.  That fits them all at 
>> once with one QR-decomposition.  No analogue for glm or lmer, though, 
>> since for those the iterative fits run do depend on the response.
> 
> 
> Thanks Brian, that's very helpful. Also thanks to Kevin Wright who 
> suggested using lsfit(x,Y) as being faster than lm for a Y matrix.
> 
> I've since worked out that I can bypass even more lm machinery by basing 
> my permutation test significance thresholds on the RSS from qr.resid().
> Since,
> 
>     y = QRb + e
>   Q'y = Rb + Q'e
>   RSS = || Q'y - Rb ||
> 
> then I can do
> 
>   X.qr <- qr(X)
> 
> once, and for every instance of y calculate
> 
>   e   <- qr.resid(X.qr, y)
>   rss <- e %*% e
> 
> recording them in
> 
>   rss.for.all.fits[i] <- rss
> 
> which gives me an empirical distribution of RSS scores. The degrees of 
> freedom in my X matrix are constant throughout (I should have said that 
> before), so all RSS's are on a level footing and map trivially to the 
> p-value. I can therefore take the RSS at, say, the 5th percentile, turn 
> it into a p-value and report that as my 5% threshold.

Actually you do not need to calculate the residuals to be able to 
calculate RSS.  If you write Q = [Q1 Q2] where Q1 is the first p columns 
and Q2 is the remaining n - p columns and R1 for the first p rows of R 
then your expression for RSS can be extended as

   RSS = || Q'y - Rb || = || Q1'y - R1 b || + || Q2'y ||

because the last n - p rows of R are zero.  At the least squares value 
of b the first term is zero when X has full column rank.  Thus

   rss <- sum(qr.qty(X.qr, y)[-(1:p)]^2)

qr.qty should be slightly faster than qr.resid because it performs only 
one (virtual) multiplication by Q or Q'.  I doubt that the difference 
would be noticeable in practice.




More information about the R-help mailing list