[R] Interpreting lm Residuals...

David Winsemius dwinsemius at comcast.net
Mon Jun 21 18:10:55 CEST 2010


On Jun 21, 2010, at 10:27 AM, David Riebel wrote:

> I am using the lm function in R to fit several linear models to a
> fair-sized dataset (~160 collections of ~1000 data points each).  My
> data have intrinsic, systematic uncertainty much greater than the
> measurement errors on any individual point.  My thought is to use the
> residuals of my linear fits to quantify this intrinsic uncertainty, but
> I am puzzled over the correct interpretation of R's output.
>
> I have attached plots of the fit and the residuals to one of my
> sub-groups, for illustration.  By eye, the overwhelming majority of the
> residuals are within +- 0.4, and I would therefore expect the standard
> error of the residuals to be ~0.2.  However, the output from lm does not
> show this:

Crack open a basic regression text. The "Std. Error" column in the
coefficient table refers to the parameter estimates, not the residuals.
The "Residual standard error" is the square root of SS(resid)/(n - p),
where p is the number of fitted coefficients, so it is not simply
sd(resid(fit)); the coefficient standard errors are built from it
together with the design matrix. Furthermore, you have complicated
matters by adding a weights term: with weights, SS(resid) becomes the
weighted sum of squares, sum(w * e^2). How much that changes your
estimates we cannot predict, since you did not provide the full data.
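
To make the calculation concrete, here is a minimal sketch on simulated
data. Everything in it (x, y, err, the error distribution) is made up for
illustration and is not your data; only the form of the weights, 1/err,
is copied from your call. It rebuilds the residual standard error and the
coefficient standard errors by hand and compares them with summary().

```r
# Simulated stand-in data; none of this is the poster's data.
set.seed(42)
n   <- 500
x   <- runif(n, 0, 3)
err <- runif(n, 0.02, 0.10)            # hypothetical per-point errors
y   <- 19.5 - 4.25 * x + rnorm(n, sd = 0.15)
w   <- 1 / err                         # same form of weights as in the call

fit <- lm(y ~ x, weights = w)

e <- resid(fit)                        # unweighted: response minus fitted
X <- model.matrix(fit)

# Residual standard error: sqrt of weighted SS(resid) over residual df (n - p)
sigma_manual <- sqrt(sum(w * e^2) / df.residual(fit))

# Coefficient standard errors: sqrt(diag(sigma^2 * (X' W X)^-1))
XtWX      <- crossprod(X, w * X)       # w * X scales row i of X by w[i]
se_manual <- sqrt(diag(sigma_manual^2 * solve(XtWX)))

# Both comparisons should return TRUE
all.equal(sigma_manual, summary(fit)$sigma)
all.equal(unname(se_manual), unname(coef(summary(fit))[, "Std. Error"]))
```

Dropping the weights argument and setting w to 1 reduces both formulas to
the ordinary unweighted ones.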

>
>> summary(ofit)
>
> Call:
> lm(formula = omag ~ oper, weights = (1/oerr))
>
> Residuals:
>     Min       1Q   Median       3Q      Max
> -3.32185 -0.41181  0.03983  0.40041  2.52971
>
> Coefficients:
>            Estimate Std. Error t value Pr(>|t|)
> (Intercept) 19.52847    0.03979   490.8   <2e-16 ***
> oper        -4.25297    0.02101  -202.4   <2e-16 ***
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Residual standard error: 0.6705 on 2287 degrees of freedom
> Multiple R-squared: 0.9471, Adjusted R-squared: 0.9471
> F-statistic: 4.097e+04 on 1 and 2287 DF,  p-value: < 2.2e-16
>
> The plot thickens when I examine the residuals themselves:
>> summary(resid(ofit))
>     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
> -0.611800 -0.095720  0.010200  0.005954  0.101100  0.680700
>> sd(resid(ofit))
> [1] 0.1533568
>
> These numbers are much more what I see by eye.  There really aren't any
> residuals outside ~0.6, certainly nothing as large as 3.3!  The help
> feature for lm tells me that the residuals are "the residuals, that is
> response minus fitted values."  Exactly what I would expect.  As an
> Astronomer, my knowledge of statistics is rather "workman-like" if you
> will, but to me, "Residual standard error" means "the standard deviation
> of the residuals," but the lm output doesn't seem to agree with this.

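Probably because you added the weights argument. With weights,
summary() reports the weighted residuals, sqrt(w) * e, in its
"Residuals" block, and the residual standard error is
sqrt(sum(w * e^2)/df). resid() returns the unweighted e, that is,
response minus fitted values, so the two sets of numbers are on
different scales.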
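
A small check of this, again on simulated data (the variables and the
error range are invented for illustration, not taken from your dataset):
the quantiles printed by summary() on the fit come from the weighted
residuals, while summary(resid(fit)) and sd(resid(fit)) use the
unweighted ones.

```r
# Simulated stand-in data; none of this is the poster's data.
set.seed(1)
n   <- 500
x   <- runif(n, 0, 3)
err <- runif(n, 0.02, 0.10)            # hypothetical per-point errors
y   <- 19.5 - 4.25 * x + rnorm(n, sd = 0.15)
w   <- 1 / err

fit <- lm(y ~ x, weights = w)

e    <- resid(fit)                     # unweighted residuals
wres <- sqrt(w) * e                    # weighted residuals

# summary.lm() stores and prints the weighted residuals
all.equal(unname(wres), unname(summary(fit)$residuals))
all.equal(unname(wres), unname(weighted.residuals(fit)))

quantile(wres)                         # matches the "Residuals" block of summary(fit)
quantile(e)                            # matches summary(resid(fit)), apart from the mean

# Residual standard error is built from the weighted residuals
sigma_w <- sqrt(sum(wres^2) / df.residual(fit))
all.equal(sigma_w, summary(fit)$sigma)

sd(e)                                  # unweighted spread, on the scale of y
```

If what you want is the scatter of the points about the line in the units
of the response, sd(e) (or the unweighted sqrt(sum(e^2)/df)) is the
quantity on that scale.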

>
> I'd appreciate it if someone could clarify what's being output by the
> summary function acting on an lm object.
>
> Replies by e-mail preferred.
>
> Thanks,
>
>
> David Riebel
> Graduate Research Assistant
> Johns Hopkins University
> Department of Physics and Astronomy

David Winsemius, MD
West Hartford, CT


