[R] Interpreting lm() results with factor

peter dalgaard pdalgd at gmail.com
Tue Dec 3 14:46:59 CET 2013


On 03 Dec 2013, at 01:08 , David Gwenzi <dgwenzi at gmail.com> wrote:

> Dear all
> 
> I have observations from 4 different classes, and the between-class
> *variance* is so high that I decided to run a model without pooling the
> *variance*. I used the following code first:
>                   model<-lm(y~x+factor(class))
> and got the following output:
> Coefficients:
>                Estimate Std. Error t value Pr(>|t|)
> (Intercept)     52.41405   17.38161   3.015  0.00658 **
> x                0.27679    0.07387   3.747  0.00119 **
> factor(class)2  92.68083   32.26645   2.872  0.00912 **
> factor(class)3 197.82029   33.24916   5.950 6.63e-06 ***
> factor(class)4 105.61266   55.18373   1.914  0.06937 .
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> 
> Residual standard error: 43.07 on 21 degrees of freedom
> Multiple R-squared:  0.9206,    Adjusted R-squared:  0.9055
> F-statistic: 60.91 on 4 and 21 DF,  p-value: 2.976e-11
> 
> My understanding of this output is that class 1 is used as a baseline
> (constant) and each other class's p-value indicates, for example, whether
> the dependent value in class 2 is significantly different from that of
> class 1. Now I ran the model again, but without using a constant, i.e.
>                    model<-lm(y~x+factor(class)-1)
> and got the following output:
> Coefficients:
>                Estimate Std. Error t value Pr(>|t|)
> x                0.27679    0.07387   3.747  0.00119 **
> factor(class)1  52.41405   17.38161   3.015  0.00658 **
> factor(class)2 145.09488   39.42651   3.680  0.00139 **
> factor(class)3 250.23434   40.61189   6.162 4.11e-06 ***
> factor(class)4 158.02672   64.09549   2.465  0.02238 *
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> 
> Residual standard error: 43.07 on 21 degrees of freedom
> Multiple R-squared:  0.9801,    Adjusted R-squared:  0.9754
> F-statistic: 207.1 on 5 and 21 DF,  p-value: < 2.2e-16
> 
> Can somebody please tell me how to interpret this one now? What do the
> classes' p-values mean? Do they merely show whether the classes
> significantly contribute to the model, or whether they are significantly
> different from the overall mean? If one class had a p-value > 0.05, would
> that mean the observations from that class are not significantly
> contributing to the model?

The estimates are the per-class intercepts (the expected y at x = 0 within each class), and each P-value corresponds to a test that the intercept in question is zero, which is very rarely a relevant hypothesis.
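To make this concrete, here is a minimal sketch on simulated data. Only the names y, x and class come from the thread; the data values, the object names fit_ref and fit_cell, and the choice of anova() as the follow-up test are assumptions for illustration. It checks that both formulas describe the same fitted model, and that each per-class intercept is the baseline intercept plus the matching contrast, just as 52.41 + 92.68 = 145.09 in the output quoted above.

```r
# Simulated stand-in for the poster's data (values are invented).
set.seed(1)
class <- rep(1:4, times = c(8, 7, 6, 5))   # 26 observations, as in the thread
x <- runif(length(class), 0, 500)
y <- c(50, 145, 250, 160)[class] + 0.28 * x + rnorm(length(class), sd = 40)

fit_ref  <- lm(y ~ x + factor(class))       # class 1 is the baseline
fit_cell <- lm(y ~ x + factor(class) - 1)   # one intercept per class

b_ref  <- coef(fit_ref)    # (Intercept), x, factor(class)2..4
b_cell <- coef(fit_cell)   # x, factor(class)1..4

# Rebuild the per-class intercepts from the baseline parameterization.
intercepts_from_ref <- c(b_ref["(Intercept)"],
                         b_ref["(Intercept)"] + b_ref[3:5])
print(all.equal(unname(intercepts_from_ref), unname(b_cell[2:5])))

# Same model, different coding: the fitted values agree.
print(all.equal(fitted(fit_ref), fitted(fit_cell)))

# One way to ask whether the classes differ at all: compare against
# the model with a single common intercept.
print(anova(lm(y ~ x), fit_ref))
```

In the first parameterization, the t-tests on factor(class)2 to factor(class)4 compare each class with class 1; in the second, they only test each intercept against zero. If the question of interest is whether the classes differ, the first fit or the anova() comparison is probably the more useful one to report.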

-- 
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com


