# [R] Parameter estimates from an ANCOVA

John Fox jfox at mcmaster.ca
Thu Oct 16 01:00:01 CEST 2008

```Dear J.,

Perhaps the following will clarify what you got:

(1) For those unfamiliar with it, contr.Sum() is like contr.sum(), except
no contr.Poly(), but since you didn't use it, there was no error.

(2) Here's what I got with your contrived data (I set random.seed(123) to
make the computation reproducible, but that doesn't matter much given the
R^2 near 1):

> summary(fit)

Call:
lm(formula = y ~ categ * cont)

Residuals:
Min       1Q   Median       3Q      Max
-0.09106 -0.02992 -0.00602  0.02455  0.08593

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)      0.047043   0.051576   0.912    0.374
categ[S.A]      -0.031164   0.051576  -0.604    0.553
cont             2.997098   0.003364 890.816   <2e-16 ***
categ[S.A]:cont  1.002440   0.003364 297.951   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.0499 on 18 degrees of freedom
Multiple R-squared:     1,      Adjusted R-squared:     1
F-statistic: 9.572e+05 on 3 and 18 DF,  p-value: < 2.2e-16

Given the sum-to-zero contrast and the interaction, the coefficient
categ[S.A] = -0.031164 is half the difference in the fitted value of y
between categ = A and categ = B, when cont = 0.  Since your cont ranges from
10 to 20, this is not a meaningful comparison.

(3) Of course, in the presence of the interaction, the difference between
levels A and B of categ depends upon the value at which cont is set. You
could express cont as deviations from its grand mean, obtaining:

> cont.d <- cont - mean(cont)

> fit2 <- lm(y ~ categ*cont.d)

> summary(fit2)

Call:
lm(formula = y ~ categ * cont.d)

Residuals:
Min       1Q   Median       3Q      Max
-0.09106 -0.02992 -0.00602  0.02455  0.08593

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)       45.003515   0.010639  4229.9   <2e-16 ***
categ[S.A]        15.005441   0.010639  1410.4   <2e-16 ***
cont.d             2.997098   0.003364   890.8   <2e-16 ***
categ[S.A]:cont.d  1.002440   0.003364   298.0   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.0499 on 18 degrees of freedom
Multiple R-squared:     1,      Adjusted R-squared:     1
F-statistic: 9.572e+05 on 3 and 18 DF,  p-value: < 2.2e-16

Now categ[S.A] = 15.005441 represents half the difference in the fitted
value of y between categ = A and categ = B, when cont is equal to its mean.
Since the data are balanced, this is also the difference between the mean of
y at level A of categ minus the grand mean of y, which is what I think you
expected to see.

I hope this helps,
John

------------------------------
John Fox, Professor
Department of Sociology
McMaster University
web: socserv.mcmaster.ca/jfox

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On
> Behalf Of jgar
> Sent: October-15-08 4:35 PM
> To: r-help at r-project.org
> Subject: [R] Parameter estimates from an ANCOVA
>
>
> Hi all,
>
> This is probably going to come off as unnecessary (and show my ignorance)
> but I am trying to understand the parameter estimates I am getting from R
> when doing an ANCOVA.  Basically, I am accustomed to the estimate for the
> categorical variable being equivalent to the respective cell means minus
the
> grand mean.  I know is the case in JMP - all other estimates from these
data
> match the output from R, except I don't understand why the parameter
> estimate for the categorical factor differs.
>
> library(car)
> options(contrasts=c("contr.Sum","contr.Poly"))
>
> cont=as.numeric(rep(10:20, 2))  # fake data
> categ=as.factor(c(rep("A", 11), rep("B", 11)))
> y=(c(cont[1:11]*4 , cont[1:11]*2))+rnorm(mean=0, sd=.05, 22)
>
> fit=lm(y~categ*cont)
> scatterplot(y~cont|categ)   # shows interaction
> scatterplot(y~categ|cont)   # shows effect of the categorical variable,
> irrespective of cont
>
> summary(fit); anova(fit)
>
> coeff=coef(summary(fit))[,1]  # column of estimates to "coeff"
>
> grandmean=coeff[1]+ mean(cont)*coeff[3]  #GM = intercept +
> mean(cont)*regression_slope = 45.00677
> slope_A=coeff[3]+coeff[4]  #slope of individual regression for A  = 3.998
> slope_B=coeff[3]+(-coeff[4])  #slope of individual regression for B =
2.002
>
> tapply(y, categ, mean) #these are what I want to get back out from the
> parameter estimates
> # in my world the parameter estimate should be cell mean(A) - grand mean =
> 60.02271 - 45.00677 = 15.01595
> # the estimate is 0.99688227
> #cell means   A        B
> #          60.0227  29.99082
>
> Any help is greatly appreciated.
>
> J
> --
> View this message in context:
http://www.nabble.com/Parameter-estimates-from-
> an-ANCOVA-tp20001809p20001809.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help