[R] Parameter estimates from an ANCOVA

John Fox jfox at mcmaster.ca
Thu Oct 16 01:00:01 CEST 2008


Dear J.,

Perhaps the following will clarify what you got:

(1) For those unfamiliar with it, contr.Sum() is like contr.sum(), except
for (in my opinion) printing more informative coefficient labels. There is
no contr.Poly(), but since you didn't use it, there was no error.

(2) Here's what I got with your contrived data (I set random.seed(123) to
make the computation reproducible, but that doesn't matter much given the
R^2 near 1):

> summary(fit)

Call:
lm(formula = y ~ categ * cont)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.09106 -0.02992 -0.00602  0.02455  0.08593 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)      0.047043   0.051576   0.912    0.374    
categ[S.A]      -0.031164   0.051576  -0.604    0.553    
cont             2.997098   0.003364 890.816   <2e-16 ***
categ[S.A]:cont  1.002440   0.003364 297.951   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Residual standard error: 0.0499 on 18 degrees of freedom
Multiple R-squared:     1,      Adjusted R-squared:     1 
F-statistic: 9.572e+05 on 3 and 18 DF,  p-value: < 2.2e-16 

Given the sum-to-zero contrast and the interaction, the coefficient
categ[S.A] = -0.031164 is half the difference in the fitted value of y
between categ = A and categ = B, when cont = 0.  Since your cont ranges from
10 to 20, this is not a meaningful comparison. 

(3) Of course, in the presence of the interaction, the difference between
levels A and B of categ depends upon the value at which cont is set. You
could express cont as deviations from its grand mean, obtaining:

> cont.d <- cont - mean(cont)

> fit2 <- lm(y ~ categ*cont.d)

> summary(fit2)

Call:
lm(formula = y ~ categ * cont.d)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.09106 -0.02992 -0.00602  0.02455  0.08593 

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
(Intercept)       45.003515   0.010639  4229.9   <2e-16 ***
categ[S.A]        15.005441   0.010639  1410.4   <2e-16 ***
cont.d             2.997098   0.003364   890.8   <2e-16 ***
categ[S.A]:cont.d  1.002440   0.003364   298.0   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Residual standard error: 0.0499 on 18 degrees of freedom
Multiple R-squared:     1,      Adjusted R-squared:     1 
F-statistic: 9.572e+05 on 3 and 18 DF,  p-value: < 2.2e-16

Now categ[S.A] = 15.005441 represents half the difference in the fitted
value of y between categ = A and categ = B, when cont is equal to its mean.
Since the data are balanced, this is also the difference between the mean of
y at level A of categ minus the grand mean of y, which is what I think you
expected to see.

I hope this helps,
 John

------------------------------
John Fox, Professor
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
web: socserv.mcmaster.ca/jfox

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On
> Behalf Of jgar
> Sent: October-15-08 4:35 PM
> To: r-help at r-project.org
> Subject: [R] Parameter estimates from an ANCOVA
> 
> 
> Hi all,
> 
> This is probably going to come off as unnecessary (and show my ignorance)
> but I am trying to understand the parameter estimates I am getting from R
> when doing an ANCOVA.  Basically, I am accustomed to the estimate for the
> categorical variable being equivalent to the respective cell means minus
the
> grand mean.  I know is the case in JMP - all other estimates from these
data
> match the output from R, except I don't understand why the parameter
> estimate for the categorical factor differs.
> 
> library(car)
> options(contrasts=c("contr.Sum","contr.Poly"))
> 
> cont=as.numeric(rep(10:20, 2))  # fake data
> categ=as.factor(c(rep("A", 11), rep("B", 11)))
> y=(c(cont[1:11]*4 , cont[1:11]*2))+rnorm(mean=0, sd=.05, 22)
> 
> fit=lm(y~categ*cont)
> scatterplot(y~cont|categ)   # shows interaction
> scatterplot(y~categ|cont)   # shows effect of the categorical variable,
> irrespective of cont
> 
> summary(fit); anova(fit)
> 
> coeff=coef(summary(fit))[,1]  # column of estimates to "coeff"
> 
> grandmean=coeff[1]+ mean(cont)*coeff[3]  #GM = intercept +
> mean(cont)*regression_slope = 45.00677
> slope_A=coeff[3]+coeff[4]  #slope of individual regression for A  = 3.998
> slope_B=coeff[3]+(-coeff[4])  #slope of individual regression for B =
2.002
> 
> tapply(y, categ, mean) #these are what I want to get back out from the
> parameter estimates
> # in my world the parameter estimate should be cell mean(A) - grand mean =
> 60.02271 - 45.00677 = 15.01595
> # the estimate is 0.99688227
> #cell means   A        B
> #          60.0227  29.99082
> 
> Any help is greatly appreciated.
> 
> J
> --
> View this message in context:
http://www.nabble.com/Parameter-estimates-from-
> an-ANCOVA-tp20001809p20001809.html
> Sent from the R help mailing list archive at Nabble.com.
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



More information about the R-help mailing list