[R] Understanding linear contrasts in Anova using R

David Howell David.Howell at uvm.edu
Wed Sep 29 21:42:03 CEST 2010


  #I am trying to understand how R fits models for contrasts in a
#simple one-way anova. This is an example, I am not stupid enough to want
#to simultaneously apply all of these contrasts to real data. With a few
#exceptions, the tests that I would compute by hand (or by other software)
#will give the same t or F statistics. It is the contrast estimates that 
R produces
#that I can't seem to understand.
#
# In searching for answers to this problem, I found a great PowerPoint 
slide (I think by John Fox).
# The slide pointed to the coefficients, said something like "these are 
coeff. that no one could love," and
#then suggested looking at the means to understand where they came from. 
I have stared
# and stared at his means and then my means, but can't find a relationship.

# The following code and output illustrates the problem.

# Various examples of Anova using R

dv <- c(1.28,  1.35,  3.31,  3.06,  2.59,  3.25,  2.98,  1.53, -2.68,  
2.64,  1.26,  1.06,
        -1.18,  0.15,  1.36,  2.61,  0.66,  1.32,  0.73, -1.06,  0.24,  
0.27,  0.72,  2.28,
        -0.41, -1.25, -1.33, -0.47, -0.60, -1.72, -1.74, -0.77, -0.41, 
-1.20, -0.31, -0.74,
        -0.45,  0.54, -0.98,  1.68,  2.25, -0.19, -0.90,  0.78,  0.05,  
2.69,  0.15,  0.91,
         2.01,  0.40,  2.34, -1.80,  5.00,  2.27,  6.47,  2.94,  0.47,  
3.22,  0.01, -0.66)

group <- factor(rep(1:5, each = 12))


# Use treatment contrasts to compare each group to the first group.
options(contrasts = c("contr.treatment","contr.poly"))  # The default
model2 <- lm(dv ~ group)
summary(model2)
   # Summary table is the same--as it should be
   # Intercept is Group 1 mean and other coeff. are deviations from that.
   # This is what I would expect.
   #summary(model1)
   #              Df Sum Sq Mean Sq F value    Pr(>F)
   #  group        4  62.46 15.6151  6.9005 0.0001415 ***
   #  Residuals   55 124.46  2.2629
   #Coefficients:
   #            Estimate Std. Error t value Pr(>|t|)
   #(Intercept)  1.80250    0.43425   4.151 0.000116 ***
   #group2      -1.12750    0.61412  -1.836 0.071772 .
   #group3      -2.71500    0.61412  -4.421 4.67e-05 ***
   #group4      -1.25833    0.61412  -2.049 0.045245 *
   #group5       0.08667    0.61412   0.141 0.888288


# Use sum contrasts to compare each group against grand mean.
options(contrasts = c("contr.sum","contr.poly"))
model3 <- lm(dv ~ group)
summary(model3)

   # Again, this is as expected. Intercept is grand mean and others are 
deviatoions from that.
   #Coefficients:
   #              Estimate Std. Error t value Pr(>|t|)
   #  (Intercept)   0.7997     0.1942   4.118 0.000130 ***
   #  group1        1.0028     0.3884   2.582 0.012519 *
   #  group2       -0.1247     0.3884  -0.321 0.749449
   #  group3       -1.7122     0.3884  -4.408 4.88e-05 ***
   #  group4       -0.2555     0.3884  -0.658 0.513399

#SO FAR, SO GOOD

# IF I wanted polynomial contrasts BY HAND I would use
#    a(i) =  -2   -1   0   1   2   for linear contrast        (or some 
linear function of this )
#    Effect = Sum(a(j)M(i))    # where M = mean
#    Effect(linear) = -2(1.805) -1(0.675) +0(-.912) +1(.544) +2(1.889) = 
0.043
#    SS(linear) = n*(Effect(linear)^2)/Sum((a(j)^2))  = 12(.043)/10 = .002
#    F(linear) = SS(linear)/MS(error) = .002/2.263 = .001
#    t(linear) = sqrt(.001) = .031

# To do this in R I would use
order.group <- ordered(group)
model4 <- lm(dv~order.group)
summary(model4)
#  This gives:
     #Coefficients:
#                  Estimate Std. Error t value Pr(>|t|)
#    (Intercept)    0.79967    0.19420   4.118 0.000130 ***
#    order.group.L  0.01344    0.43425   0.031 0.975422
#    order.group.Q  2.13519    0.43425   4.917 8.32e-06 ***
#    order.group.C  0.11015    0.43425   0.254 0.800703
#    order.group^4 -0.79602    0.43425  -1.833 0.072202 .

# The t value for linear is same as I got (as are others) but I don't 
understand
# the estimates. The intercept is the grand mean, but I don't see the 
relationship
# of other estimates to that or to the ones I get by hand.
# My estimates are the sum of (coeff times means) i.e.  0 (intercept), 
.0425, 7.989, .3483, -6.66
# and these are not a linear (or other nice pretty) function of est. from R.

# Just as a check, I forced intercept through zero with with deviation 
scores or -1 in model.
#  Now force intercept to 0 by using deviation scores

devdv <- dv-mean(dv)
model5 <- lm(devdv~order.group)
summary(model5)
#Same as above except intercept = 0

# Now do it by removing the intercept
model6 <- lm(dv~order.group -1)
summary(model6)
#  Weird because coeff = cell means
     #             Estimate Std. Error t value Pr(>|t|)
#    order.group1   1.8025     0.4342   4.151 0.000116 ***
#    order.group2   0.6750     0.4342   1.554 0.125824
#    order.group3  -0.9125     0.4342  -2.101 0.040208 *
#    order.group4   0.5442     0.4342   1.253 0.215464
#    order.group5   1.8892     0.4342   4.350 5.94e-05 ***

# BUT note that row labels would suggest that we were not solving for 
polynomials,
# but the contrast matrix is made up of polynomial coefficients.

# contrasts(order.group)
#                .L         .Q            .C         ^4
#[1,] -6.324555e-01  0.5345225 -3.162278e-01  0.1195229
#[2,] -3.162278e-01 -0.2672612  6.324555e-01 -0.4780914
#[3,] -3.287978e-17 -0.5345225  1.595204e-16  0.7171372
#[4,]  3.162278e-01 -0.2672612 -6.324555e-01 -0.4780914
#[5,]  6.324555e-01  0.5345225  3.162278e-01  0.1195229

#So, when I use polynomials (either through contr.poly or by supplying a 
matrix of
#coefficients, what do the estimates represent?

# My problem is not restricted to polynomials. If I try a set of 
orthogonal linear contrasts
# on group means I get
#contrasts(group) <- cbind(c(1,1,0,-1,-1), c(1,-1,0,0,0), 
c(-0,0,0,1,-1), c(1,1,4,-1,1))
#model3 <- lm(dv ~ group)
#summary(model3)
#Coefficients:
#            Estimate Std. Error t value Pr(>|t|)
#(Intercept)   1.5335     0.2558   5.995 1.64e-07 ***
#group1        0.3168     0.2279   1.390 0.170185
#group2        0.5638     0.3071   1.836 0.071772 .
#group3       -1.2840     0.3369  -3.811 0.000352 ***
#group4       -0.6115     0.1387  -4.408 4.88e-05 ***
#These are not even close to what I would expect. By hand I would 
compute the contrasts as
# .0442, 1.1275, 1.3450, and 8.5608 with different t values.

# Any help would be appreciated.

Thanks,
Dave Howell



More information about the R-help mailing list