[R] analysis of covariance and constrained parameters

Steven Orzack orzack at freshpond.org
Fri Feb 16 22:14:51 CET 2018

Consider an analysis of covariance involving age and cohort. The goal is
to assess whether the influence of cohort depends upon age. The simplest
case involves data such as the following:

value    Age    Cohort
x1       1      3
x2       1      4
x3       1      5
x4       2      3
x5       2      4
x6       2      5


Age is a factor. The numeric response variable is value and Cohort is a
numeric predictor. So (pseudo-code) commands to estimate the age-specific
relationship between value and Cohort could be

glm(value ~ Age/Cohort -  1, family =......, data = .....)

glm(value ~ Age/(Cohort + I(Cohort^2)) - 1, family =......, data = .....)

The latter command would provide estimates of the age-specific
intercept, linear, and quadratic coefficients, as in

value_Age1 <- intercept_Age1 + linear_Age1*Cohort + quad_Age1*Cohort^2

value_Age2 <- intercept_Age2 + linear_Age2*Cohort + quad_Age2*Cohort^2

This is standard. One would choose among the above models via analysis 
of variance or AIC.
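
A minimal sketch of how such a comparison might look. The family and
data are not specified above, so the Gaussian family, the simulated data,
and the object names fit_lin and fit_quad are all assumptions made for
illustration.

```r
# Simulated data (an assumption): 10 replicates per Age x Cohort cell.
set.seed(1)
dat <- expand.grid(Cohort = 3:5, Age = factor(1:2), rep = 1:10)
dat$value <- ifelse(dat$Age == "1", 2, 1 + 0.5 * dat$Cohort) +
  rnorm(nrow(dat), sd = 0.3)

# Age-specific intercepts and linear Cohort slopes.
fit_lin <- glm(value ~ Age/Cohort - 1, family = gaussian, data = dat)

# Age-specific intercepts, linear and quadratic Cohort terms.
fit_quad <- glm(value ~ Age/(Cohort + I(Cohort^2)) - 1,
                family = gaussian, data = dat)

# The linear model is nested in the quadratic one, so both an F test
# and an AIC comparison apply.
anova(fit_lin, fit_quad, test = "F")
AIC(fit_lin, fit_quad)
```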

Now assume that I have external knowledge that tells me that there is NO
influence of Cohort on value for Age1 and that there could be up to a
quadratic influence for Age2. Accordingly, I would like to fit a model
which estimates these relationships:

value_Age1 <- intercept_Age1 (+ 0*Cohort + 0*Cohort^2)
                             (which is, of course, value_Age1 <- intercept_Age1)

value_Age2 <- intercept_Age2 + linear_Age2*Cohort + quad_Age2*Cohort^2

What is the glm syntax to fit this model? It is a model in which we have
constraints that (two) coefficients for one level of the factor must
have a particular value (0) and there is no such constraint for the
second level of the factor.
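
One possible way to express such a constraint, offered as a sketch rather
than a definitive answer: construct Cohort terms that are identically
zero for Age 1, so that no slope or curvature is estimated for that
level. The column names Cohort_A2 and Cohort2_A2, the simulated data, and
the Gaussian family are assumptions made for illustration.

```r
# Simulated data (an assumption): flat in Cohort for Age 1.
set.seed(1)
dat <- expand.grid(Cohort = 3:5, Age = factor(1:2), rep = 1:10)
dat$value <- ifelse(dat$Age == "1", 2, 1 + 0.5 * dat$Cohort) +
  rnorm(nrow(dat), sd = 0.3)

# Cohort terms that equal 0 for every Age 1 row, so their coefficients
# describe Age 2 only (hypothetical helper columns).
dat$Cohort_A2  <- ifelse(dat$Age == "2", dat$Cohort, 0)
dat$Cohort2_A2 <- ifelse(dat$Age == "2", dat$Cohort^2, 0)

# Age-specific intercepts; linear and quadratic terms for Age 2 only.
fit_con <- glm(value ~ Age + Cohort_A2 + Cohort2_A2 - 1,
               family = gaussian, data = dat)
coef(fit_con)
```

The fitted model has four coefficients: one intercept per age level plus
the linear and quadratic terms for Age 2.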

Please note that I understand that

glm(value ~ Age/(Cohort + I(Cohort^2)) - 1, family =......, data = .....)

generates point estimates of the linear and quadratic coefficients for 
Age1 (as above) and one could inspect them to determine whether they are 
statistically equivalent to 0.

However, I want to incorporate the knowledge that these coefficients
MUST BE 0 into my hypothesis testing. Knowing that these coefficients
are 0 could influence the results of anova and AIC comparisons since it
reduces the number of degrees of freedom associated with the model.
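
To illustrate the degrees-of-freedom point, here is a sketch comparing
an unconstrained fit with one in which the Age 1 Cohort terms are forced
to 0 by zeroing the corresponding columns. The simulated data, the
Gaussian family, and the names is_age2, Cohort_A2, Cohort2_A2, fit_full,
and fit_con are assumptions made for illustration.

```r
# Simulated data (an assumption): 60 observations in total.
set.seed(2)
dat <- expand.grid(Cohort = 3:5, Age = factor(1:2), rep = 1:10)
dat$value <- ifelse(dat$Age == "1", 2, 1 + 0.5 * dat$Cohort) +
  rnorm(nrow(dat), sd = 0.3)

# Indicator for Age 2, used to zero the Cohort terms for Age 1.
is_age2 <- as.numeric(dat$Age == "2")
dat$Cohort_A2  <- is_age2 * dat$Cohort
dat$Cohort2_A2 <- is_age2 * dat$Cohort^2

# Unconstrained: six coefficients. Constrained: four coefficients.
fit_full <- glm(value ~ Age/(Cohort + I(Cohort^2)) - 1,
                family = gaussian, data = dat)
fit_con  <- glm(value ~ Age + Cohort_A2 + Cohort2_A2 - 1,
                family = gaussian, data = dat)

# The constrained model keeps two more residual degrees of freedom.
df.residual(fit_con) - df.residual(fit_full)

# The constrained model is nested in the unconstrained one.
anova(fit_con, fit_full, test = "F")
AIC(fit_con, fit_full)
```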

Many thanks for suggestions in advance!

Steven Orzack
Fresh Pond Research Institute
173 Harvey Street
Cambridge, MA 02140
617 864-4307

