[R] Contrasts for ordered factors

Peter Dalgaard P.Dalgaard at biostat.ku.dk
Mon Jan 8 12:08:00 CET 2007


lorenz.gygax at art.admin.ch wrote:
> Dear all,
>
> I do not seem to grasp how contrasts are set for ordered factors. Perhaps someone can elighten me?
>
> When I work with ordered factors, I would often like to be able to reduce the used polynomial to a simpler one (where possible). Thus, I would like to explicetly code the polynomial but ideally, the intial model (thus, the full polynomial) would be identical to one with an ordered factor.
>
> Here is a toy example with an explanatory variable (EV) with three distinct values (1 to 3) and a continuous response variable (RV):
>
> options (contrasts= c ('contr.treatment', 'contr.poly'))
> example.df <- data.frame (EV= rep (1:3, 5))
> set.seed (298)
> example.df$RV <- 2 * example.df$EV + rnorm (15)
>
> I evaluate this data using either an ordered factor or a polynomial with a linear and a quadratic term:
>
> lm.ord <- lm (RV ~ ordered (EV), example.df)
> lm.pol <- lm (RV ~ EV + I(EV^2), example.df)
>
> I then see that the estimated coefficients differ (and in other examples that I have come across, it is often even more extreme):
>
> coef (lm.ord)
> (Intercept) ordered(EV).L ordered(EV).Q 
>   3.9497767     2.9740535    -0.1580798 
> coef (lm.pol)
> (Intercept)            EV       I(EV^2) 
>  -0.9015283     2.8774032    -0.1936074 
>
> but the predictions are the same (except for some rounding):
>
> table (round (predict (lm.ord), 6) == round (predict (lm.pol), 6))
> TRUE 
>   15 
>
> I thus conclude that the two models are the same and are just using a different parametrisation. I can easily interprete the parameters of the explicit polynomial but I started to wonder about the parametrisation of the ordered factor. In search of an answer, I did check the contrasts:
>
> contr.poly (levels (ordered (example.df$EV)))
>                 .L         .Q
> [1,] -7.071068e-01  0.4082483
> [2,] -9.073264e-17 -0.8164966
> [3,]  7.071068e-01  0.4082483
>
> The linear part basically seems to be -0.707, 0 (apart for numerical rounding) and 0.707. I can understand that any even-spaced parametrisation is possible for the linear part. But does someone know where the value of 0.707 comes from (it seems to be the square-root of 0.5, but why?) and why the middle term is not exactly 0?
>
> I do not understand the quadratic part at all. Wouldn't that need the be the linear part to the power of 2?
>
>   
These are orthogonal polynomials. 

To see the main point, try

> M <- cbind(1,contr.poly (3))

> M

                  .L         .Q

[1,] 1 -7.071068e-01  0.4082483

[2,] 1 -7.850462e-17 -0.8164966

[3,] 1  7.071068e-01  0.4082483

> zapsmall(crossprod(M))

     .L .Q

   3  0  0

.L 0  1  0

.Q 0  0  1


This parametrization has better numerical properties than the
straightforward 1,x,x^2,... , especially in balanced designs.

(SOAPBOX: Some, including me, feel that  having polynomials as default
contrasts for ordered factors is a bit of a design misfeature - It was
inherited from S, but assigning equidistant numerical values to ordered
groups isn't really well-founded, and does become plainly wrong when the
levels are really something like 0, 3, 6, 12, 18, 24 months.)
 

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907



More information about the R-help mailing list