[R] summary(lm ... contrasts=...)

Tue Aug 22 18:50:16 CEST 2006

On Tue, 22 Aug 2006, Ted.Harding at nessie.mcc.ac.uk wrote:

> Hi Folks,
> 
> I've encountered something I hadn't been consciously
> aware of previously, and I'm wondering what the
> explanation might be.

Try

> contr.helmert(letters[1:3])
  [,1] [,2]
a   -1   -1
b    1   -1
c    0    2
> contr.treatment(letters[1:3])
  b c
a 0 0
b 1 0
c 0 1

and note the difference in column names.
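
A small sketch of how those column names (or their absence) reach the model matrix. The factor name f is purely illustrative and not from the original example:

```r
lev <- letters[1:3]
f <- factor(lev)  # illustrative factor, one observation per level

# Helmert contrasts carry no column names; treatment contrasts
# use the names of the non-baseline levels.
helmert_names   <- colnames(contr.helmert(lev))
treatment_names <- colnames(contr.treatment(lev))

# model.matrix() pastes the factor name onto the contrast column names,
# and falls back to 1, 2, ... when the contrast matrix has none.
mm_helmert <- colnames(
  model.matrix(~ f, contrasts.arg = list(f = "contr.helmert")))
mm_treatment <- colnames(
  model.matrix(~ f, contrasts.arg = list(f = "contr.treatment")))

print(helmert_names)
print(treatment_names)
print(mm_helmert)
print(mm_treatment)
```

The coefficient names reported by lm() are just these model-matrix column names.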

Those who made the decision to use those column names determined this.
I agreed with them that labelling the second Helmert contrast here as 'c'
would be confusing, since it is especially easy to mistake for a treatment
contrast.
However, I thought the treatment contrasts should be labelled b-a and c-a.
We also had arguments about xc vs x.c vs x:c.  AFAIR brevity won.

Once you know how it is done, it is easy to change the behaviour, of 
course: just roll your own contrasts function with the colnames you want.
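
One possible way to do that, as a sketch only: contr.treatment.diff is a hypothetical helper (it is not part of R) that relabels the treatment contrast columns as b-a, c-a. The simulated data mimic the quoted example, with an arbitrary seed.

```r
# Hypothetical helper, not part of R: treatment contrasts whose columns
# are labelled "<level>-<baseline>" instead of just "<level>".
contr.treatment.diff <- function(n, base = 1, ...) {
  m <- contr.treatment(n, base = base, ...)
  lev <- rownames(m)
  # Relabel only when a proper (k - 1)-column contrast matrix came back.
  if (!is.null(lev) && ncol(m) == length(lev) - 1L)
    colnames(m) <- paste(lev[-base], lev[base], sep = "-")
  m
}

set.seed(1)  # arbitrary seed, for reproducibility only
X <- factor(rep(c("A", "B", "C"), each = 12))
Y <- rnorm(36, mean = rep(c(0, 2, 4), each = 12))

# Supply the contrast matrix itself, built from the level names.
cm  <- contr.treatment.diff(levels(X))
fit <- lm(Y ~ X, contrasts = list(X = cm))
print(colnames(cm))
print(names(coef(fit)))
```

Passing the matrix rather than the name of the function sidesteps any question of where a user-defined function is looked up. The same idea would apply to a relabelled contr.helmert.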

> While using R (on another list) to demonstrate the difference
> between different contrasts in 'lm', I set up an example
> where Y is sampled from three different normal distributions
> according to the levels ("A","B","C") of a factor X:
> 
> Y<-c(rnorm(mean=0,n=12),rnorm(mean=2,n=12),rnorm(mean=4,n=12))
> X<-factor(c(rep("A",12),rep("B",12),rep("C",12)))
> 
> Then I do a summary(lm(Y~X)...) using first "Treatment" contrasts
> and then "Helmert" contrasts. Here are the coefficient parts
> of the results in each case:

Just coef() or print() on the fitted model gives you the same coefficient
names: they are not created by summary().
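
A quick way to see this, sketched with simulated data like that in the quoted example (the seed is arbitrary):

```r
set.seed(1)  # arbitrary seed, for reproducibility only
X <- factor(rep(c("A", "B", "C"), each = 12))
Y <- rnorm(36, mean = rep(c(0, 2, 4), each = 12))

fit <- lm(Y ~ X, contrasts = list(X = "contr.helmert"))

# The names are already attached to the fit; summary() only reuses them.
coef_names    <- names(coef(fit))
summary_names <- rownames(coef(summary(fit)))
mm_names      <- colnames(model.matrix(fit))

print(coef_names)
print(identical(coef_names, summary_names))
print(identical(coef_names, mm_names))
```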

> summary(lm(Y~X,contrasts=list(X="contr.treatment")))
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept)   0.2303     0.3220   0.715  0.47944
> XB            1.3057     0.4554   2.867  0.00716 **
> XC            3.4204     0.4554   7.511 1.23e-08 ***
> 
> 
> summary(lm(Y~X,contrasts=list(X="contr.helmert")))
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept)   1.8057     0.1859   9.713 3.34e-11 ***
> X1            0.6529     0.2277   2.867  0.00716 **
> X2            0.9225     0.1315   7.017 5.00e-08 ***
> 
> 
> What I'm wondering is why the "effect names" are "XB"
> and "XC" for Treatment, and "X1", "X2" for Helmert.
> 
> Why not "XB" and "XC" in both cases? Just as "XB"
> contrasts B with the baseline A and "XC" contrasts C
> with A, "XA" being implicit, in the
> Treatment contrasts, so "X1" contrasts B with A and
> "X2" contrasts C with the mean of A and B in Helmert, so there
> is to my mind just as definite an association of "B"
> with the first contrast, and "C" with the second, in
> the Helmert case as in the Treatment case!
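
For a balanced design like this one, what each coefficient estimates can be checked against the group means. A sketch with simulated data (arbitrary seed; the relations below follow from the two contrast matrices and assume equal group sizes):

```r
set.seed(1)  # arbitrary seed, for reproducibility only
X <- factor(rep(c("A", "B", "C"), each = 12))
Y <- rnorm(36, mean = rep(c(0, 2, 4), each = 12))

m <- unname(tapply(Y, X, mean))  # group means for A, B, C

bt <- unname(coef(lm(Y ~ X, contrasts = list(X = "contr.treatment"))))
bh <- unname(coef(lm(Y ~ X, contrasts = list(X = "contr.helmert"))))

# Treatment: intercept is the mean of A; XB and XC are differences from A.
treatment_expected <- c(m[1], m[2] - m[1], m[3] - m[1])

# Helmert: intercept is the mean of the group means;
# X1 is half of (B - A); X2 is a third of (C - mean of A and B).
helmert_expected <- c(mean(m),
                      (m[2] - m[1]) / 2,
                      (m[3] - (m[1] + m[2]) / 2) / 3)

print(all.equal(bt, treatment_expected))
print(all.equal(bh, helmert_expected))
```

The quoted output fits the same pattern: 1.3057 / 2 = 0.6529 and (3.4204 - 1.3057 / 2) / 3 = 0.9225.
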
> 
> I know it's just a matter of "notation", but in the
> Helmert case the association with the names of the
> factor levels has been lost, and it could be useful
> to have it explicit. (Or is it intended simply as a
> reminder that one is using a particular system of
> contrasts?)
> 
> Thanks, and best wishes to all,
> Ted.
> 
> --------------------------------------------------------------------
> E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
> Fax-to-email: +44 (0)870 094 0861
> Date: 22-Aug-06                                       Time: 14:45:17
> ------------------------------ XFMail ------------------------------

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595