[Rd] contr.sum() and contrast names
John Fox
jfox at mcmaster.ca
Sat Oct 27 16:44:39 CEST 2012
Hi Milan,
Take a look at the contr.Sum() and contr.Treatment() functions in the car package.
(I recall, BTW, the sometimes acrimonious previous discussion of this issue.)
Best,
John
------------------------------------------------
John Fox
Sen. William McMaster Prof. of Social Statistics
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox/
On Sat, 27 Oct 2012 13:39:06 +0200
Milan Bouchet-Valat <nalimilan at club.fr> wrote:
> Hi!
>
> I would like to suggest making it possible, in one way or another, to
> get meaningful contrast names when using contr.sum(). Currently,
> contr.treatment() uses factor levels as contrast names, but contr.sum()
> merely numbers its contrasts, which is impractical and can lead to
> mistakes (see the code at the end of this message).
>
> This issue was briefly discussed in 2005 by Brian Ripley in a reply to a
> message on R-help [1]. He rightly stressed that treatment and sum
> contrasts are not equivalent to levels of a factor, because one needs to
> know the reference (here, a level or the sum) to interpret them. But when
> one knows which type of contrasts is being used, useful labels are still
> of high value. I don't think anybody does serious work with sum
> contrasts named myfactor1, myfactor2, myfactor3. (This reasoning applies
> less to contr.helmert(), since ordered factors can quite naturally be
> reported using numbers.)
>
> Thus, would it be possible to add an option to contr.sum() so that it
> returns a matrix whose column names are the levels of the input factor?
> Such an option could also be added to the other contrast functions,
> defaulting to FALSE. Another solution, which could be even more
> practical, would be to add a new function, called for example
> contr.sum2(), which would do the same thing. After all, we already have
> contr.SAS() to implement a slightly different behavior while being
> essentially the same as contr.treatment().
>
> This contr.sum() issue really sounds like a detail, but it's a sad one
> given that factors work really well in R in all other situations. The
> only reason I can think of to explain this behavior is that people
> rarely use it. When fitting log-linear models with glm(), for example,
> this contrast is the most natural one, but it currently gives poorly
> named coefficients when everything could be so easy to interpret if
> factor levels were used. This means people have to implement a
> replacement for contr.sum() by hand, which is not the end of the world
> but is definitely not optimal given how simple the solution is.
>
> Thanks for your attention!
>
>
> Illustration of the current difference between contr.sum() and
> contr.treatment():
>
> > z <- factor(LETTERS[1:3])
> > contr.treatment(z)
>   B C
> A 0 0
> B 1 0
> C 0 1
> > contr.sum(z)
>   [,1] [,2]
> A    1    0
> B    0    1
> C   -1   -1
>
> 1: https://stat.ethz.ch/pipermail/r-help/2005-July/075430.html
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
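
The contr.sum2() idea suggested in the quoted message could be sketched
roughly as follows. This is only an illustration under stated assumptions:
contr.sum2 is the hypothetical name used in the message, not a function in
base R, and the wrapper does nothing more than relabel the columns of
contr.sum() with the levels that receive their own coefficient (all but the
last one).

```r
# Hypothetical helper, not part of base R: a thin wrapper around
# stats::contr.sum() that names the contrast columns after factor levels.
contr.sum2 <- function(n, ...) {
  # Accept a factor as well as a vector of level names.
  if (is.factor(n)) n <- levels(n)
  m <- contr.sum(n, ...)
  lev <- rownames(m)
  # contr.sum() has no column for the last level, so only the first
  # ncol(m) levels are used as column names.
  if (!is.null(lev)) colnames(m) <- lev[seq_len(ncol(m))]
  m
}

z <- factor(LETTERS[1:3])
m <- contr.sum2(z)          # same values as contr.sum(z), columns "A" and "B"

# Attach the labelled matrix to the factor so that model coefficients
# are named after levels rather than numbered.
contrasts(z) <- m
coef_names <- colnames(model.matrix(~ z))
```

With the matrix attached this way, the design matrix columns should be named
(Intercept), zA and zB instead of z1 and z2. The reference (here, the sum)
still has to be known to interpret them, as noted in the quoted message.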