[R] How to test omitted level from a multiple level factor against overall mean in regression models?
rolf.turner at xtra.co.nz
Mon Mar 26 02:55:56 CEST 2012
The test you are requesting is ***MEANINGLESS***. The "effect value" of a single
level is ill-defined (or, in the more usual parlance, "not estimable"). The
procedure suggested by Gabor gives you point estimates *subject to the constraints*
imposed by the contrasts used. The choice of contrasts is arbitrary, essentially a
matter of aesthetics/taste/convenience. The values returned by dummy.coef() have,
in and of themselves, no meaning at all.
You can meaningfully estimate, and test for the "significance" of, differences
between the "effect values" of factor levels. For the individual
levels, no can do.
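A minimal sketch of this point in R, using the data from the original question quoted further down. The object names (fit_sum, fit_treat, eff_sum, eff_treat) are made up for illustration. It fits the same model under two contrast codings and compares the per-level values from dummy.coef() with the fitted values and with a difference between levels:

```r
# Data from the original question
x <- factor(c(1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3))
y <- c(1.1,1.15,1.2,1.1,1.1,1.1,1.2,1.2,1.2,
       2.1,2.2,2.3,2.4,2.5,2.6,2.7,2.8,2.9,3,3.1)
test <- data.frame(x, y)

# Same model, two different (equally arbitrary) contrast codings
fit_sum   <- lm(y ~ x, data = test, contrasts = list(x = "contr.sum"))
fit_treat <- lm(y ~ x, data = test, contrasts = list(x = "contr.treatment"))

# Per-level "effect values": these depend on the coding chosen
eff_sum   <- dummy.coef(fit_sum)$x
eff_treat <- dummy.coef(fit_treat)$x
print(eff_sum)
print(eff_treat)

# Fitted values do not depend on the coding ...
same_fit <- isTRUE(all.equal(fitted(fit_sum), fitted(fit_treat)))

# ... and neither does a difference between two levels (here level 3 minus level 1)
diff_sum   <- unname(eff_sum[3] - eff_sum[1])
diff_treat <- unname(eff_treat[3] - eff_treat[1])
same_diff  <- isTRUE(all.equal(diff_sum, diff_treat))
print(c(same_fit = same_fit, same_diff = same_diff))
```

Under contr.treatment the coefficient for level 3 is itself the level 3 minus level 1 difference, so its row in summary(fit_treat) is a test of an estimable quantity.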
E.g. Y = mu + alpha_i + E when the observation is at level i of the
factor (and "E" means "random error"). In this setting mu = 0, alpha_1 = 1,
alpha_2 = 2 and alpha_3 = 3 is ***EXACTLY THE SAME MODEL*** as mu = 1,
alpha_1 = 0, alpha_2 = 1 and alpha_3 = 2.
It makes no sense to ask (or to test) whether alpha_1 differs from 0.
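The arithmetic behind this can be checked directly. The following is only an illustrative sketch; the variable names (mu_a, alpha_a, and so on) are not from the thread. It computes the level means mu + alpha_i under both parameter sets given above:

```r
# The two parameter sets from the example above
mu_a <- 0; alpha_a <- c(1, 2, 3)
mu_b <- 1; alpha_b <- c(0, 1, 2)

# Expected response at each factor level: mu + alpha_i
means_a <- mu_a + alpha_a
means_b <- mu_b + alpha_b

# Both parameter sets imply the same level means, so no data can tell them apart
same_means <- identical(means_a, means_b)

# Differences between levels agree as well; the individual alpha_i do not
diffs_a <- diff(alpha_a)
diffs_b <- diff(alpha_b)
same_diffs <- identical(diffs_a, diffs_b)
print(c(same_means = same_means, same_diffs = same_diffs))
```

Only quantities that take the same value under every such parameter set, such as alpha_2 - alpha_1, can be estimated or tested.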
On 26/03/12 02:08, "Biedermann, Jürgen" wrote:
> Hi Gabor,
> Thanks a lot for the answer.
> However, I am focusing less on the effect value of the omitted factor level itself and more on a statistical test of whether it
> differs significantly from 0.
> Do you know a way to do this too?
> Greetings Jürgen
> From: Gabor Grothendieck [ggrothendieck at gmail.com]
> Sent: Sunday, 25 March 2012 14:11
> To: Biedermann, Jürgen
> Cc: r-help at R-project.org
> Subject: Re: [R] How to test omitted level from a multiple level factor against overall mean in regression models?
> 2012/3/25 "Biedermann, Jürgen"<Juergen.Biedermann at charite.de>:
>> Hi there,
>> I have a linear model with one factor having three levels.
>> I want to check whether the individual levels differ significantly from the overall mean (using contr.sum).
>> However, one level (the last) is omitted in the standard procedure.
>> To illustrate this:
>> x <- as.factor(c(1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3))
>> y <- c(1.1,1.15,1.2,1.1,1.1,1.1,1.2,1.2,1.2,2.1,2.2,2.3,2.4,2.5,2.6,2.7,2.8,2.9,3,3.1)
>> test <- data.frame(x, y)
>> reg1 <- lm(y ~ C(x, contr.sum), data = test)
>>                  Estimate Std. Error t value Pr(>|t|)
>> (Intercept)       1.63333    0.06577  24.834 8.48e-15 ***
>> C(x, contr.sum)1 -0.48333    0.10792  -4.479  0.00033 ***
>> C(x, contr.sum)2 -0.48333    0.08936  -5.409 4.70e-05 ***
>> Is it possible to get the effect for the third level (against the overall mean) in the table too?
>> I figured out:
>> reg2 <- lm(y ~ C(relevel(x, 3), contr.sum), data = test)
>> C(relevel(x, 3), contr.sum)1  0.96667    0.07951  12.158 8.24e-10 ***
>> C(relevel(x, 3), contr.sum)2 -0.48333    0.10792  -4.479  0.00033 ***
>> The first row now tests the third level against the overall mean, but I find this approach inconvenient.
>> Moreover, I wonder whether it is meaningful at all, given the accumulation of alpha error. Would a Bonferroni correction be sensible?
> Try this:
>> options(contrasts = c("contr.sum", "contr.poly"))
>> reg1 <- lm(y ~ x, data = test)
>> dummy.coef(reg1)
> Full coefficients are
>
> (Intercept):     1.633333
> x:                       1          2         3
>                 -0.4833333 -0.4833333 0.9666667
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com