[R] Why the order of parameters in a logistic regression affects results significantly?

Greg Snow 538280 at gmail.com
Fri Jul 22 16:50:11 CEST 2016


Please post in plain text; the message is very hard to read after the
reformatting that was done.

Did you receive any warnings when you fit your models?

The fact that the last coefficient is NA in both outputs suggests that
there was some collinearity in your predictor variables and R chose to
drop one of the offending variables from the model (the last one in
each case).  Depending on the nature of the collinearity, which
variable gets dropped can change the interpretation (and therefore the
estimates) of the remaining coefficients.
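
A quick way to confirm which term was dropped, and why, is sketched
below.  The data are made up for illustration (x1, x2, x3 and y are
hypothetical names, not your variables); the only assumption is that
one predictor is an exact linear combination of the others.

```r
# Hypothetical toy data: x3 is an exact linear combination of x1 and x2
set.seed(1)
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- x1 + x2
y  <- rbinom(n, 1, 0.5)

fit <- glm(y ~ x1 + x2 + x3, family = binomial)

# The aliased term is reported as NA; here it is x3, the last one entered
coef(fit)
dropped <- names(which(is.na(coef(fit))))
dropped

# alias() reports how the dropped term depends on the terms that were kept
alias(fit)$Complete
```

summary(fit) also flags this with a note that some coefficients are
not defined because of singularities.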

For example, let's say that you have 3 predictors, red, green, and blue,
that are indicator variables (0/1) and that every subject has a 1 in
exactly one of those variables (so together they are collinear with
the intercept).  If you put the 3 variables into a model with the
intercept in the above order, then R will drop the blue variable.  The
intercept is then the average for blue subjects (on the log-odds scale
in a logistic regression) and the other coefficients are the
differences between red/green and blue on average.  If you refit the
model with the order blue, green, red, then R will drop red from the
model.  Now the intercept is the mean for red subjects and the others
are the differences from red on average: a very different
interpretation and therefore different estimates.
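
Here is a minimal sketch of that red/green/blue situation with
simulated data.  The sample size, seed, and per-colour probabilities
are arbitrary assumptions chosen only to make the example run; nothing
here comes from your dataset.

```r
# Simulated data: every subject belongs to exactly one colour group
set.seed(42)
n      <- 300
colour <- sample(c("red", "green", "blue"), n, replace = TRUE)
red    <- as.numeric(colour == "red")
green  <- as.numeric(colour == "green")
blue   <- as.numeric(colour == "blue")

# Hypothetical success probabilities for each colour
p <- unname(c(red = 0.2, green = 0.5, blue = 0.7)[colour])
y <- rbinom(n, 1, p)

# Same predictors, different order: a different indicator gets dropped
fit_a <- glm(y ~ red + green + blue, family = binomial)  # blue is NA
fit_b <- glm(y ~ blue + green + red, family = binomial)  # red is NA

coef(fit_a)
coef(fit_b)

# The two fits describe the same model: fitted probabilities agree
all.equal(unname(fitted(fit_a)), unname(fitted(fit_b)), tolerance = 1e-6)

# The intercept of fit_a is the log-odds of y among the blue subjects
c(intercept = coef(fit_a)[["(Intercept)"]],
  blue_logodds = qlogis(mean(y[blue == 1])))
```

The coefficients differ between the two fits, but only because the
reference group differs; the predictions are the same.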

I expect something along those lines is going on here.
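
As a rough check using only the numbers you posted: if the five
subtype indicators sum to 1 for every subject (an assumption I cannot
verify without the data), then moving the reference group from Normal
to HER2 should multiply the intercept odds by the HER2 odds ratio and
divide the other subtype odds ratios by it.  The names or1, or2, h,
and predicted below are just labels for this sketch.

```r
# Odds ratios as posted for the first ordering (Normal was dropped)
or1 <- c(Intercept = 0.24866935, Age = 1.00433781, LumA = 0.10639937,
         LumB = 0.31614001, HER2 = 0.08220685, Basal = 20.25180956)

# Odds ratios as posted for the second ordering (HER2 was dropped)
or2 <- c(Intercept = 0.02044232, Age = 1.00433781, LumA = 1.29428846,
         LumB = 3.84566516, Basal = 246.35185956, Normal = 12.16443690)

# What the second set should look like if only the reference group changed
h <- or1[["HER2"]]
predicted <- c(Intercept = or1[["Intercept"]] * h,
               Age       = or1[["Age"]],
               LumA      = or1[["LumA"]] / h,
               LumB      = or1[["LumB"]] / h,
               Basal     = or1[["Basal"]] / h,
               Normal    = 1 / h)

# Side-by-side comparison of predicted and posted values
round(cbind(predicted, posted = or2[names(predicted)]), 4)
```

The two columns agree to the precision shown, which is consistent with
the two fits being the same model with a different reference subtype.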

On Thu, Jul 21, 2016 at 4:04 PM, Qinghua He via R-help
<r-help at r-project.org> wrote:
> Using the same data, if I ran
> fit2 <- glm(formula=AR~Age+LumA+LumB+HER2+Basal+Normal,family=binomial,data=RacComp1)
> summary(fit2)
> exp(coef(fit2))
> I obtained:
> > exp(coef(fit2))
> (Intercept)         Age        LumA        LumB        HER2       Basal      Normal
>  0.24866935  1.00433781  0.10639937  0.31614001  0.08220685 20.25180956          NA
> while if I ran
>
> fit2 <- glm(formula=AR~Age+LumA+LumB+Basal+Normal+HER2,family=binomial,data=RacComp1)
> summary(fit2)
> exp(coef(fit2))
> I obtained:
> > exp(coef(fit2))
>  (Intercept)          Age         LumA         LumB        Basal       Normal         HER2
>   0.02044232   1.00433781   1.29428846   3.84566516 246.35185956  12.16443690           NA
>
> Essentially they're the same model - I just moved HER2 to the end. But the ORs changed significantly. Can someone explain?
> For the latter result, I don't even know how to interpret it, as all factors have OR>1 (except the Intercept); how could that be possible? Can I eliminate the effect of the intercept?
> Also, I cannot obtain an OR for the last factor due to collinearity. However, I know others obtained ORs for all factors for the same dataset. Can someone tell me how to obtain ORs for all factors? All factors are categorical variables (i.e., 0 or 1).
> Thanks!
> Peter
>         [[alternative HTML version deleted]]



-- 
Gregory (Greg) L. Snow Ph.D.
538280 at gmail.com


