[R] Why the order of parameters in a logistic regression affects results significantly?

David Winsemius dwinsemius at comcast.net
Fri Jul 22 19:24:48 CEST 2016


> On Jul 21, 2016, at 3:04 PM, Qinghua He via R-help <r-help at r-project.org> wrote:
> 
> Using the same data, if I ran
> fit2 <- glm(formula = AR ~ Age + LumA + LumB + HER2 + Basal + Normal, family = binomial, data = RacComp1)
> summary(fit2)
> exp(coef(fit2))
> I obtained:

exp(coef(fit2))
(Intercept)         Age        LumA        LumB        HER2       Basal      Normal
 0.24866935  1.00433781  0.10639937  0.31614001  0.08220685 20.25180956          NA

> while if I ran
> 
> fit2 <- glm(formula = AR ~ Age + LumA + LumB + Basal + Normal + HER2, family = binomial, data = RacComp1)
> summary(fit2)
> exp(coef(fit2))
> I obtained:

exp(coef(fit2))
 (Intercept)          Age         LumA         LumB        Basal       Normal         HER2
  0.02044232   1.00433781   1.29428846   3.84566516 246.35185956  12.16443690           NA

> 
> Essentially they're the same model - I just moved HER2 to the end. But the ORs changed significantly. Can someone explain?

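You have collinearity: if every case falls in exactly one of LumA, LumB, HER2, Basal and Normal, the five indicators sum to 1, which duplicates the intercept column. One of your variables will therefore be dropped as redundant (its coefficient is reported as NA). Which one is dropped is determined by the order of the variable names in the model formula: the last of the redundant terms goes. Note that the Age estimate is identical in both fits, which is what you would expect if the two are the same model with a different reference category.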
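As a check on the posted numbers (nothing is refitted here; the values are copied from the two outputs quoted above, and the vector names are my own): if the two fits are the same model with Normal and then HER2 as the reference category, the second set of odds ratios should follow from the first by rescaling with the HER2 odds ratio. A small sketch of that arithmetic:

```r
# Odds ratios copied from the two outputs posted in this thread.
or_first  <- c(Intercept = 0.24866935, LumA = 0.10639937, LumB = 0.31614001,
               HER2 = 0.08220685, Basal = 20.25180956)     # Normal was dropped
or_second <- c(Intercept = 0.02044232, LumA = 1.29428846, LumB = 3.84566516,
               Basal = 246.35185956, Normal = 12.16443690) # HER2 was dropped

her2 <- or_first[["HER2"]]

# Re-express the first fit with HER2 instead of Normal as the reference:
# the baseline odds move to the HER2 group, and every other odds ratio
# is taken relative to HER2.
rebuilt <- c(
  Intercept = or_first[["Intercept"]] * her2,
  LumA      = or_first[["LumA"]]  / her2,
  LumB      = or_first[["LumB"]]  / her2,
  Basal     = or_first[["Basal"]] / her2,
  Normal    = 1 / her2
)

print(round(rbind(posted = or_second, rebuilt = rebuilt), 6))

# TRUE if the rebuilt values match the posted ones up to print rounding.
same_model <- isTRUE(all.equal(rebuilt, or_second, tolerance = 1e-6))
print(same_model)
```

If the two rows agree, the odds ratios did not really change; only the category they are measured against did.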


> For the latter result, I don't even know how to interpret it, as all factors have OR>1 (except Intercept). How could that be possible? Can I eliminate the effect of the intercept?

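In the first model (with the default treatment contrasts) the Intercept is actually an estimate for cases with LumA, LumB, Basal and HER2 all at their lowest level (and Age at 0), and this, not coincidentally, also precisely defines your Normal variable. The other odds ratios are therefore comparisons against Normal: LumA, LumB and HER2 come out below 1 and Basal well above 1 for AR, whatever it might be. In the second model HER2 is the dropped term, so HER2 becomes the reference, and because HER2 has the lowest odds of the five categories, every other category shows OR > 1 relative to it. If these various categories (which I suspect are breast cancer risk predictors) are all distinct with no overlaps, then use this: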

fit2 <- glm(formula = AR ~ Age + Normal + LumA + LumB + HER2 + Basal + 0, family = binomial, data = RacComp1)

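The fitted values will be the same as in your first model, and what was the Intercept's parameter will now be the parameter for Normal. The other subtype parameters will change, though: without an intercept each one is the log-odds for that subtype itself (at Age = 0) rather than a contrast against Normal, so exp(coef(fit2)) gives odds for the five subtypes, not odds ratios. An odds ratio always needs a reference category, so you cannot get one for every level at once.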
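Since RacComp1 was not posted, here is a minimal sketch on simulated data of the behaviour described in this reply. Everything in it is an assumption made for illustration: the data frame `sim`, the sample size, and the "true" log-odds are invented, and only the column names are borrowed from the question.

```r
set.seed(1)
n <- 500
subtypes <- c("Normal", "LumA", "LumB", "HER2", "Basal")

# Each simulated case belongs to exactly one subtype (no overlaps).
subtype <- factor(sample(subtypes, n, replace = TRUE), levels = subtypes)

# One 0/1 indicator column per subtype, mimicking the layout in the question.
sim <- data.frame(Age = round(runif(n, 30, 80)))
for (s in subtypes) sim[[s]] <- as.integer(subtype == s)

# Hypothetical log-odds per subtype; these values are made up.
true_logit <- c(Normal = -1, LumA = -2, LumB = -1.5, HER2 = -2.5, Basal = 1)
sim$AR <- rbinom(n, 1, plogis(true_logit[as.character(subtype)] + 0.005 * sim$Age))

# Same model, two term orders: the last redundant indicator is dropped.
fit_a <- glm(AR ~ Age + LumA + LumB + HER2 + Basal + Normal,
             family = binomial, data = sim)
fit_b <- glm(AR ~ Age + LumA + LumB + Basal + Normal + HER2,
             family = binomial, data = sim)

# No intercept: all five indicators are estimable.
fit_c <- glm(AR ~ Age + Normal + LumA + LumB + HER2 + Basal + 0,
             family = binomial, data = sim)

dropped_a <- names(which(is.na(coef(fit_a))))
dropped_b <- names(which(is.na(coef(fit_b))))
print(c(first_order = dropped_a, second_order = dropped_b))

# All three parameterizations should give the same fitted probabilities.
same_fit <- isTRUE(all.equal(fitted(fit_a), fitted(fit_b), tolerance = 1e-6)) &&
            isTRUE(all.equal(fitted(fit_a), fitted(fit_c), tolerance = 1e-6))

# In the no-intercept fit, Normal takes over the old intercept ...
normal_is_intercept <- isTRUE(all.equal(coef(fit_c)[["Normal"]],
                                        coef(fit_a)[["(Intercept)"]],
                                        tolerance = 1e-6))

# ... and every other subtype coefficient is intercept + old contrast.
others <- c("LumA", "LumB", "HER2", "Basal")
others_shifted <- isTRUE(all.equal(coef(fit_c)[others],
                                   coef(fit_a)[["(Intercept)"]] + coef(fit_a)[others],
                                   tolerance = 1e-6))

print(c(same_fit = same_fit,
        normal_is_intercept = normal_is_intercept,
        others_shifted = others_shifted))
```

To report odds ratios against a category of your choosing, keep the intercept and put that category's indicator last in the formula (or leave it out altogether) so that it becomes the reference.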


> Also, I cannot obtain OR for the last factor due to collinearity. However, I know others obtained OR for all factors for the same dataset. Can someone tell me how to obtain OR for all factors? All factors are categorical variables (i.e., 0 or 1).
> Thanks!
> Peter

David Winsemius
Alameda, CA, USA


