[R] Help with categorical predictors in regression models

David Winsemius dwinsemius at comcast.net
Sat Jun 20 07:05:03 CEST 2015


On Jun 19, 2015, at 2:32 PM, Pamela Foggia wrote:

> Hello,
> In my regression models (linear and logistic models) I have two predictor
> variables, both are categorical variables: DEGREE and REGION.
> 
> DEGREE is for educational level, that is an ordinal variable with five
> levels (0-LT HIGH SCHOOL, 1-HIGH SCHOOL, 2-JUNIOR COLLEGE, 3-BACHELOR,
> 4-GRADUATE).
> 
> REGION is for the region of the respondent, that is a nominal variable with
> 9 levels (1-NEW ENGLAND, 2-MIDDLE ATLANTIC, 3-E. NOR. CENTRAL, 4-W. NOR.
> CENTRAL, 5-SOUTH ATLANTIC, 6-E. SOU. CENTRAL, 7-W. SOU. CENTRAL, 8-
> MOUNTAIN, 9-PACIFIC).
> 
> In many examples I read that, in order to use these predictors correctly as
> categorical variables, I first have to apply the FACTOR function,

Please do _not_ capitalize the `factor` function name. R is _not_ SAS.

> for
> example in this way
> 
> fit1 <- lm(Z ~ factor(X) + factor(Y))
> fit2 <- glm(W ~ factor(X) + factor(Y), family=binomial(link="logit"))
> 
> obtaining the following output for the logistic regression
> 
>                               coef.est coef.se
> (Intercept)                 1.027    0.263
> factor(DEGREE)1         0.301    0.134
> factor(DEGREE)2         0.340    0.211
> factor(DEGREE)3         0.748    0.168
> factor(DEGREE)4         1.267    0.237
> ...
> 
> where clearly Z is a continuous variable and W is a binary variable. My
> question is: as far as the ordinal variable X is concerned, would it be
> more correct to use the ORDERED function rather than FACTOR?

That really would depend on the hypotheses under consideration, wouldn't it? 

> I mean an
> operation like this
> 
> fit1 <- lm(Z ~ ordered(X) + factor(Y))
> fit2 <- glm(W ~ ordered(X) + factor(Y), family=binomial(link="logit"))
> 
> where I obtain a different output like this
> 
>                                    coef.est coef.se
> (Intercept)                      1.558    0.241
> ordered(DEGREE).L           0.942    0.157
> ordered(DEGREE).Q          0.215    0.160
> ordered(DEGREE).C          0.118    0.111
> ordered(DEGREE)^4        -0.106    0.143
> ...

Clearly that output does not match the regression call.

> 
> What do the letters L, Q, C and the power ^4 (which I find in the output)
> mean?

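The default contrasts for an ordered factor are the orthogonal polynomial contrasts up to degree n-1 ("degree" here has nothing to do with your factor name), where n is the number of levels of the factor. The suffixes .L, .Q, .C and ^4 label the linear, quadratic, cubic and fourth-degree terms.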
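
A small sketch, not part of the original exchange, of what those contrasts look like for a five-level factor such as DEGREE. It uses only contr.poly from base R; the object name cp is made up for illustration.

```r
# Polynomial contrast matrix for a factor with 5 levels (base R, stats package)
cp <- contr.poly(5)

# One column per polynomial term: linear, quadratic, cubic, fourth degree
print(colnames(cp))

# Rows correspond to the 5 levels, columns to the contrast terms
print(round(cp, 3))

# The columns are mutually orthogonal and scaled to unit length,
# so their cross-product is (numerically) the 4 x 4 identity matrix
print(round(crossprod(cp), 10))
```

The column names are what R appends to the term name in the coefficient table, which is where labels such as ordered(DEGREE).L come from.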

If this doesn't make sense, then you need to do further research to improve your understanding of polynomial contrasts. (They are messy.) You can limit the contrasts to only a linear "degree". You can find further information regarding polynomial contrasts at ?contr.poly and ?C

It's possible that this will be helpful:

DEGREE <- C( DEGREE, poly, 1)  # Only linear contrast
fit2 <- glm(W ~ DEGREE + factor(REGION), family=binomial(link="logit"))

And refrain from then using `ordered` in the formula.
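
As a rough illustration of why the choice depends on the hypothesis, here is a sketch on simulated data. The data, the seed, the effect size and the names fit_f, fit_o, fit_l and DEGREE_lin are all invented for this example and are not taken from the original post.

```r
# Simulated stand-ins for the poster's variables (purely hypothetical data)
set.seed(1)
n <- 500
DEGREE <- sample(0:4, n, replace = TRUE)   # 5 ordered education levels
REGION <- sample(1:9, n, replace = TRUE)   # 9 nominal regions
W <- rbinom(n, 1, plogis(0.5 + 0.3 * DEGREE))

# factor() and ordered() describe the same model with different contrasts,
# so the fit is the same; only the meaning of the coefficients differs
fit_f <- glm(W ~ factor(DEGREE) + factor(REGION), family = binomial)
fit_o <- glm(W ~ ordered(DEGREE) + factor(REGION), family = binomial)
print(all.equal(deviance(fit_f), deviance(fit_o), tolerance = 1e-6))

# Keep only the linear polynomial contrast: one coefficient for DEGREE
DEGREE_lin <- C(DEGREE, poly, 1)
fit_l <- glm(W ~ DEGREE_lin + factor(REGION), family = binomial)
print(names(coef(fit_l)))

# Likelihood-ratio test of the linear-only model against the full
# polynomial model, i.e. whether the .Q, .C and ^4 terms are needed
print(anova(fit_l, fit_o, test = "Chisq"))
```

The first two fits differ only in parametrization, while the third is a genuinely smaller model, so the comparison in the last line is where a substantive hypothesis about DEGREE gets tested.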

-- 
David.
> 
> Thanks in advance
> 
> 	[[alternative HTML version deleted]]

This is a mailing list that requests plain text. It's not hard to do in gmail.

Please read ....

> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA


