[R] Appropriate regression model for categorical variables

Robert A LaBudde ral at lcfltd.com
Tue Jun 12 20:08:37 CEST 2007


At 01:45 PM 6/12/2007, Tirtha wrote:
>Dear users,
>In my psychometric test I have applied logistic regression to my data. My
>data consists of 50 predictors (22 continuous and 28 categorical) plus a
>binary response.
>
>Using glm() and stepAIC() I didn't get a satisfactory result, as the
>misclassification rate is too high. I think the categorical variables are
>responsible for this debacle. Some of them have more than 6 levels (one has
>10 levels).
>
>Please suggest some better regression model for this situation. If possible
>you can suggest some article.

1. Usually, if a factor has many levels, there is a natural order to the 
levels. If so, consider fitting the factor as an ordered factor.

2. Break the factor levels into 2 or 3 groups that have some rational 
connection. Then fit the factor with a smaller number of levels. 
E.g., "race" might have levels "white", "black", "asian", "pacific", 
"Spanish surname", "other". Consider a change to "white", "nonwhite".

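Both suggestions can be sketched in a few lines of R. This is only an 
illustration under stated assumptions: the data frame "d", its columns 
"severity", "race" and "y", and the simulated values are all hypothetical 
and are not taken from the original data. Note that an ordered factor is 
fitted with polynomial contrasts by default, so it still uses one 
coefficient per level minus one; the gain is that the low-order (linear, 
quadratic) terms can be examined or kept on their own.

```r
# Hypothetical toy data: column names and values are made up for illustration.
set.seed(1)
n <- 200
d <- data.frame(
  severity = sample(c("low", "medium", "high"), n, replace = TRUE),
  race = sample(c("white", "black", "asian", "pacific",
                  "Spanish surname", "other"), n, replace = TRUE),
  y = rbinom(n, 1, 0.5),
  stringsAsFactors = FALSE
)

# Suggestion 1: declare the natural order of the levels explicitly.
# glm() then codes the factor with polynomial contrasts (.L, .Q, ...).
d$severity <- factor(d$severity,
                     levels = c("low", "medium", "high"),
                     ordered = TRUE)

# Suggestion 2: collapse a many-level factor into fewer, rationally
# connected groups (here "white" versus "nonwhite", as in the example above).
d$race2 <- factor(ifelse(d$race == "white", "white", "nonwhite"),
                  levels = c("white", "nonwhite"))

# Refit the logistic regression with the recoded predictors.
fit <- glm(y ~ severity + race2, data = d, family = binomial)
print(coef(fit))
```

Whether either recoding lowers the misclassification rate depends on the 
data; the sketch only shows the mechanics of the recoding.
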
================================================================
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: ral at lcfltd.com
Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
824 Timberlake Drive                     Tel: 757-467-0954
Virginia Beach, VA 23464-3239            Fax: 757-467-2947

"Vere scire est per causas scire"


