[R] Appropriate regression model for categorical variables

(Ted Harding) ted.harding at nessie.mcc.ac.uk
Wed Jun 13 01:24:10 CEST 2007


On 12-Jun-07 17:45:44, Tirthadeep wrote:
> 
> Dear users,
> In my psychometric test I have applied logistic regression
> to my data. My data consist of 50 predictors (22 continuous
> and 28 categorical) plus a binary response.
> 
> Using glm() and stepAIC() I didn't get a satisfactory result, as
> the misclassification rate is too high. I think the categorical
> variables are responsible for this debacle. Some of them have
> more than 6 levels (one has 10 levels).
> 
> Please suggest some better regression model for this situation.
> If possible you can suggest some article.

I hope you have a very large number of cases in your data!

The minimal complexity of the 28 categorical variables compatible
with your description is

  1 factor at 10 levels
  2 factors at 7 levels
 25 factors at 2 levels

which corresponds to (2^25)*(7^2)*10 = 16441671680 ~= 1.6e10
distinct possible combinations of levels of the factors. Your
true factors may have far more than this.
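
The arithmetic can be checked with a few lines of code. This is only
a sketch, written in Python for illustration; the list of level counts
is the minimal case assumed above, not your real data.

```python
from math import prod

# Minimal level counts consistent with the description (an assumption,
# as in the text): 1 factor at 10 levels, 2 at 7 levels, 25 at 2 levels.
levels = [10] + [7] * 2 + [2] * 25

n_factors = len(levels)   # number of categorical predictors
n_cells = prod(levels)    # distinct combinations of factor levels

print(n_factors, n_cells, f"{n_cells:.1e}")
# prints: 28 16441671680 1.6e+10
```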

Unless you have more cases than this in your data, you are
likely to fall into what is called "linear separation", in which
the logistic regression will find a perfect predictor for your
binary outcome. This perfect predictor may well not be unique
(indeed if you have only a few hundred cases there will probably
be millions of them).

Therefore your logistic regression is likely to be meaningless.
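
The sparsity behind this can be seen in a small simulation. The sketch
below is not a logistic fit: it uses a simple lookup by combination of
levels as a stand-in for a model with all interactions, and the sample
size of 500 and the coin-flip outcome are hypothetical choices made for
illustration. With so many possible combinations, almost every case
sits alone in its own cell, so the cell "predicts" the outcome
perfectly even though the outcome is pure noise.

```python
import random

random.seed(1)  # arbitrary seed, for reproducibility only

levels = [10] + [7] * 2 + [2] * 25   # minimal level counts from the text
n_cases = 500                        # hypothetical sample size

# Each case: a random combination of factor levels plus a coin-flip outcome.
cases = [
    (tuple(random.randrange(k) for k in levels), random.randint(0, 1))
    for _ in range(n_cases)
]

# Collect the set of outcomes observed in each occupied cell.
outcomes_by_cell = {}
for cell, y in cases:
    outcomes_by_cell.setdefault(cell, set()).add(y)

occupied = len(outcomes_by_cell)
# A cell is "pure" if every case in it has the same outcome.
pure = sum(len(ys) == 1 for ys in outcomes_by_cell.values())

print(occupied, "occupied cells,", pure, "of them pure")
```

Nearly all occupied cells should come out pure, which is the symptom
described above: a perfect fit to the data in hand that says nothing
about new cases.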

I can only suggest that you consider very closely how to

a) reduce the numbers of levels in some of your factors,
   by coalescing levels together;
b) define new factors in terms of the old so as to reduce
   the total number of factors (which may include ignoring
   some factors altogether)

so that you end up with new categorical variables whose total
number of possible combinations is much smaller than (say at most
1/5 of) the number of cases in your data.
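
As a rough sketch of (a) and of the 1/5 rule of thumb, again in Python
for illustration: the helper names coalesce_rare and cells_ok, the
min_count threshold, and the example factor are all hypothetical.
Merging levels by frequency is only one possibility; levels are often
better merged on substantive grounds.

```python
from collections import Counter
from math import prod

def coalesce_rare(values, min_count, other="other"):
    """Merge every level seen fewer than min_count times into one level."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]

def cells_ok(level_counts, n_cases, max_fraction=0.2):
    """True if the number of level combinations is at most
    max_fraction * n_cases (the 'at most 1/5' rule of thumb)."""
    return prod(level_counts) <= max_fraction * n_cases

# Hypothetical 10-level factor observed on 40 cases.
raw = list("A" * 12 + "B" * 10 + "C" * 8 + "DEFGHI" + "J" * 4)
merged = coalesce_rare(raw, min_count=8)

print(len(set(raw)), "levels before,", len(set(merged)), "after")
print(cells_ok([10, 2, 2], n_cases=100))  # 40 combinations vs. limit 20
print(cells_ok([4, 2, 2], n_cases=100))   # 16 combinations vs. limit 20
```

Here the 10-level factor collapses to 4 levels (A, B, C and "other"),
and with two further 2-level factors and 100 cases the reduced design
passes the check where the original does not.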

In summary: you have too many explanatory variables.

Best wishes,
Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 13-Jun-07                                       Time: 00:23:49
------------------------------ XFMail ------------------------------


