[R] lm model with many categorical variables

Bert Gunter bgunter.4567 at gmail.com
Tue Sep 20 16:49:02 CEST 2016


You need statistical help, which is generally off topic here. I
suggest you post to a statistcal site like stats.stackexchange.com
instead. Better yet, find a local statistical expert with whom you can
consult.

Cheers,
Bert
Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Tue, Sep 20, 2016 at 1:34 AM, Michael Haenlein
<haenlein at escpeurope.eu> wrote:
> Dear all,
>
> I am trying to estimate a lm model with one continuous dependent variable
> and 11 independent variables that are all categorical, some of which have
> many categories (several dozens in some cases).
>
> I am not interested in statistical inference to a larger population. The
> objective of my model is to find a way to best predict my continuous
> variable within the sample.
>
> When I run the lm model I evidently get many regression coefficients that
> are not significant. Is there some way to automatically combine levels of a
> categorical variable together if the regression coefficients for the
> individual levels are not significant?
>
> My idea is to find some form of grouping of the different categories that
> allows me to work with less levels while keeping or even improving the
> quality of predictions.
>
> Thanks,
>
> Michael
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



More information about the R-help mailing list