[R] glmnet inclusion / exclusion of categorical variables

Steve Lianoglou lianoglou.steve at gene.com
Fri Aug 9 21:02:19 CEST 2013


Hi,

On Fri, Aug 9, 2013 at 6:44 AM, Kevin Shaney <kevin.shaney at rosetta.com> wrote:
>
> Hello -
>
> I have been using glmnet in the following form to predict a multi-class dependent variable with multinomial logistic regression:
>
> mglmnet <- glmnet(xxb, yb, alpha = ty, dfmax = dfm,
>                   family = "multinomial", standardize = FALSE)
>
> I am using both continuous and categorical predictors, and I use sparse.model.matrix to encode my x's as a matrix. For example, this turns a categorical variable V1 with values "1", "2", or "3" into two recoded variables, V12 and V13, each coded "1" or "0".
>
> As I cycle through different penalties, I would like either both recoded variables to be included or both excluded, but never just one of them, and
> I can't figure out how to make that work. I tried changing the
> "type.multinomial" option, since it looks like it should do what I want, but I can't get it to work (maybe the difference in the recoded variable names is causing this).
>
> To summarize: for categorical variables, I would like to constrain the inclusion/exclusion of the recoded variables hierarchically, so that either all of the recoded variables from the same original categorical variable are in the model, or all are out.
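For reference, the encoding described above can be reproduced with base R's model.matrix on a small made-up data frame (the data are hypothetical). The "assign" attribute records which original term each column came from, which is exactly the grouping a group-lasso fit needs. Note that glmnet's type.multinomial = "grouped" does something different: it ties together the coefficients of a single column across the response classes, so V12 and V13 are still selected independently of each other. sparse.model.matrix from the Matrix package uses the same treatment coding; whether its result carries the same "assign" attribute is worth checking with attributes().

```r
# Hypothetical toy data: one continuous predictor and a 3-level factor V1.
df <- data.frame(
  x1 = c(0.5, 1.2, -0.3, 2.0, 0.7, -1.1),
  V1 = factor(c("1", "2", "3", "1", "3", "2"))
)

# Default treatment contrasts: level "1" is absorbed into the intercept,
# so V1 becomes the two indicator columns V12 and V13.
X <- model.matrix(~ x1 + V1, data = df)
print(colnames(X))

# "assign" maps each column to its source term:
# 0 = intercept, 1 = x1, 2 = V1 (shared by V12 and V13).
grp <- attr(X, "assign")
print(split(colnames(X), grp))
```

Columns that share an "assign" value are the ones that should enter or leave the model together.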

I'm pretty sure you'll need the "group lasso" for that. A quick
Google search turns up:

grplasso: http://cran.r-project.org/web/packages/grplasso/index.html
standGL: http://cran.r-project.org/web/packages/standGL/index.html
gglasso: http://code.google.com/p/gglasso/

Unfortunately, it doesn't look like any of them supports the equivalent
of family="multinomial"; they only handle two-class classification.
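A minimal sketch of the two-class case with grplasso, assuming that package's default matrix interface (grplasso(x, y, index, lambda, model = LogReg()) plus lambdamax() for the start of the penalty path); the data are simulated purely for illustration, and this is not a multinomial solution. The group index is built from model.matrix's "assign" attribute, with NA marking the unpenalized intercept. The fit only runs if grplasso is installed.

```r
# Simulated toy data (illustration only): binary 0/1 response,
# one continuous predictor x1 and a 3-level factor V1.
set.seed(1)
n  <- 100
df <- data.frame(
  x1 = rnorm(n),
  V1 = factor(sample(c("1", "2", "3"), n, replace = TRUE))
)
y <- rbinom(n, 1, plogis(0.8 * df$x1))  # LogReg() expects y in {0, 1}

X <- model.matrix(~ x1 + V1, data = df)

# One penalty group per original term; the intercept (term 0) gets NA
# so that it is not penalized.
index <- attr(X, "assign")
index[index == 0] <- NA

if (requireNamespace("grplasso", quietly = TRUE)) {
  # Decreasing penalties, starting where all penalized groups are zero.
  lambda <- grplasso::lambdamax(X, y = y, index = index,
                                model = grplasso::LogReg()) * 0.5^(0:5)
  fit <- grplasso::grplasso(X, y = y, index = index, lambda = lambda,
                            model = grplasso::LogReg(), penscale = sqrt,
                            control = grplasso::grpl.control(trace = 0))
  # V12 and V13 form one group, so along the path they enter
  # or leave the model together.
  print(coef(fit)[which(index == 2), , drop = FALSE])
}
```

The grouping itself does not depend on the response, so the same index vector would carry over to any group-lasso package that accepts a column-to-group map.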

HTH,
-steve

-- 
Steve Lianoglou
Computational Biologist
Bioinformatics and Computational Biology
Genentech


