[R] Preparing dataset for glmnet: factors to dummies

Nick Sabbe nick.sabbe at ugent.be
Tue Feb 1 10:46:01 CET 2011


Hello list.

For some reason, glmnet does not accept a data frame as input. It expects
the input to be a matrix in which the factors have already been coded as
dummy variables.
I have created a sample dataset with
- 11 factor columns with two levels
- 4 factor columns with three levels
- 135 continuous columns (drawn from a standard normal)
- 100 observations (rows)
Say this data frame is called dfrPredictors.
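
For anyone who wants to reproduce the setup, a data frame of the shape described above could be generated roughly as follows. This is only a sketch: the seed, the column names (bin*, tri*, num*) and the level labels are assumptions made for illustration and are not taken from the original data.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
n <- 100     # number of observations (rows)

# Hypothetical helper: a random factor with k levels labelled "a", "b", ...
mkFactor <- function(k) {
  factor(sample(letters[1:k], n, replace = TRUE), levels = letters[1:k])
}

# 11 two-level factors, 4 three-level factors, 135 standard-normal columns
cols <- c(replicate(11, mkFactor(2), simplify = FALSE),
          replicate(4, mkFactor(3), simplify = FALSE),
          replicate(135, rnorm(n), simplify = FALSE))

# Made-up column names; any syntactically valid names would do
names(cols) <- c(paste0("bin", 1:11), paste0("tri", 1:4), paste0("num", 1:135))

dfrPredictors <- as.data.frame(cols)
```

The result has 100 rows and 150 columns, of which 15 are factors.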

What I do now is use the following code:

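# Build a one-sided formula containing every predictor column
form <- paste("~", paste(colnames(dfrPredictors), collapse="+"), sep="")
# Keep rows with missing values (na.pass) instead of dropping them
dfrTmp <- model.frame(dfrPredictors, na.action=na.pass)
# Expand factors into dummy columns; [,-1] drops the intercept column
result <- as.matrix(model.matrix(as.formula(form), data=dfrTmp))[,-1]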

This works (although admittedly, I don't understand all of it).
However, I notice that for this rather limited dataset, this conversion
takes around 0.1 seconds user/elapsed time (on a relatively speedy laptop).
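
A timing like this could be measured with system.time, averaging over several repetitions to smooth out noise. The sketch below is self-contained, so it rebuilds a stand-in data frame first. The names mkFactor, toMatrix and reps, the x* column names and the seed are assumptions made for illustration, and the number printed will differ from machine to machine.

```r
set.seed(1)  # arbitrary seed
n <- 100

# Stand-in for dfrPredictors with the shape described in the post
mkFactor <- function(k) {
  factor(sample(letters[1:k], n, replace = TRUE), levels = letters[1:k])
}
cols <- c(replicate(11, mkFactor(2), simplify = FALSE),
          replicate(4, mkFactor(3), simplify = FALSE),
          replicate(135, rnorm(n), simplify = FALSE))
names(cols) <- paste0("x", seq_along(cols))
dfrPredictors <- as.data.frame(cols)

# The conversion from the post, wrapped in a function so it can be timed
toMatrix <- function(dfr) {
  form <- paste("~", paste(colnames(dfr), collapse = "+"), sep = "")
  dfrTmp <- model.frame(dfr, na.action = na.pass)
  as.matrix(model.matrix(as.formula(form), data = dfrTmp))[, -1]
}

reps <- 50  # number of repetitions to average over
timing <- system.time(for (i in seq_len(reps)) result <- toMatrix(dfrPredictors))

print(timing["elapsed"] / reps)  # average elapsed seconds per conversion
print(dim(result))               # rows and columns of the resulting matrix
```

With the default treatment contrasts, each two-level factor contributes one dummy column and each three-level factor two, so the matrix should have 11 + 8 + 135 = 154 columns once the intercept is dropped.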

For my current work, I need to do this many times on very similar
data frames (in fact, they are multiply imputed from the same 'original'
data frame), so I need all the speed I can get.
Does anybody know of a way that is quicker than the above? Note: because of
other uses of the dataframe, I don't have the option to do this conversion
before the imputation, so I really need the conversion itself to work
quickly.

Thanks,


Nick Sabbe
--
ping: nick.sabbe at ugent.be
link: http://biomath.ugent.be
wink: A1.056, Coupure Links 653, 9000 Gent
ring: 09/264.59.36

-- Do Not Disapprove


