[R] How to make 'step' faster?

Gavin Simpson gavin.simpson at ucl.ac.uk
Fri Aug 20 11:13:38 CEST 2010


On Fri, 2010-08-20 at 12:25 +0530, sambit rath wrote:
> Dear all,
> 
> I am fairly new to R. I would like to perform a stepwise logit
> regression, selecting a model on the basis of AIC. I am using
> some large datasets (up to a million rows and 97 variables), and the
> 'step' function takes far too long to complete a single run. I have
> tried subsetting the data and doing the same thing, but 'step' is
> still time-consuming.
> Is there a way out?

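Rethink your model selection procedure. Look at ridge regression, the
lasso and the elastic net. These shrink coefficients towards zero rather
than searching through subsets of predictors, and the lasso and elastic
net set some coefficients exactly to zero, so variable selection happens
as part of the fit. See the Machine Learning task view on CRAN:
http://CRAN.R-project.org/view=MachineLearning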
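A minimal sketch of the lasso route for a logistic model, assuming the CRAN package glmnet (one package in that area; it is not named elsewhere in this thread). The data are simulated stand-ins for a large n-by-97 predictor matrix, and the names `x`, `y` and `V1`...`V97` are hypothetical.

```r
## Lasso-penalised logistic regression as an alternative to step().
## Assumes the CRAN package 'glmnet' is installed; data are simulated.
library(glmnet)

set.seed(1)
n <- 5000
p <- 97
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
colnames(x) <- paste0("V", seq_len(p))

## Only V1, V2 and V3 actually influence the simulated 0/1 response
eta <- 1.5 * x[, "V1"] - 1.0 * x[, "V2"] + 0.5 * x[, "V3"]
y <- rbinom(n, size = 1, prob = plogis(eta))

## alpha = 1 is the lasso, alpha = 0 is ridge regression, and values in
## between give the elastic net; cv.glmnet() chooses the penalty lambda
## by cross-validation (10 folds by default)
cvfit <- cv.glmnet(x, y, family = "binomial", alpha = 1)

## Coefficients at the largest lambda within one standard error of the
## CV minimum; predictors with coefficients exactly zero are dropped
b <- as.matrix(coef(cvfit, s = "lambda.1se"))[, 1]
selected <- setdiff(names(b)[b != 0], "(Intercept)")
print(selected)
```

Setting `alpha` to an intermediate value such as 0.5 gives an elastic-net fit, which tends to keep groups of correlated predictors together instead of picking one of them arbitrarily.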

Do you need all million rows? What do they gain you over a smaller,
randomly selected subset? Your model fitted to the subset can then be
checked against the cases omitted from fitting.
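A base-R sketch of that idea: fit on a random 10% of the rows and check the fit on the remaining 90%. The data frame `dat`, its columns and the 10% fraction are hypothetical, and the two checks shown (mean log-loss and classification accuracy at a 0.5 cut-off) are only examples of what could be compared.

```r
## Fit on a random subset, then check against the rows omitted from fitting.
## 'dat' is simulated; substitute your own data frame with a 0/1 response y.
set.seed(42)
n <- 100000
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- rbinom(n, size = 1,
                prob = plogis(-0.5 + 1.2 * dat$x1 - 0.8 * dat$x2))

## Randomly pick 10% of the rows for fitting; hold the rest back
train_idx <- sample(n, size = n / 10)
train   <- dat[train_idx, ]
holdout <- dat[-train_idx, ]

fit <- glm(y ~ x1 + x2 + x3, family = binomial, data = train)

## Predicted probabilities for the rows that were not used in fitting
p_hat <- predict(fit, newdata = holdout, type = "response")

## Two simple hold-out checks: mean log-loss and accuracy at a 0.5 cut-off
logloss  <- -mean(holdout$y * log(p_hat) + (1 - holdout$y) * log(1 - p_hat))
accuracy <- mean((p_hat > 0.5) == holdout$y)
print(c(logloss = logloss, accuracy = accuracy))
```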

> Also, the datasets I am working with contain very few non-zero
> entries. Can a sparse function specification be used with step?

I don't think this is possible at the moment in R, but several people,
including Doug Bates and Martin Maechler, are working on bringing sparse
model matrices and fitting code into R. Doug and Martin's efforts are in
the unreleased MatrixModels package on R-Forge:

https://r-forge.r-project.org/R/?group_id=61

but it is at a beta stage, under active development, and doesn't contain
any stepwise procedures either. The latter isn't a problem, as you
probably want to use one of the shrinkage methods mentioned above anyway.
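For the sparsity question, a sketch of building a sparse design matrix with `sparse.model.matrix()` from the Matrix package (a recommended package distributed with R) and passing it to a lasso fit. This assumes the CRAN package glmnet, whose fitting functions accept sparse `dgCMatrix` input; neither is part of the MatrixModels work described above, this does not make `step()` sparse-aware, and the data and names are simulated and hypothetical.

```r
## Sparse design matrix + lasso logistic fit.
## Assumes 'Matrix' (recommended package) and 'glmnet' (CRAN); data simulated.
library(Matrix)
library(glmnet)

set.seed(7)
n <- 20000
## Two factors with 26 levels each: their indicator columns are mostly zero
dat <- data.frame(g = factor(sample(letters, n, replace = TRUE)),
                  h = factor(sample(LETTERS, n, replace = TRUE)))
## Only membership of level "z" of g changes the simulated response
dat$y <- rbinom(n, size = 1, prob = plogis(-1 + 2 * (dat$g == "z")))

## Sparse model matrix (a dgCMatrix); drop the intercept column because
## glmnet fits its own intercept
X <- sparse.model.matrix(~ g + h, data = dat)[, -1]
print(c(fraction_nonzero = nnzero(X) / prod(dim(X))))

## The sparse matrix is passed straight to the penalised fit
cvfit <- cv.glmnet(X, dat$y, family = "binomial")
b <- as.matrix(coef(cvfit, s = "lambda.1se"))[, 1]
print(b[b != 0])
```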

HTH

G

> 
> Thank you.
> 
> Sambit
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%


