[R] How to make 'step' faster?

sambit rath kafkasbane at gmail.com
Fri Aug 20 11:34:31 CEST 2010


Thank you, Gavin!
I am aware of the lasso regularization routine, but in this case my
brief was to perform a stepwise AIC procedure. I guess subsetting the
data and validating the fitted model against the rest of the data is
the only answer.
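
A minimal sketch of that subset-and-validate approach, in case it helps
anyone reading the archive. Everything here is an assumption for
illustration: the data are simulated, the names (dat, x1..x8, train_idx)
are hypothetical, and the sizes are far smaller than the real problem.
It fits a forward stepwise logit by AIC on a random subset and then
checks the selected model on the rows left out of fitting.

```r
# Simulated stand-in for the real data; names and sizes are hypothetical.
set.seed(42)
n <- 5000
p <- 8
X <- matrix(rbinom(n * p, 1, 0.1), n, p)      # mostly-zero predictors
colnames(X) <- paste0("x", seq_len(p))
eta <- -1 + 1.5 * X[, 1] - 2 * X[, 2]         # only x1 and x2 matter
dat <- data.frame(y = rbinom(n, 1, plogis(eta)), X)

# Fit on a random subset and keep the remaining rows for validation.
train_idx <- sample(n, size = 1000)
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# Forward stepwise selection by AIC, starting from the intercept-only model.
null_fit <- glm(y ~ 1, family = binomial, data = train)
upper <- reformulate(colnames(X), response = "y")
sel <- step(null_fit, scope = list(lower = ~ 1, upper = upper),
            direction = "forward", trace = 0)

# Check the selected model against the cases omitted from fitting,
# using the Brier score (mean squared error of predicted probabilities).
pred <- predict(sel, newdata = test, type = "response")
brier_model <- mean((test$y - pred)^2)
brier_null  <- mean((test$y - mean(train$y))^2)

print(formula(sel))
print(c(model = brier_model, null = brier_null))
```

Starting from the null model with direction = "forward" tends to be
cheaper than backward elimination when few of the candidate variables
matter, since each step refits small models rather than the full one.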

sambit

On 20 August 2010 14:43, Gavin Simpson <gavin.simpson at ucl.ac.uk> wrote:
> On Fri, 2010-08-20 at 12:25 +0530, sambit rath wrote:
>> Dear all,
>>
>> I am fairly new to R. I would like to perform a stepwise logit
>> regression, selecting a model on the basis of AIC. I am using
>> some large datasets (up to a million rows and 97 variables). The
>> 'step' function takes far too long to complete even a single run.
>> I have tried subsetting the data and doing the same thing, but
>> 'step' is still time-consuming.
>> Is there a way out?
>
> Rethink your model selection procedure. Look at ridge regression and the
> lasso and elastic net procedures (See the Machine Learning task view on
> CRAN: http://CRAN.R-project.org/view=MachineLearning )
>
> Do you need all million rows? What do they gain you over using a
> smaller, randomly selected subset? Your model fitted to the subset can be
> confirmed against the cases omitted from fitting.
>
>> Also, the datasets I am working with contain very few non-zero
>> entries. Can a sparse function specification be used on step?
>
> I don't think this is possible at the moment in R, but several people,
> including Doug Bates and Martin Maechler, are working on bringing sparse
> model matrices and fitting code into R. Doug and Martin's efforts are in
> the unreleased MatrixModels package on R-Forge:
>
> https://r-forge.r-project.org/R/?group_id=61
>
> but it is in active development at a beta stage and doesn't contain any
> stepwise procedures either. The latter isn't a problem as you probably
> want to use a shrinkage method as mentioned above...
>
> HTH
>
> G
>
>>
>> Thank you.
>>
>> Sambit
>
> --
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
>  Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
>  ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
>  Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
>  Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
>  UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%