[R] How to make 'step' faster?

Martin Maechler maechler at stat.math.ethz.ch
Fri Aug 20 16:16:20 CEST 2010


>>>>> "GS" == Gavin Simpson <gavin.simpson at ucl.ac.uk>
>>>>>     on Fri, 20 Aug 2010 10:13:38 +0100 writes:

    GS> On Fri, 2010-08-20 at 12:25 +0530, sambit rath wrote:
    >> Dear all,
    >> 
    >> I am fairly new to R. I would like to perform a step-wise logit
    >> regression aiming to select a model on the basis of AIC. I am using
    >> some large datasets (up to a million rows and 97 variables). It is
    >> taking the 'step' function far too long to complete a single
    >> run. I have tried subsetting the data and running the same
    >> procedure, but 'step' is still very slow.
    >> Is there a way around this?

    GS> Rethink your model selection procedure. Look at ridge regression and the
    GS> lasso and elastic net procedures (See the Machine Learning task view on
    GS> CRAN: http://CRAN.R-project.org/view=MachineLearning )

    GS> Do you need all million rows? What do they gain you over using a
    GS> smaller, randomly selected subset? Your model fitted to the subset can be
    GS> confirmed against the cases omitted from fitting.
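
A minimal sketch of the subset-and-validate idea, using simulated data and
base R only. The data frame `dat`, its columns, the subset size and the 0.5
cutoff are all assumptions made for illustration; none of them come from the
original question.

```r
set.seed(1)

# Simulated stand-in for a large data set with a binary response
n <- 20000
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(-1 + 0.8 * dat$x1 - 0.5 * dat$x2))

# Fit the logit model on a random subset of the rows only
idx <- sample(n, 5000)
fit <- glm(y ~ x1 + x2 + x3, family = binomial, data = dat[idx, ])

# Predicted probabilities for the cases omitted from fitting
holdout <- dat[-idx, ]
p <- predict(fit, newdata = holdout, type = "response")

# One simple check: classification accuracy at a 0.5 cutoff
acc <- mean((p > 0.5) == (holdout$y == 1))
```

If the held-out performance is stable across a few different random subsets,
the extra rows are probably adding little to the fit.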

    >> Also, the datasets I am working with contain very few non-zero
    >> entries. Can step() work with a sparse representation of the data?

    GS> I don't think this is possible at the moment in R, but several people,
    GS> including Doug Bates and Martin Maechler, are working on bringing sparse
    GS> model matrices and fitting code into R. Doug and Martin's efforts are in
    GS> the unreleased MatrixModels package

Thank you, Gavin, but note that
MatrixModels is also available from CRAN, i.e., via "your favorite"
package installation tool.

    GS>  on R-Forge:

    GS> https://r-forge.r-project.org/R/?group_id=61

    GS> but it is in active development at a beta stage and doesn't contain any
    GS> stepwise procedures either. The latter isn't a problem as you probably
    GS> want to use a shrinkage method as mentioned above...

Yes.
Also note that Hastie et al.'s "glmnet" package (with its
lasso implementations) can work well with the sparse model matrices
that the model.Matrix() function from the "MatrixModels" package
returns.
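
A hedged sketch of what that combination might look like, on simulated data.
It uses sparse.model.matrix() from the "Matrix" package as a stand-in for
building the sparse design matrix; model.Matrix(..., sparse = TRUE) from
"MatrixModels" should play the same role. The data frame `dat`, the number
of predictors, and the choice of "lambda.1se" are assumptions for
illustration only.

```r
library(Matrix)
library(glmnet)

set.seed(1)

# Mostly-zero binary predictors, mimicking data with few non-zero entries
n <- 5000
dat <- as.data.frame(matrix(rbinom(n * 20, 1, 0.05), n, 20))
dat$y <- rbinom(n, 1, plogis(-2 + 2 * dat$V1 - 1.5 * dat$V2))

# Sparse design matrix; drop the intercept column, glmnet adds its own
X <- sparse.model.matrix(y ~ ., data = dat)[, -1]

# Lasso-penalised logistic regression (alpha = 1), lambda chosen by
# cross-validation instead of stepwise AIC search
cvfit <- cv.glmnet(X, dat$y, family = "binomial", alpha = 1)

# Coefficients at the selected lambda; exact zeros are dropped variables
b <- coef(cvfit, s = "lambda.1se")
```

The lasso penalty does the variable selection in a single fitting path, so
there is no need to refit the model once per candidate term as step() does.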

Martin Maechler, ETH Zurich.


