[R] variable/model selection (step/stepAIC) for biglm ?

Tal Galili tal.galili at gmail.com
Sat Feb 21 20:01:49 CET 2009


Hi Chuck,

Thanks for the guidelines.

I was hoping someone in the group had already handled this
type of task and had some handy code to share.
I'll wait another day or two to see if someone responds with more
ideas or experience, and if nothing comes up, I might try my hand
at your suggestion.

Cheers,
Tal






On Sat, Feb 21, 2009 at 8:09 PM, Charles C. Berry <cberry at tajo.ucsd.edu> wrote:
> On Sat, 21 Feb 2009, Tal Galili wrote:
>
>> Hello dear R mailing list members.
>>
>> I have recently become curious about the possibility of applying model
>> selection algorithms (even as simple as AIC) to regressions on large
>> datasets.
>
>
> Large in the sense of many observations, one assumes.
>
> But how large in terms of the number of variables??
>
> If there are not too many variables, then you can form the regression sums
> of squares for all 2^p combinations of regressors from a single biglm() fit
> of all variables, since biglm provides coef() and vcov() methods.
>
> If the number of variables is large, then you most likely will need to do
> subsampling to reduce the number to 'not too many' via lm() and friends,
> and then apply the above strategy.
>
>> I searched as best as I could, but couldn't find any
>> reference or wrapper for using step or stepAIC with packages such as
>> biglm.
>
>
> Surely any direct implementation of step() would be hopelessly long in
> execution time.
>
>
> HTH,
>
> Chuck
>
>
>>
>> Any ideas or directions of how to implement such a concept ?
>>
>>
>> Best,
>> Tal
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> --
>> ----------------------------------------------
>>
>>
>> My contact information:
>> Tal Galili
>> Phone number: 972-50-3373767
>> FaceBook: Tal Galili
>> My Blogs:
>> www.talgalili.com
>> www.biostatistics.co.il
>>
>
> Charles C. Berry                            (858) 534-2098
>                                             Dept of Family/Preventive Medicine
> E mailto:cberry at tajo.ucsd.edu            UC San Diego
> http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
>
>
>
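A minimal sketch of the all-subsets idea Chuck describes above, not code from the thread. It relies on the standard least-squares result that constraining a set D of coefficients to zero raises the residual sum of squares by b_D' [((X'X)^-1)_DD]^-1 b_D, so the RSS of every submodel follows from coef() and vcov() of the full fit alone. The helper name all_subsets_aic() is hypothetical. The demonstration uses lm() on simulated data so that it runs in base R, and it assumes the full-model RSS and the number of observations are available as well; for a biglm fit those two values would have to be taken from the fitted object.

```r
# Hypothetical helper (not part of biglm or MASS): AIC-type score for every
# subset of regressors, computed from full-model quantities only.
#   beta     - coef() of the full fit (with an "(Intercept)" entry)
#   V        - vcov() of the full fit
#   rss_full - residual sum of squares of the full fit
#   n        - number of observations
all_subsets_aic <- function(beta, V, rss_full, n) {
  vars   <- setdiff(names(beta), "(Intercept)")   # the intercept is always kept
  p      <- length(vars)
  sigma2 <- rss_full / (n - length(beta))         # residual variance, full fit

  # Each row of `keep` is one subset: TRUE means the regressor stays in.
  keep <- as.matrix(expand.grid(rep(list(c(FALSE, TRUE)), p)))
  colnames(keep) <- vars

  rss <- apply(keep, 1, function(k) {
    out <- vars[!k]                               # regressors dropped here
    if (length(out) == 0) return(rss_full)
    b <- beta[out]
    # vcov = sigma2 * (X'X)^-1, so rescale by sigma2 to get the RSS increase.
    rss_full + sigma2 * as.numeric(crossprod(b, solve(V[out, out, drop = FALSE], b)))
  })

  # Same criterion as extractAIC() for lm: n * log(RSS / n) + 2 * (no. of coefs)
  data.frame(keep, rss = rss,
             aic = n * log(rss / n) + 2 * (rowSums(keep) + 1),
             check.names = FALSE)
}

# Simulated example; x3 is pure noise.
set.seed(42)
n   <- 500
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- 1 + 2 * dat$x1 - dat$x2 + rnorm(n)

full <- lm(y ~ x1 + x2 + x3, data = dat)
res  <- all_subsets_aic(coef(full), vcov(full), deviance(full), nobs(full))
res[order(res$aic), ]                             # best-scoring subsets first
```

With p regressors this evaluates 2^p subsets, so it is only practical for 'not too many' variables, as Chuck notes. Each coefficient is treated as its own regressor, so the dummy columns of a factor would need to be kept or dropped together.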

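A rough sketch of the subsampling step Chuck mentions, under the assumption that a simple random sample of rows is adequate for screening variables. The data frame big, its column names, and the sample size of 2000 are made up for illustration. step() is run on an ordinary lm() fit of the subsample, and the surviving variables would then go into a biglm() fit on the full data.

```r
# `big` stands in for a data set too large for lm(); here it is simulated.
set.seed(1)
N   <- 20000
big <- as.data.frame(matrix(rnorm(N * 10), N, 10))   # columns V1..V10
big$y <- 3 * big$V1 - 2 * big$V2 + rnorm(N)          # only V1 and V2 matter

# Random subsample of rows that lm() can handle comfortably.
sub <- big[sample(nrow(big), 2000), ]

# Ordinary backward step() on the subsample to screen the variables.
sub_fit  <- lm(y ~ ., data = sub)
screened <- step(sub_fit, trace = 0)

# Names of the regressors that survived the screening.
kept <- attr(terms(screened), "term.labels")
kept
```

Because the screening is done on a sample, a different seed or sample size may keep a slightly different set of weak variables, so it is worth repeating with a few subsamples before settling on the reduced set.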





