[R] Stepwise regression scope: all interacting terms (.^2)

Steve Lianoglou mailinglist.honeypot at gmail.com
Fri Nov 16 23:32:32 CET 2012


Hi Mark,

To give some context for David's response below, you can search the list
archives for earlier threads where people ask about stepwise regression. You can
get started here:

http://search.gmane.org/search.php?group=gmane.comp.lang.r.general&query=stepwise+penalized

The long and short of it is that you are almost always encouraged to
use a regularized/penalized model instead of the stepwise
approach. Frank Harrell, in particular, is generally quite vocal
against stepwise regression -- I'm actually surprised he hasn't chimed
in by now, but maybe he's getting a bit tired of fighting the good
fight -- or it's close to the holiday and he's taking a break ;-)

Anyway ... HTH,

-steve

On Fri, Nov 16, 2012 at 4:13 PM, David Winsemius <dwinsemius at comcast.net> wrote:
>
> On Nov 16, 2012, at 12:16 PM, Mark Ebbert wrote:
>
>> I haven't heard anything on this question. Is there something fundamentally wrong with my question? Any feedback is appreciated.
>>
>
> Perhaps failure to read this sig at the bottom of every posted message to rhelp?
>
> "PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code."
>
>
>> Mark
>> On Nov 15, 2012, at 8:13 AM, Mark T. W. Ebbert wrote:
>>
>>> Dear Gurus,
>>>
>>> Thank you in advance for your assistance. I'm trying to understand scope better when performing stepwise regression using "step."
>
> From the help page of step:
> "If scope is a single formula, it specifies the upper component, and the lower model is empty. "
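
The two ways of passing scope that the help page describes can be sketched as follows. This is only an illustration on a small simulated data set; the names d, fit, sel_single and sel_list are made up and are not from Mark's analysis.

```r
# Simulated stand-in data (assumption): three predictors, binary response
set.seed(42)
d <- data.frame(x1 = rnorm(200), x2 = rnorm(200), x3 = rnorm(200))
d$y <- rbinom(200, 1, plogis(0.8 * d$x1 + 0.8 * d$x1 * d$x2))

# Additive starting model
fit <- glm(y ~ x1 + x2 + x3, family = binomial, data = d)

# scope as a single formula: it is taken as the upper model,
# and the lower model is empty
sel_single <- step(fit, scope = ~ .^2, trace = 0)

# scope as a list: lower and upper models are both spelled out
sel_list <- step(fit, scope = list(lower = ~ 1, upper = ~ .^2), trace = 0)

formula(sel_single)
formula(sel_list)
```

Starting from the additive model, step() may add interaction terms one at a time up to the upper formula; it does not begin from the full interaction model.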
>
>>> I have a model with a binary response variable and 10 predictor variables. When I perform stepwise regression I define scope=.^2 to allow interactions between all terms.
>
> I generally avoid answering questions about stepwise regression, because most of them do not include sufficient background material to justify that strategy. Yours certainly did not.
>
>
>>> But I am missing something. When I perform stepwise regression (both directions) on the main model (y~x1+x2+…+x10) the method returns quickly with an answer; however, when I define all interactions in the main model (y~x1+x2+…+x10+x1:x2+x1:x3+…) and then perform stepwise regression (backward only) it runs so long I have to kill it.
>>>
>>> So here's my question: what is the difference between scope=.^2 on the additive (proper term?) model and defining all interactions and doing backward regression? My understanding is that .^2 is supposed to allow all interactions!
>
> Well, I would have guessed that all two-way interactions (all 45 of them in your case) would be included and then successively reduced until you got to your specified (arbitrary and most likely incorrectly set) endpoint. I think the help page Details section is unclear on this point. I do not think that the 120 potential three-way interactions are part of the scope in that instance, but it should be easy enough for you to test that possibility.
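
The counts mentioned above can be checked directly. A minimal sketch, assuming a hypothetical data frame d with predictors x1 to x10 and a binary response y (simulated here, not Mark's data):

```r
# Simulated stand-in data (assumption): 10 predictors and a binary response
set.seed(1)
d <- as.data.frame(matrix(rnorm(200 * 10), ncol = 10))
names(d) <- paste0("x", 1:10)
d$y <- rbinom(200, 1, 0.5)

# Expand y ~ .^2 against the data and inspect the resulting terms
tt <- terms(y ~ .^2, data = d)
tl <- attr(tt, "term.labels")

n_main    <- sum(!grepl(":", tl, fixed = TRUE))  # main effects
n_two_way <- sum(grepl(":", tl, fixed = TRUE))   # interaction terms
max_order <- max(attr(tt, "order"))              # highest interaction order

n_main         # 10
n_two_way      # 45, same as choose(10, 2)
max_order      # 2, so no three-way terms are generated by .^2
choose(10, 3)  # 120 three-way interactions would need .^3
```

So a formula of the form .^2 expands to the 10 main effects plus the 45 two-way interactions, and nothing of higher order.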
>
> --
> David Winsemius, MD
> Alameda, CA, USA



-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
