[R] Fwd: regsubsets (Leaps)

Peter Ehlers ehlers at ucalgary.ca
Sat Jun 2 01:03:45 CEST 2012


Dear farmedgirl,

I agree with Bert's assessment: you're just plain out of luck
until you either get more (a lot more) data or reduce the number of
candidate predictors on some scientifically reasonable basis.

But here are a couple of additional comments:
1. With regsubsets()'s default of nvmax=8, you're looking for models
with up to 8 predictors. That's around 3e14 (or 300 trillion, in
North America) models that regsubsets() has to check! I would guess
that regsubsets() will return control to you much more quickly
if you set nvmax to, say, 2.

2. Suppose that you get a set of models. Now what? All you have
done to this point is exploratory data analysis (aka data snooping).
That's not necessarily a bad thing, but you can't claim any
significance (p-values, say). The only way to assess your models
will be with new data.
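
For what it's worth, the count in point 1 is easy to check in R. This is
only a rough sketch: it assumes p = 250 candidate predictors (the figure
from the original post) and simply adds up the number of subsets of size
1 through nvmax. The helper name n_models is made up for illustration.

```r
# Number of candidate models with 1..nvmax predictors chosen from p.
# p = 250 comes from the original post; nvmax = 8 is the regsubsets() default.
p <- 250
n_models <- function(p, nvmax) sum(choose(p, seq_len(nvmax)))

big   <- n_models(p, 8)   # a few times 1e14
small <- n_models(p, 2)   # 250 + 250*249/2 = 31375

# Not run here (needs the leaps package and the poster's data frame model2):
# library(leaps)
# m_small <- regsubsets(Y ~ ., data = model2, nbest = 1, nvmax = 2,
#                       really.big = TRUE)
```

Dropping nvmax from 8 to 2 cuts the search space by roughly ten orders
of magnitude, which is why I'd expect the second call to come back
quickly.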

It's really too bad, but even in statistics, there's just no free
lunch.
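
To make point 2 concrete, here is a small simulation sketch. Everything
in it is made up: the predictors and the response are independent random
noise, with n = 17 and p = 250 borrowed from the original post, and the
"selection" step is just picking the single predictor most correlated
with the response (a stand-in for a subset search, not what regsubsets()
does internally).

```r
set.seed(1)                       # arbitrary seed, for reproducibility only
n <- 17; p <- 250                 # sizes borrowed from the original post
X <- matrix(rnorm(n * p), n, p)   # predictors: pure noise
y <- rnorm(n)                     # response: unrelated to every predictor

# "Select" the single predictor most correlated with y
r    <- abs(as.vector(cor(X, y)))
best <- which.max(r)

# Naive p-value that ignores the selection step
fit     <- lm(y ~ X[, best])
p_naive <- summary(fit)$coefficients[2, 4]

# Fresh data from the same null mechanism: the honest check
X_new <- matrix(rnorm(n * p), n, p)
y_new <- rnorm(n)
r_new <- cor(X_new[, best], y_new)

p_naive   # looks "significant" although nothing is related to y
r_new     # correlation of the chosen predictor with y on new data
```

The naive p-value comes out small simply because it is the best of 250
tries; only the new data give an honest picture of the chosen predictor.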

Peter Ehlers

On 2012-06-01 11:22, Bert Gunter wrote:
> Frank -- where are you?!
>
> (To the OP: Your post leaves me simply breathless. You are embarked on
> a fool's errand. Filoche's "help" will continue you down that path.
> IMHO only of course.
>
> Bottom line: You CANNOT do what you wish to do. Or to quote John Tukey:
>
> "The combination of some data and an aching desire for an answer does
> not ensure that a reasonable answer can be extracted from a given body
> of data. " )
>
> -- Bert
>
>
> ---------- Forwarded message ----------
> From: farmedgirl<ksteinmann at cdpr.ca.gov>
> Date: Fri, Jun 1, 2012 at 8:19 AM
> Subject: [R] regsubsets (Leaps)
> To: r-help at r-project.org
>
>
> Hi
> i need to create a model from 250 + variables with high collinearity, and
> only 17 data points (p = 250, n = 17). I would prefer to use Cp, AIC,
> and/or BIC to narrow down the number of variables, and then use VIF to
> choose a model without collinearity (if possible).  I realize that having a
> huge p and small n is going to give me extreme linear dependency problems,
> but I *think* these model selection criteria should still be useful?
>
> I have currently been running regsubsets for over a week with no results. I
> have no idea if R is still working, or if the computer is hung. I ran
> regsubsets on a smaller portion of the data, also with linear dependency
> problems, and got results. However, the hourglass continues its endless
> spiraling with the full dataset.
>
> I am running the following on Windows 7
> library(leaps)
> m_250<-regsubsets(Y~., data=model2, nbest=1, really.big=TRUE)
>
> (NOTE: The ~ is a tilde, not a dash, in the regression statement above: Y~.)
>
> Does anyone have any opinions on:
> 1) is R likely to still be running, even after a week, or should i just shut
> it down?
>
> 2) am i doing something wrong with regsubsets?
>
> 3) is there a better option than regsubsets, that will still allow me to
> narrow down parameters so i have explanatory power (ie i could develop a
> model using PLS, and keep all the variables, but also keep all the
> collinearity issues, and have good prediction but not explanatory power)
>
> 4) any other ideas?
>
> I am pretty new to R, so any newbie detail would be much appreciated!
>
> thanks in advance for any help!
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/regsubsets-Leaps-tp4632083.html
> Sent from the R help mailing list archive at Nabble.com.


