[R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?

Fri Jan 27 14:50:14 CET 2012

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Frank Harrell
Sent: Friday, January 27, 2012 14:28
To: r-help at r-project.org
Subject: Re: [R] How do I compare 47 GLM models with 1 to 5 interactions and unique combinations?

Ruben, I'm not sure you are understanding the ramifications of what Bert said.  In addition you are making several assumptions implicitly:

--
Ruben: Frank, I guess we are going nowhere now.
But thanks anyway. See below if you want.

1. model selection is needed (vs. fitting the full model and using shrinkage)
Ruben: Nonlinear mechanistic models that are too complex often just don't converge; they crash. There is no shrinkage to apply to a model that failed to converge.

2. model selection works in the absence of shrinkage 
Ruben: I think you are assuming that shrinkage is necessary.

3. model selection can find the "right" model and the features selected would be the same features selected if the data were slightly perturbed or a new sample taken 
Ruben: I don't make this assumption. New data, new model.

4. AIC tells you something that P-values don't (unless using structured multiple degree of freedom tests)
Ruben: It does.

5. parsimony is good
Ruben: It is.

None of these assumptions is true.  Model selection without shrinkage
(penalization) seems to offer benefits but this is largely a mirage.

Ruben: Have a good weekend!

Ruben

Rubén Roa wrote
> 
> -----Original Message-----
> From: Bert Gunter [mailto:gunter.berton@]
> Sent: Thursday, January 26, 2012 21:20
> To: Rubén Roa
> CC: Ben Bolker; Frank Harrell
> Subject: Re: [R] How do I compare 47 GLM models with 1 to 5 
> interactions and unique combinations?
> 
> On Wed, Jan 25, 2012 at 11:39 PM, Rubén Roa <rroa@> wrote:
>> I think we have gone through this before.
>> First, the destruction of all aspects of statistical inference is not 
>> at stake, Frank Harrell's post notwithstanding.
>> Second, checking all pairs is a way to see for _all pairs_ which 
>> model outcompetes which in terms of predictive ability by 2 AIC units 
>> or more. Just sorting them by AIC does not give you that when no model 
>> beats the next best by at least 2 AIC units.
>> Third, I was not implying that AIC differences play the role of 
>> significance tests. I agree with you that model selection is better 
>> not understood as a proxy for, or a relative of, significance-testing procedures.
>> Incidentally, when comparing many models the AIC is often inconclusive.
>> If I am bent on selecting just _the model_, then I check numerical 
>> optimization diagnostics such as size of gradients, KKT criteria, and 
>> other issues such as standard errors of parameter estimates and the 
>> correlation matrix of parameter estimates.
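
The all-pairs check described above can be sketched as follows. This is only an illustration under stated assumptions: it is written in Python to stay self-contained (in R, `AIC()` on the fitted `glm` objects would supply the values), the function names `aic` and `pairwise_aic` are hypothetical, the log-likelihoods and parameter counts are made up, and the threshold of 2 AIC units is the rule of thumb mentioned in the thread, not a formal test.

```python
from itertools import combinations


def aic(log_lik, n_params):
    """Akaike information criterion: AIC = 2k - 2 log L (lower is better)."""
    return 2 * n_params - 2 * log_lik


def pairwise_aic(aics, threshold=2.0):
    """Compare every pair of models by their AIC difference.

    Returns (clear, tied). `clear` holds (better, worse, delta) for pairs
    whose AIC values differ by at least `threshold`; `tied` holds
    (a, b, delta) for pairs that differ by less.
    """
    clear, tied = [], []
    for a, b in combinations(sorted(aics), 2):
        delta = abs(aics[a] - aics[b])
        if delta >= threshold:
            better, worse = (a, b) if aics[a] < aics[b] else (b, a)
            clear.append((better, worse, delta))
        else:
            tied.append((a, b, delta))
    return clear, tied


# Hypothetical (log-likelihood, number of parameters) per model,
# invented for illustration only.
fits = {"m1": (-120.0, 3), "m2": (-119.5, 4), "m3": (-110.0, 5)}
aics = {name: aic(ll, k) for name, (ll, k) in fits.items()}
clear, tied = pairwise_aic(aics)

for better, worse, delta in clear:
    print(f"{better} beats {worse} by {delta:.1f} AIC units")
for a, b, delta in tied:
    print(f"{a} and {b} are AIC-tied (difference {delta:.1f})")
```

With 47 models this yields 47 * 46 / 2 = 1081 pairs, so the `tied` list is usually the part worth reading: it names the models that AIC alone cannot separate.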
> 
> -- And the mathematical basis for this claim is ....  ??
> 
> --
> Ruben: In my area of work (building/testing/applying mechanistic 
> nonlinear models of natural and economic phenomena) the issue of 
> numerical optimization is a very serious one. It is often the case 
> that a really good-looking model does not converge properly (that's 
> why ADMB is so popular among us). So if the AIC is inconclusive, but 
> one AIC-tied model yields reasonable-looking standard errors and low 
> pairwise correlations between parameter estimates, while the other 
> wasn't even able to produce a positive definite Hessian matrix (though 
> it was able to maximize the log-likelihood), I think I have good 
> reasons to select the properly converged model. I assume here that the 
> lack of convergence is a symptom of the model's inadequacy for the 
> data, one that the AIC was not able to detect.
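
The convergence diagnostics mentioned above (a positive definite Hessian, standard errors, and pairwise correlations of parameter estimates) could be checked along these lines. This is a rough sketch, not what ADMB or any R optimizer does internally: it assumes the matrix passed in is the Hessian of the negative log-likelihood at the optimum, the function names are hypothetical, and the two example matrices are invented.

```python
import math


def cholesky(a):
    """Return lower-triangular L with L L^T = a, or None if a is not
    positive definite (a is assumed symmetric)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return None  # a non-positive pivot means not positive definite
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low


def invert_spd(a):
    """Invert a symmetric positive definite matrix via its Cholesky factor."""
    low = cholesky(a)
    if low is None:
        return None
    n = len(a)
    # Invert L column by column using forward substitution.
    inv_low = [[0.0] * n for _ in range(n)]
    for j in range(n):
        inv_low[j][j] = 1.0 / low[j][j]
        for i in range(j + 1, n):
            s = sum(low[i][k] * inv_low[k][j] for k in range(j, i))
            inv_low[i][j] = -s / low[i][i]
    # a^{-1} = L^{-T} L^{-1}
    return [[sum(inv_low[k][i] * inv_low[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]


def diagnostics(hessian):
    """Return (standard errors, correlation matrix) of the estimates,
    or None when the Hessian is not positive definite."""
    cov = invert_spd(hessian)
    if cov is None:
        return None
    n = len(cov)
    se = [math.sqrt(cov[i][i]) for i in range(n)]
    corr = [[cov[i][j] / (se[i] * se[j]) for j in range(n)] for i in range(n)]
    return se, corr


# Two invented 2x2 Hessians: the first is positive definite, the second is not.
good = diagnostics([[4.0, 2.0], [2.0, 3.0]])
bad = diagnostics([[1.0, 2.0], [2.0, 1.0]])
```

Here `bad` is `None`, which corresponds to the model that "wasn't even able to produce a positive definite Hessian", while `good` carries the standard errors and correlations one would inspect before preferring one AIC-tied model over another.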
> ---
> Ruben: For some reason I don't find model averaging appealing. I 
> guess deep in my heart I expect more from my model than just the best 
> predictive ability.
> 
> -- This is a religious, not a scientific statement, and has no place 
> in any scientific discussion.
> 
> --
> Ruben: Seriously, there is a wide and open place in scientific 
> discussion for mechanistic model-building. I expect the models I build 
> to be more than able predictors; I want them to capture some aspect of 
> nature, to teach me something about nature, so I refrain from model 
> averaging, which is an open admission that you don't care too much 
> about what's really going on.
> 
> -- The belief that certain data analysis practices -- standard or not 
> -- somehow can be trusted to yield reliable scientific results in the 
> face of clear theoretical (mathematical) and practical results to the 
> contrary, while widespread, impedes and often thwarts the progress of 
> science. Evidence-based medicine and clinical trials came about for a 
> reason. I would encourage you to reexamine the basis of your 
> scientific practice and the role that "magical thinking" plays in it.
> 
> Best,
> Bert
> 
> --
> Ruben: All right, Bert. I often re-examine the basis of my scientific 
> praxis, but less often than I did before, I have to confess. I like to 
> think it is because I am converging on the right praxis, so there are 
> fewer issues to re-examine. But this problem of model selection is a tough one.
> Being a likelihoodist in inference naturally leads you to AIC-based 
> model selection, Jim Lindsey being influential too. Wanting your 
> models to say something about nature is another strong conditioning 
> factor. Put these two together and it becomes hard to do 
> model-averaging. And it has nothing to do with religion (yuck!).
> 
> ______________________________________________
> R-help@ mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 

-----
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: http://r.789695.n4.nabble.com/How-do-I-compare-47-GLM-models-with-1-to-5-interactions-and-unique-combinations-tp4326407p4333464.html
Sent from the R help mailing list archive at Nabble.com.