[R] can I do this with R?

Andrew Robinson A.Robinson at ms.unimelb.edu.au
Thu May 29 01:08:55 CEST 2008

On Wed, May 28, 2008 at 03:47:49PM -0700, Xiaohui Chen wrote:
> Frank E Harrell Jr wrote:
> >Xiaohui Chen wrote:
> >>The step or stepAIC functions do the job. You can opt to use BIC by 
> >>changing the multiplier of the penalty.
> >>
> >>I think AIC and BIC are not limited to comparing two pre-defined 
> >>models; they can also be used as model search criteria. You could 
> >>enumerate the information criteria for all possible models if the 
> >>full model is relatively small, but this does not generally scale 
> >>to practical high-dimensional applications. Hence, it is often 
> >>only possible to find a 'best' model that is a local optimum, e.g. 
> >>as measured by AIC/BIC.
> >
> >Sure you can use them that way, and they may perform better than other 
> >measures, but the resulting model will be highly biased (regression 
> >coefficients biased away from zero).  AIC and BIC were not designed to 
> >be used in this fashion originally.  Optimizing AIC or BIC will not 
> >produce well-calibrated models as does penalizing a large model.
> >
> Sure, I agree with this point. AIC is used to correct the bias of the 
> estimates that minimize the KL distance to the true model, provided the 
> assumed model family contains the true model. BIC is designed to 
> approximate the marginal likelihood of the model. Both are 
> post-selection estimation methods. For simultaneous variable selection 
> and estimation, there are better penalties such as the L1 penalty, which 
> is much better than AIC/BIC in terms of consistency.
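
For anyone following the thread, the step()/BIC suggestion quoted above
can be sketched roughly as follows. This is only an illustration on
simulated data: the data frame d, the variable names, and the true
coefficients are made up and are not taken from the original question.
The one detail that matters is the k argument of step(), which is the
penalty multiplier (2 by default, giving AIC; log(n) gives BIC).

```r
# Simulated data (hypothetical): only x1 and x2 actually affect y
set.seed(1)
n <- 100
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
d$y <- 2 * d$x1 - 1 * d$x2 + rnorm(n)

# Full model containing every candidate predictor
full <- lm(y ~ x1 + x2 + x3 + x4, data = d)

# Backward search; the default penalty multiplier k = 2 corresponds to AIC
fit_aic <- step(full, trace = 0)

# Same search with k = log(n), which corresponds to BIC
fit_bic <- step(full, k = log(n), trace = 0)

# Compare the formulas the two searches settle on
print(formula(fit_aic))
print(formula(fit_bic))
```

As discussed in the quoted messages, coefficients from a model chosen
this way are refit after selection and are not shrunk, so the sketch
shows only the mechanics of the search.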


Tibshirani (1996) suggests that the quality of the L1 penalty depends
on the structure of the dataset.  As I recall, subset selection was
preferred for finding a small number of large effects, lasso (L1) for
finding a small to moderate number of moderate-sized effects, and
ridge (L2) for many small effects.
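
A rough way to see how the three approaches differ is the
orthonormal-design case, where each reduces to a simple rule applied to
the least-squares coefficients: hard thresholding for subset selection,
soft thresholding for the lasso, and proportional shrinkage for ridge.
A minimal sketch, with made-up coefficient values and a hypothetical
tuning value gamma (which does not mean the same thing for the three
methods, so the rows are not directly comparable in scale):

```r
# Least-squares estimates under an orthonormal design (made-up values)
b_ols <- c(3.0, -1.5, 0.4, -0.2)

# Hypothetical tuning value, reused for all three rules for illustration only
gamma <- 0.5

# Subset selection: keep a coefficient unchanged or drop it (hard threshold)
subset_est <- ifelse(abs(b_ols) > gamma, b_ols, 0)

# Lasso (L1): move every coefficient toward zero by gamma, truncating at zero
lasso_est <- sign(b_ols) * pmax(abs(b_ols) - gamma, 0)

# Ridge (L2): scale every coefficient by the same factor, never exactly zero
ridge_est <- b_ols / (1 + gamma)

print(rbind(ols = b_ols, subset = subset_est,
            lasso = lasso_est, ridge = ridge_est))
```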

Can you provide any references to more up-to-date simulations that you
would recommend?


Andrew Robinson  
Department of Mathematics and Statistics            Tel: +61-3-8344-6410
University of Melbourne, VIC 3010 Australia         Fax: +61-3-8344-4599
