[R] can I do this with R?

Xiaohui Chen chenxh007 at gmail.com
Thu May 29 01:30:03 CEST 2008


Andrew Robinson wrote:
> On Wed, May 28, 2008 at 03:47:49PM -0700, Xiaohui Chen wrote:
>   
>> Frank E Harrell Jr wrote:
>>     
>>> Xiaohui Chen wrote:
>>>       
>>>> The step or stepAIC functions do the job. You can opt to use BIC by 
>>>> changing the penalty multiplier.
>>>>
>>>> I think AIC and BIC are not limited to comparing two pre-defined 
>>>> models; they can also be used as model search criteria. You could 
>>>> enumerate the information criteria for all possible models if the 
>>>> full model is relatively small, but this generally does not scale to 
>>>> practical high-dimensional applications. Hence, it is often only 
>>>> possible to find a 'best' model at a local optimum, as measured by 
>>>> AIC/BIC.
>>>>         
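
A concrete sketch of the step()/stepAIC() approach quoted above (mtcars 
and the formula are only placeholders): step() penalizes with k * df, so 
the default k = 2 gives AIC and k = log(n) gives BIC.

fit.full <- lm(mpg ~ ., data = mtcars)
n <- nrow(mtcars)
fit.aic <- step(fit.full, direction = "both")              # default k = 2 (AIC)
fit.bic <- step(fit.full, direction = "both", k = log(n))  # BIC penalty
## MASS::stepAIC() accepts the same 'k' argument:
## library(MASS); stepAIC(fit.full, k = log(n))
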
>>> Sure you can use them that way, and they may perform better than other 
>>> measures, but the resulting model will be highly biased (regression 
>>> coefficients biased away from zero).  AIC and BIC were not designed to 
>>> be used in this fashion originally.  Optimizing AIC or BIC will not 
>>> produce well-calibrated models as does penalizing a large model.
>>>
>>>       
>> Sure, I agree with this point. AIC is meant to correct the bias of 
>> estimates that minimize the KL distance to the true model, provided the 
>> assumed model family contains the true model. BIC is designed to 
>> approximate the marginal likelihood of the model. Those are both 
>> post-selection estimation methods. For simultaneous variable selection 
>> and estimation, there are better penalties, such as the L1 penalty, 
>> which is much better than AIC/BIC in terms of consistency.
>>     
>
> Xiaohui, 
>
> Tibshirani (1996) suggests that the quality of the L1 penalty depends
> on the structure of the dataset.  As I recall, subset selection was
> preferred for finding a small number of large effects, lasso (L1) for
> finding a small to moderate number of moderate-sized effects, and
> ridge (L2) for many small effects.
>   
I agree with you. Higher correlation between covariates makes it harder 
for the LASSO to choose the correct model asymptotically; see Zhao and 
Yu (2006). Subset selection based on prediction error tends to inflate 
the estimated variance of the coefficients in linear models. As is well 
known, the L2 penalty does not perform variable selection on its own. 
A convex combination of the L1 and L2 penalties gives the elastic net 
proposed by Zou and Hastie (2005), which encourages a grouping effect. 
More recently, many other priors/penalties have been proposed in the 
literature.

Zhao, P. and Yu, B. (2006) On model selection consistency of Lasso. JMLR.
Zou, H. and Hastie, T. (2005) Regularization and variable selection via 
the elastic net. JRSSB.
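
To make the elastic net point concrete, here is a minimal sketch with the 
elasticnet package (the package choice and the simulated data are my own 
illustration, not something discussed above). In enet(), lambda is the 
quadratic (L2) penalty, so lambda = 0 reduces to the lasso:

library(elasticnet)
set.seed(1)
n <- 100; p <- 20
x <- matrix(rnorm(n * p), n, p)
y <- x[, 1] + 0.5 * x[, 2] + rnorm(n)
fit.lasso <- enet(x, y, lambda = 0)    # lambda = 0: pure lasso
fit.enet  <- enet(x, y, lambda = 0.5)  # lambda > 0: elastic net (grouping effect)
plot(fit.enet)                         # coefficient paths along the L1 constraint
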
> Can you provide any references to more up-to-date simulations that you
> would recommend?
>
> Cheers,
>
> Andrew
>


