[R] Variable selection based on both training and testing data

Liaw, Andy andy_liaw at merck.com
Mon Jan 30 14:39:05 CET 2012


Variable selection is part of the training process: it chooses the model.  By definition, test data is used only for testing (evaluating the chosen model).

If you find a package or function that does variable selection on test data, run from it!
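
A minimal sketch of that separation, in case it helps. The simulated data, the variable names (x1 to x6, y), and the 150/50 split are all made up for illustration; stepAIC() is from the MASS package. Selection sees only the training rows, and the test rows are touched once, at the end, to evaluate the model that was chosen.

```r
library(MASS)  # provides stepAIC()

set.seed(1)

# Simulated data (an assumption for illustration): only x1 and x2 affect y.
n <- 200
dat <- data.frame(matrix(rnorm(n * 6), nrow = n, ncol = 6))
names(dat) <- paste0("x", 1:6)
dat$y <- 2 * dat$x1 - 1.5 * dat$x2 + rnorm(n)

# Split once, before any modelling.
train_idx <- sample(n, 150)
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# Variable selection uses the training data only.
full     <- lm(y ~ ., data = train)
selected <- stepAIC(full, direction = "both", trace = FALSE)

# The test data is used only to evaluate the chosen model.
pred      <- predict(selected, newdata = test)
test_rmse <- sqrt(mean((test$y - pred)^2))

print(formula(selected))
print(test_rmse)
```

If you go back and change the selected model after looking at test_rmse, the test set has become part of training and that RMSE is no longer an honest estimate. To compare candidate models, use cross-validation within the training data instead.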

Best,
Andy 

> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of Jin Minming
> Sent: Monday, January 30, 2012 8:14 AM
> To: r-help at r-project.org
> Subject: [R] Variable selection based on both training and 
> testing data
> 
> Dear all,
> 
> The variable selection in regression is usually determined by 
> the training data using AIC or F value, such as stepAIC. Is 
> there some R package that can consider both the training and 
> test dataset? For example, I have two separate training data 
> and test data. Firstly, a regression model is obtained by 
> using training data, and then this model is tested by using 
> test data. This process continues in order to find some 
> possible optimal models in terms of RMSE or R2 for both 
> training and test data. 
> 
> Thanks,
> 
> Jim