[R] Variable selection based on both training and testing data

Jin Minming jminming at yahoo.com
Mon Jan 30 19:20:37 CET 2012


I do not have enough test data for regression analysis, although I know there are some statistical regression methods that can be used for small datasets. That is why I need to build a model first using the training dataset.

Thanks,

Jim
 

--- On Mon, 30/1/12, Liaw, Andy <andy_liaw at merck.com> wrote:

> From: Liaw, Andy <andy_liaw at merck.com>
> Subject: RE: [R] Variable selection based on both training and testing data
> To: "'Jin Minming'" <jminming at yahoo.com>, "r-help at r-project.org" <r-help at r-project.org>
> Date: Monday, 30 January, 2012, 13:39
> Variable selection is part of the training process: it chooses the
> model.  By definition, test data is used only for testing (evaluating
> the chosen model).
> 
> If you find a package or function that does variable
> selection on test data, run from it!
> 
> Best,
> Andy 
> 
> > -----Original Message-----
> > From: r-help-bounces at r-project.org
> > [mailto:r-help-bounces at r-project.org] On Behalf Of Jin Minming
> > Sent: Monday, January 30, 2012 8:14 AM
> > To: r-help at r-project.org
> > Subject: [R] Variable selection based on both training and testing data
> > 
> > Dear all,
> > 
> > The variable selection in regression is usually determined by
> > the training data using AIC or F value, such as stepAIC. Is
> > there some R package that can consider both the training and
> > test dataset? For example, I have two separate training data
> > and test data. Firstly, a regression model is obtained by
> > using training data, and then this model is tested by using
> > test data. This process continues in order to find some
> > possible optimal models in terms of RMSE or R2 for both
> > training and test data.
> > 
> > Thanks,
> > 
> > Jim
> > 
>
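
The workflow discussed in this thread can be sketched roughly as follows. This is only an illustration under stated assumptions, not a recommendation of a particular package: the data are simulated, the variable names (x1 to x6, y) and the 5-fold split are hypothetical, and step() from base R's stats package is used in place of stepAIC from MASS. Variable selection is run on training rows only; when data are scarce, cross-validation inside the training set gives an error estimate without touching the test rows, which are used once at the end.

```r
set.seed(42)

# Simulated data (hypothetical): y depends on x1 and x2 only; x3..x6 are noise.
n <- 120
dat <- as.data.frame(matrix(rnorm(n * 6), ncol = 6))
names(dat) <- paste0("x", 1:6)
dat$y <- 1 + 2 * dat$x1 - 1.5 * dat$x2 + rnorm(n)

# Split once; the test rows are set aside and play no part in selection.
test_idx <- sample(n, 30)
train <- dat[-test_idx, ]
test  <- dat[test_idx, ]

rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

# K-fold cross-validation inside the training data. Selection is repeated
# in every fold, so the held-out fold never influences which variables
# are chosen.
k <- 5
fold <- sample(rep(seq_len(k), length.out = nrow(train)))
cv_rmse <- numeric(k)
for (i in seq_len(k)) {
  fit_data  <- train[fold != i, ]
  held_out  <- train[fold == i, ]
  fold_fit  <- step(lm(y ~ ., data = fit_data), trace = 0)  # AIC-based
  cv_rmse[i] <- rmse(held_out$y, predict(fold_fit, newdata = held_out))
}

# Final model: selection on all training rows, then a single evaluation
# on the untouched test rows.
final <- step(lm(y ~ ., data = train), trace = 0)
test_rmse <- rmse(test$y, predict(final, newdata = test))

print(formula(final))
cat("CV RMSE (within training data):", round(mean(cv_rmse), 3), "\n")
cat("Test RMSE:", round(test_rmse, 3), "\n")
```

If the test RMSE were fed back into the choice of variables and the loop repeated, the test set would effectively become training data and its error estimate would no longer be an honest one, which is the point of the reply quoted above.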


