[R] How to estimate whether overfitting?

Bert Gunter gunter.berton at gene.com
Mon May 10 18:05:16 CEST 2010


(Near) non-identifiability (especially in nonlinear models, which include
linear mixed effects models, Bayesian hierarchical models, etc.) is
typically a strong clue that a model is over-fit. It is usually signaled by
software complaints (e.g. convergence failures or running up against
iteration limits).

However, such complaints are a roughly sufficient indication, not a
necessary one: "over-fitting" frequently occurs even without them. It
should also be said that, apart from the identifiability case,
"over-fitting" is not a well-defined statistical term: what counts as
over-fitting depends on the scientific context.
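
To make the "no overt complaints" case concrete, here is a minimal sketch in
plain Python (not R, only so that it is self-contained). Everything in it is
an assumption for illustration: the data are simulated, the 1-nearest-neighbour
"model" is a stand-in and not the svm discussed in the thread, and r_squared(),
predict_1nn() and simulate() are hypothetical helper names. The 156/52 split
merely mirrors the sample sizes quoted below. The fit runs without any warning
and reproduces the training data perfectly, yet does worse on held-out data.

```python
import random


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot


def predict_1nn(train_x, train_y, x):
    """Predict by copying the response of the nearest training point."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


rng = random.Random(0)  # fixed seed so the run is repeatable


def simulate(n):
    """Hypothetical data: a linear signal plus Gaussian noise."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0, 3) for x in xs]
    return xs, ys


# Sample sizes mirror the thread (156 training, 52 test); the data do not.
train_x, train_y = simulate(156)
test_x, test_y = simulate(52)

# A 1-NN fit memorises the training set, so its training R2 is perfect.
r2_train = r_squared(train_y, [predict_1nn(train_x, train_y, x) for x in train_x])
# The same fit is then scored on cases it has never seen.
r2_test = r_squared(test_y, [predict_1nn(train_x, train_y, x) for x in test_x])

print("training R2:", round(r2_train, 3))
print("test R2:    ", round(r2_test, 3))
print("gap:        ", round(r2_train - r2_test, 3))
```

The gap between the two numbers is only a symptom. Whether a gap of a given
size matters is still a question for the scientific context, not something the
code can decide.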


Bert Gunter
Genentech Nonclinical Biostatistics
 
 -----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On
Behalf Of Steve Lianoglou
Sent: Sunday, May 09, 2010 6:13 PM
To: David Winsemius
Cc: r-help at r-project.org; bbslover
Subject: Re: [R] How to estimate whether overfitting?

On Sun, May 9, 2010 at 11:53 AM, David Winsemius <dwinsemius at comcast.net>
wrote:
>
> On May 9, 2010, at 9:20 AM, bbslover wrote:
>
>>
>> 1. Is there some criterion for judging overfitting, e.g. R2 and Q2 in the
>> training set together with R2 in the test set? At what values do they
>> mean overfitting? For example, in my data I have R2=0.94 for the training
>> set and R2=0.70 for the test set. Is that overfitting?
>> 2. In this scatter plot, can one say there is overfitting?
>>
>> 3. My result was obtained with svm; there are 156 and 52 samples in the
>> training and test sets, and 96 predictors. In this case, can svm be
>> used for prediction? Are there too many predictors?
>>
>
> I think you need to buy a copy of Hastie, Tibshirani, and Friedman and do
> some self-study of chapters 7 and 12.

And you don't even have to buy it before you can start studying since
the PDF is available here:
http://www-stat.stanford.edu/~tibs/ElemStatLearn/

Having a hard cover is always handy, tho ..
-steve

-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact



