[R] Possible overfitting of a GAM

Simon Wood s.wood at bath.ac.uk
Mon Feb 18 09:55:39 CET 2008


The figures don't obviously scream out `overfitting' to me, and the standard 
errors don't look excessively wide, given the data. Unless there is a strong 
reason for using `lo', you could also try the `gam' function in package 
`mgcv': it attempts to estimate the appropriate degree of smoothing 
automatically. If you get similar curves using mgcv::gam, then you have some 
reassurance that you are not overfitting here. 
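
As a rough illustration of that check, here is a minimal sketch using
simulated data, since the real arrival counts were not posted. The variable
names, the simulated arrival rate, the Poisson family, and the choice of
method = "REML" are all assumptions made for the example, not details of the
original analysis.

```r
# Sketch only: simulated counts stand in for the real arrival data.
library(mgcv)

set.seed(1)
n_bins <- 156                          # 13 hours of five-minute bins
time   <- seq_len(n_bins) * 5 / 60     # hours since the polls opened

# Hypothetical smooth arrival rate (assumption, not the observed pattern)
rate  <- exp(1.5 + 0.6 * sin(2 * pi * time / 13) + 0.4 * cos(4 * pi * time / 13))
count <- rpois(n_bins, rate)
dat   <- data.frame(time = time, count = count)

# mgcv::gam estimates the smoothing parameter itself; no span is supplied.
# A Poisson family is assumed here because the response is a count.
fit <- gam(count ~ s(time), family = poisson, data = dat, method = "REML")

# Effective degrees of freedom of s(time): a value close to the basis
# limit (k - 1 = 9 by default) would suggest a very wiggly fit.
edf_smooth <- summary(fit)$edf

# Fitted curve with approximate +/- 2 standard error bands (response scale)
pred  <- predict(fit, type = "response", se.fit = TRUE)
fhat  <- as.numeric(pred$fit)
upper <- fhat + 2 * as.numeric(pred$se.fit)
lower <- fhat - 2 * as.numeric(pred$se.fit)

plot(dat$time, dat$count, xlab = "Hours since opening",
     ylab = "Arrivals per five-minute bin")
lines(dat$time, fhat, lwd = 2)
lines(dat$time, upper, lty = 2)
lines(dat$time, lower, lty = 2)
```

If the curve and bands from a fit like this resemble those from the `lo'
fit, that is some evidence the default span is not overfitting. gam.check(fit)
gives further diagnostics, including a check on whether the basis dimension
is adequate.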

On Saturday 16 February 2008 22:25, Thomas L Jones, PhD wrote:
> The subject is a Generalized Additive Model. Experts caution us against
> overfitting the data, which can cause inaccurate results. I am not a
> statistician (my background is in Computer Science). Perhaps some kind soul
> would take a look and vet the model for overfitting the data.
>
> The study estimated the ebb and flow of traffic through a voting place.
> Just one voting place was studied; the election was the U.S. mid-term
> election about a year ago. Procedure: The voting day was divided into
> five-minute bins, and the number of voters arriving in each bin was
> recorded. The voting day was 13 hours long, giving 156 bins.
>
> See http://tinyurl.com/36vzop for the scatterplot. There is a rather high
> random variation, due in part to the fact that the bin width was
> intentionally set to be narrow, in order to improve the amount of timing
> information gathered.
>
> http://tinyurl.com/3xjsyo displays the fitted curve. A GAM was used, with
> the loess smoothing algorithm (locally weighted regression). The default
> span was used. http://tinyurl.com/34av6l gives the scatterplot and the
> fitted curve. The two seem to match reasonably well.
>
> However, when I tried to generate the standard errors, things went awry.
> (Please see http://tinyurl.com/38ej2t ) There are three curves, seemingly
> the fitted curve and the curves for plus and minus two standard errors. The
> shapes seem okay, but there are large errors in the y values.
>
> Question: Have I overfitted the data?
>
> Feedback?
>
> Tom
> Thomas L. Jones, PhD, Computer Science

-- 
> Simon Wood, Mathematical Sciences, University of Bath, Bath, BA2 7AY UK
> +44 1225 386603  www.maths.bath.ac.uk/~sw283
