[R] How to avoid overfitting in gam(mgcv)

Ariyo Kanno 10dimensioner at gmail.com
Wed Oct 3 15:15:44 CEST 2007


Sorry, let me correct one sentence.

"By "overfitting" I mean that the GCV score was significantly SMALLER
than the mean squared prediction error on the validation data, which
were randomly selected and not used for fitting."
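
The comparison described in this sentence can be sketched roughly as follows. This is only an illustration on simulated data: the sample size, the data-generating function, the one-third hold-out split, and all variable names are assumptions, not the data or code used in this thread.

```r
library(mgcv)

set.seed(1)                      # arbitrary seed, for reproducibility only
n  <- 60                         # hypothetical sample size
x1 <- runif(n)
x2 <- runif(n)
# Simulated response (an assumption): linear in x1, smooth in x2, gaussian noise
y  <- 1 + 2 * x1 + sin(2 * pi * x2) + rnorm(n, sd = 0.5)
dat <- data.frame(y = y, x1 = x1, x2 = x2)

# Randomly hold out one third of the rows; they are not used for fitting
val_idx <- sample(n, size = n / 3)
train   <- dat[-val_idx, ]
valid   <- dat[val_idx, ]

# Same model structure as in the thread, with GCV smoothing parameter selection
fit <- gam(y ~ x1 + s(x2), data = train, method = "GCV.Cp")

# Minimised GCV score stored in the fitted object
gcv_score <- unname(fit$gcv.ubre)

# Mean squared prediction error on the held-out rows
val_pred <- as.numeric(predict(fit, newdata = valid))
val_mse  <- mean((valid$y - val_pred)^2)

print(c(GCV = gcv_score, validation_MSE = val_mse))
```

A GCV score clearly below the validation error would be the symptom described above. With a single random split the gap is itself noisy, so repeating the split several times may give a fairer picture.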

> Thank you for the valuable advice.
> I'm sorry, Dr. Wood, that I mistakenly sent this reply to your
> personal e-mail address first.
>
> I will use the "min.sp" argument when the data size is very small. Are
> there any criteria for choosing "min.sp"?
>
> I compared gamma=1.0 and 1.4, and I could see the smoothing effect of
> increasing gamma by comparing the edf and the smoothing parameter. But it
> was not enough to suppress the overfitting when the data size was small.
>
> By "overfitting" I mean that the GCV score was significantly larger
> than the mean squared prediction error on the validation data, which
> were randomly selected and not used for fitting.
>
> Best Wishes,
> Ariyo
>
> 2007/10/3, Simon Wood <s.wood at bath.ac.uk>:
> > On Wednesday 03 October 2007 10:49, Ariyo Kanno wrote:
> > > I appreciate your quick reply.
> > > I am using a model with the following structure:
> > >
> > > fit <- gam(y~x1+s(x2))
> > >
> > > where y, x1, and x2 are quantitative variables, so the response
> > > distribution is assumed to be Gaussian (the default).
> > >
> > > Now I understand that the data size was too small.
> > -- Well, the 10 end is definitely too small, but you can get quite reasonable
> > estimates of a single smoothing parameter from 30+ gaussian data.
> > -- You can force smoother models by either setting the smoothing parameter
> > yourself using the `sp' argument to `gam', or by using the `min.sp' argument
> > to set a lower bound on the smoothing parameter.
> > -- I'm surprised that `gamma' had no effect - how high did you try?
> >
> > best,
> > Simon
> >
> >
> >
> > > Thank you.
> > >
> > > Best Wishes,
> > >
> > > Ariyo
> > >
> > > 2007/10/3, Simon Wood <s.wood at bath.ac.uk>:
> > > > What sort of model structure are you using? In particular, what is the
> > > > response distribution? For Poisson and binomial responses, overfitting
> > > > can be a sign of overdispersion, and quasipoisson or quasibinomial may
> > > > be better.
> > > > Also I would not expect to get useful smoothing parameter estimates from
> > > > 10 data!
> > > >
> > > > best,
> > > > Simon
> > > >
> > > > On Wednesday 03 October 2007 06:55, Ariyo Kanno wrote:
> > > > > Dear listers,
> > > > >
> > > > > I'm using gam (from mgcv) for semi-parametric regression on small and
> > > > > noisy datasets (10 to 200 observations), and I am facing a problem
> > > > > of overfitting.
> > > > >
> > > > > The book (Simon N. Wood, Generalized Additive Models: An Introduction
> > > > > with R) suggests avoiding overfitting by inflating the effective
> > > > > degrees of freedom in the GCV score with an increased "gamma" value
> > > > > (e.g. 1.4). But in my case, it didn't make a significant change in
> > > > > the results.
> > > > >
> > > > > The only way I've found to suppress overfitting is to set the basis
> > > > > dimension "k" to very low values (3 to 5). However, I don't think this
> > > > > is reasonable, because knot selection then becomes an important issue.
> > > > >
> > > > > Is there any other way to avoid overfitting when analyzing small
> > > > > datasets?
> > > > >
> > > > > Thank you for your help in advance,
> > > > > Ariyo Kanno
> > > > >
> > > > > --
> > > > > Ariyo Kanno
> > > > > First-year doctoral student at
> > > > > Institute of Environmental Studies,
> > > > > The University of Tokyo
> > > > >
> > > > > ______________________________________________
> > > > > R-help at r-project.org mailing list
> > > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > > PLEASE do read the posting guide
> > > > > http://www.R-project.org/posting-guide.html and provide commented,
> > > > > minimal, self-contained, reproducible code.
> > > >
> > > > --
> > > >
> > > > > Simon Wood, Mathematical Sciences, University of Bath, Bath, BA2 7AY UK
> > > > > +44 1225 386603  www.maths.bath.ac.uk/~sw283
> >
>
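
For reference, the three options discussed in the thread (`gamma`, `min.sp`, and a fixed `sp`) could be tried side by side roughly as follows. This is a sketch on simulated data, not the poster's data: the sample size, the data-generating function, and the particular values `min.sp = 1` and `sp = 10` are arbitrary assumptions chosen only for illustration.

```r
library(mgcv)

set.seed(2)                      # arbitrary seed, for reproducibility only
n  <- 30                         # hypothetical small sample
x1 <- runif(n)
x2 <- runif(n)
# Simulated response (an assumption): linear in x1, smooth in x2, gaussian noise
y  <- 1 + 2 * x1 + sin(2 * pi * x2) + rnorm(n, sd = 0.5)
dat <- data.frame(y = y, x1 = x1, x2 = x2)

# Baseline: smoothing parameter chosen by GCV
fit_default <- gam(y ~ x1 + s(x2), data = dat, method = "GCV.Cp")

# gamma > 1 inflates the effective degrees of freedom in the GCV score
fit_gamma <- gam(y ~ x1 + s(x2), data = dat, method = "GCV.Cp", gamma = 1.4)

# Lower bound on the smoothing parameter (the value 1 is arbitrary)
fit_minsp <- gam(y ~ x1 + s(x2), data = dat, method = "GCV.Cp", min.sp = 1)

# Smoothing parameter fixed by hand, no estimation (the value 10 is arbitrary)
fit_sp <- gam(y ~ x1 + s(x2), data = dat, sp = 10)

# Total effective degrees of freedom of each fit, as a rough smoothness summary
fits <- list(default = fit_default, gamma_1.4 = fit_gamma,
             min_sp = fit_minsp, fixed_sp = fit_sp)
edf_total <- sapply(fits, function(m) sum(m$edf))
print(round(edf_total, 2))
```

Lower total edf indicates a smoother fit. Suitable values for `sp` and `min.sp` depend on the scale of the covariate and of the penalty, so the numbers above should not be read as recommendations; `?gam` describes how the smoothing parameters reported in the fitted object relate to a supplied `min.sp`.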