[R] MGCV: Use of irls.reg option

r-help.20.trevva at spamgourmet.com r-help.20.trevva at spamgourmet.com
Fri Jun 22 10:13:06 CEST 2012


Hi Simon,

Thanks for taking the time to reply. Please let me explain a few more details.

The problem that I am working on is essentially the same as the
Bristol Channel Sole Egg distribution example in your book and in the
"soap" paper but instead it is Herring Larvae in the English Channel -
same same, but different. The model structure is:

mdl <- gam(nlarv ~ s(lon, lat) + s(doy) + factor(year),   # doy = day of year
           data = dat, family = poisson(link = "log"))

i.e. a multiplicative structure with a Poisson observation model. Now,
the problem is that at the edges of the distribution in space (lon,
lat) (and to a lesser extent time) the observations are rich in zeros,
as we move away from the main spawning grounds. The model appears to
be converging ok, but the residuals look horrible. In particular,
there are some extremely large residuals around the edges (Pearson
residuals of 1000 or so), where we get a few larvae in a region where
they are otherwise unlikely. When I look at the TPRS (on the linear
predictor scale) it appears to be heading towards minus infinity -
essentially we end up in a situation where we observe a single larva,
but the expected mean number is 1e-10, which creates these very large
residuals. This was where I happened across the irls.reg argument -
the description in the help file (i.e. lack of identifiability) sounds
very much like the problem that I am having, which is what inspired
the question.
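
To make the size of those residuals concrete, here is a minimal sketch of
the arithmetic. It assumes only the standard Pearson residual for a Poisson
observation, (y - mu) / sqrt(mu). The helper name pearson_resid and the
three values of mu are illustrative and are not taken from the actual fit.

```r
# Pearson residual for a Poisson observation: (y - mu) / sqrt(mu)
# (pearson_resid is a hypothetical helper, not an mgcv function)
pearson_resid <- function(y, mu) (y - mu) / sqrt(mu)

# Linear predictor drifting towards minus infinity under a log link
eta <- log(c(1e-2, 1e-6, 1e-10))   # illustrative values only
mu  <- exp(eta)                    # fitted means on the response scale

# A single larva observed where almost none are expected
res <- pearson_resid(y = 1, mu = mu)
print(signif(res, 3))
```

With one observed larva, the residual grows like 1 / sqrt(mu): roughly 10
at mu = 1e-2, roughly 1000 at mu = 1e-6, and roughly 1e5 at mu = 1e-10. So
residuals of 1000 or so are what one would expect once the fitted mean at
the edges has dropped to around 1e-6.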

I've also tried using the "soap" smoother in place of the TPRS - the
problem is not as severe, and I can limit it by making the boundaries
of the soap film extremely tight around the non-zero data but the same
underlying problem is still lurking in the corners...

Do you have any suggestions as to how I can get around this
edge-effects problem?

Mark
----

Hi Mark,

irls.reg is kind of `legacy code'. Does model fitting actually fail for
your example, or is it just that the estimated spatial smooth looks
unpleasant?

best,
Simon


On 06/21/2012 01:28 AM, r-help.20.trevva at spamgourmet.com wrote:
> Hi,
>
> In the help files in the  mgcv package for the gam.control() function,
> there is an option irls.reg. The help files describe this option as:
>
> For most models this should be 0. The iteratively re-weighted least squares
> method by which GAMs are fitted can fail to converge in some circumstances.
> For example, data with many zeroes can cause problems in a model with a log
> link, because a mean of zero corresponds to an infinite range of linear
> predictor values. Such convergence problems are caused by a fundamental lack
> of identifiability, but do not show up as lack of identifiability in the
> penalized linear model problems that have to be solved at each stage of
> iteration. In such circumstances it is possible to apply a ridge regression
> penalty to the model to impose identifiability, and irls.reg is the size of
> the penalty.
>
> I am trying to fit a Poisson GLM with a log-link function and am
> having problems similar to those described - in particular, the model
> has a spatial s(lon,lat) term and there are a lot of zeros around the
> edges of my domain which are making the TPRS do strange things. It
> sounds like irls.reg might be the answer to my problems. The question
> I have is how to use it? What is an appropriate value? I can't seem to
> find any more information than that provided, and I don't know if I
> really understand what it is doing. Are there any examples or
> references on this that I have overlooked during my googling that
> could help?
>
> Best wishes,
>
> Mark Payne
> DTU Aqua,
> Copenhagen, Denmark
>


