[R] Relative GCV - poisson and negbin GAMs (mgcv)

Simon Wood s.wood at bath.ac.uk
Fri Apr 13 22:12:59 CEST 2007


On Sunday 08 April 2007 05:57, Jordan T Watson wrote:
Sorry for the tardy reply - got filtered to the wrong place...

> I am using gam in mgcv (1.3-22) and trying to use gcv to help with model
> selection.  However, I'm a little confused by the process of assessing GCV
> scores based on their magnitude (or on relative changes in magnitude).
>
> Differences in GCV scores often seem "obvious" with my poisson gams but
> with negative binomial, the decision seems less clear.
- This is not well  documented - sorry. The negative binomial is a special 
case for mgcv::gam (see ?gam.neg.bin). To handle the extra parameter of the 
negative binomial, smoothing parameter estimation *has* to be performed via 
`performance iteration' (see ?gam.method), which is not the default for any 
other family. Within the IRLS loop of the performance iteration an extra step 
is inserted which estimates the negative binomial parameter at each 
iteration, in order to try and achieve a scale parameter estimate of 1 
(see ?gam.neg.bin). However, trying to force the estimated scale parameter to 
1 will also force the GCV score used in performance iteration to be close to 
1 (at least for large samples), whatever model is fitted.... so comparison of 
GCV scores between different neg.bin models is unlikely to be very 
informative. In addition small changes in GCV score are not meaningful when 
comparing different models fitted by performance iteration, for any family 
(again see details section of ?gam.method). I think that AIC() would provide 
a better basis for comparing alternative neg.bin GAMs: it's even more 
approximate than usual when used in this context, since the neg.bin parameter 
estimate is definitely not an MLE, but it will still give a much more 
meaningful guide than comparison of GCV scores, in this case. 

For other families none of these problems occur provided the default direct 
(`outer') smoothness selection method is used.

> My data represent a similar pattern as below - where I see (seemingly)
> drastic changes in GCV for the poisson with different model structures, but
> the negative binomial often seems only to change in the second or third
> decimal place for the same models.
>
> Is there a standard for how many decimals one should look at GCV, or am I
> totally missing something (I'm quite new to this, as I'm sure is obvious). 
--- no there isn't. When using all other families (and the default `outer' 
fitting method) then simply choosing the lowest score is reasonable. For the 
negative binomial, I think the answer is not to use the GCV score. 

best,
Simon


>
> Thanks in advance!
> Jordan
>
>
> library(mgcv)
> set.seed(0)
> n<-400
> sig<-2
> x0 <- runif(n, 0, 1)
> x1 <- runif(n, 0, 1)
> x2 <- runif(n, 0, 1)
> x3 <- runif(n, 0, 1)
> f0 <- function(x) 2 * sin(pi * x)
> f1 <- function(x) exp(2 * x)
> f2 <- function(x) 0.2*x^11*(10*(1-x))^6+10*(10*x)^3*(1-x)^10
> f <- f0(x0) + f1(x1) + f2(x2)
> g<-exp(f/4)
> y<-rpois(rep(1,n),g)
> summary(gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=poisson,scale=-1))$gcv
> summary(gam(y~x0+x1+x2+s(x3),family=poisson,scale=-1))$gcv
> summary(gam(y~x0+x1+x2+x3,family=poisson,scale=-1))$gcv
> summary(gam(y~s(x3),family=poisson,scale=-1))$gcv
>
> summary(gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=negative.binomial(1)))$gcv
> summary(gam(y~x0+x1+x2+s(x3),family=negative.binomial(1)))$gcv
> summary(gam(y~x0+x1+x2+x3,family=negative.binomial(1)))$gcv
> summary(gam(y~s(x3),family=negative.binomial(1)))$gcv
>
> Outputs from above:
> > summary(gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=poisson,scale=-1))$gcv
>
> [1] 1.218529
>
> > summary(gam(y~x0+x1+x2+s(x3),family=poisson,scale=-1))$gcv
>
> [1] 5.058014
>
> > summary(gam(y~x0+x1+x2+x3,family=poisson,scale=-1))$gcv
>
> [1] 4.954163
>
> > summary(gam(y~s(x3),family=poisson,scale=-1))$gcv
>
> [1] 8.286693
>
> > summary(gam(y~s(x0)+s(x1)+s(x2)+s(x3),family=negative.binomial(1)))$gcv
>
> [1] 1.047581
>
> > summary(gam(y~x0+x1+x2+s(x3),family=negative.binomial(1)))$gcv
>
> [1] 1.013617
>
> > summary(gam(y~x0+x1+x2+x3,family=negative.binomial(1)))$gcv
>
> [1] 1.012658
>
> > summary(gam(y~s(x3),family=negative.binomial(1)))$gcv
>
> [1] 1.005986
>
>
> #######################################
> Jordan Watson
> School of Aquatic and Fishery Sciences
> University of Washington
> Seattle, WA
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html and provide commented, minimal,
> self-contained, reproducible code.

-- 
> Simon Wood, Mathematical Sciences, University of Bath, Bath, BA2 7AY UK
> +44 1225 386603  www.maths.bath.ac.uk/~sw283



More information about the R-help mailing list