[R] GAM with the negative binomial distribution: why do predictions no match with original values?

Tue Nov 22 23:33:13 CET 2016

> On Nov 22, 2016, at 1:29 PM, Marine Regis <marine.regis at hotmail.fr> wrote:
> 
> Hello,
> 
>> From capture data, I would like to assess the effect of longitudinal changes in proportion of forests on abundance of skunks. To test this, I built this GAM where the dependent variable is the number of unique skunks and the independent variables are the X coordinates of the centroids of trapping sites (called "X" in the GAM) and the proportion of forests within the trapping sites (called "prop_forest" in the GAM):
> 
>    mod <- gam(nb_unique ~ s(x,prop_forest), offset=log_trap_eff, family=nb(theta=NULL, link="log"), data=succ_capt_skunk, method = "REML", select = TRUE)
>    summary(mod)
> 
>    Family: Negative Binomial(13.446)
>    Link function: log
> 
>    Formula:
>    nb_unique ~ s(x, prop_forest)
> 
>    Parametric coefficients:
>                Estimate Std. Error z value Pr(>|z|)
>    (Intercept) -2.02095    0.03896  -51.87   <2e-16 ***
>    ---
>    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> 
>    Approximate significance of smooth terms:
>                       edf Ref.df Chi.sq  p-value
>    s(x,prop_forest) 3.182     29  17.76 0.000102 ***
>    ---
>    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> 
>    R-sq.(adj) =   0.37   Deviance explained =   49%
>    -REML = 268.61  Scale est. = 1         n = 58
> 
> 
> I built a GAM  for the negative binomial family. When I use the function `predict.gam`, the predictions of capture success from the GAM and the values of capture success from original data are very different. What is the reason for differences occur?

You have an offset that is not described. And `gam` suppresses the Intercept. These would seem to be likely sources of confusion. For the best answers either on Rhelp or on CrossValidated.com you should be offering a working example. It's not our responsibility to build these for you.

I found that others had included offsets and then had questions about prediction. I haven't reviewed these candidates but perhaps you can find one in this modest listing that comes up from the MarkMail search engine:

http://markmail.org/search/?q=list%3Aorg.r-project.r-help+mgcv+gam+offset+predict

library(mgcv) 
x<-seq(0,10,length=100) 
y<-x^2+rnorm(100) 
m1<-gam(y~s(x,k=10,bs='cs')) 
m2<-gam(y~s(x,k=10,bs='cs'), offset= rep(10,100) ) 
x1<-seq(0,10,0.1) 
y1<-predict(m1,newdata=list(x=x1)) 
y2<-predict(m2,newdata=list(x=x1))

plot(x,y,ylim=c(0,100)) 
lines(x1,y1,lwd=4,col='red') 
lines(x1,y2,lwd=4,col='blue')

-- 
David.

> 
> **With GAM:**
> 
>    modPred <- predict.gam(mod, se.fit=TRUE,type="response")
>    summary(modPred$fit)
>       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>     0.1026  0.1187  0.1333  0.1338  0.1419  0.1795
> 
> **With original data:**
> 
>    summary(succ_capt_skunk$nb_unique)
>       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>      17.00   59.00   82.00   81.83  106.80  147.00
> 
> The question has already been posted on Cross validated (http://stats.stackexchange.com/questions/247347/gam-with-the-negative-binomial-distribution-why-do-predictions-no-match-with-or) without success.
> 
> Thanks a lot for your time.
> Have a nice day
> Marine
> 
> 
> 	[[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA