[R] testing non-linear component in mgcv:gam

Wed Oct 5 23:01:58 CEST 2005

Dear Denis,

You got me: I would have thought from ?summary.gam that this would be the
same as the adjusted R^2 for a linear model. Note, however, that the
percentage of deviance explained checks with the R^2 from the linear model,
as expected.

Maybe you should address this question to the package author.

Regards,
 John

--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
-------------------------------- 

> -----Original Message-----
> From: Denis Chabot [mailto:chabotd at globetrotter.net] 
> Sent: Wednesday, October 05, 2005 3:33 PM
> To: John Fox
> Cc: R list
> Subject: Re: [R] testing non-linear component in mgcv:gam
> 
> Thank you everyone for your help, but my introduction to GAM 
> is turning my brain to mush. I thought the one part of the 
> output I understood the best was r-sq (adj), but now even 
> this is becoming foggy.
> 
> In my original message I mentioned a gam fit that turns out 
> to be a linear fit. By curiosity I analysed it with a linear 
> predictor only with mgcv package, and then as a linear model. 
> The output was identical in both, but the r-sq (adj) was 0.55 
> in mgcv and 0.26 in lm. In lm I hope that my interpretation 
> that 26% of the variance in y is explained by the linear 
> relationship with x is valid. Then what does r2 mean in mgcv?
> 
> Denis
>  > summary.gam(lin)
> 
> Family: gaussian
> Link function: identity
> 
> Formula:
> wm.sed ~ Temp
> 
> Parametric coefficients:
>               Estimate Std. Error t value Pr(>|t|)
> (Intercept)  0.162879   0.019847   8.207 1.14e-09 ***
> Temp        -0.023792   0.006369  -3.736 0.000666 ***
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> 
> 
> R-sq.(adj) =  0.554   Deviance explained = 28.5%
> GCV score = 0.09904   Scale est. = 0.093686  n = 37
> 
> 
>  > summary(sed.true.lin)
> 
> Call:
> lm(formula = wm.sed ~ Temp, weights = N.sets)
> 
> Residuals:
>      Min      1Q  Median      3Q     Max
> -0.6138 -0.1312 -0.0325  0.1089  1.1449
> 
> Coefficients:
>               Estimate Std. Error t value Pr(>|t|)
> (Intercept)  0.162879   0.019847   8.207 1.14e-09 ***
> Temp        -0.023792   0.006369  -3.736 0.000666 ***
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> 
> Residual standard error: 0.3061 on 35 degrees of freedom
> Multiple R-Squared: 0.285,    Adjusted R-squared: 0.2646
> F-statistic: 13.95 on 1 and 35 DF,  p-value: 0.000666
> 
> 
> Le 05-10-05 à 09:45, John Fox a écrit :
> 
> > Dear Denis,
> >
> > Take a closer look at the anova table: The models provide identical 
> > fits to the data. The differences in degrees of freedom and 
> deviance 
> > between the two models are essentially zero, 5.5554e-10 and 
> 2.353e-11 
> > respectively.
> >
> > I hope this helps,
> >  John
> >
> > --------------------------------
> > John Fox
> > Department of Sociology
> > McMaster University
> > Hamilton, Ontario
> > Canada L8S 4M4
> > 905-525-9140x23604
> > http://socserv.mcmaster.ca/jfox
> > --------------------------------
> >
> >
> >> -----Original Message-----
> >> From: r-help-bounces at stat.math.ethz.ch 
> >> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Denis Chabot
> >> Sent: Wednesday, October 05, 2005 8:22 AM
> >> To: r-help at stat.math.ethz.ch
> >> Subject: [R] testing non-linear component in mgcv:gam
> >>
> >> Hi,
> >>
> >> I need further help with my GAMs. Most models I test are very 
> >> obviously non-linear. Yet, to be on the safe side, I report the 
> >> significance of the smooth (default output of mgcv's
> >> summary.gam) and confirm it deviates significantly from linearity.
> >>
> >> I do the latter by fitting a second model where the same 
> predictor is 
> >> entered without the s(), and then use anova.gam to compare 
> the two. I 
> >> thought this was the equivalent of the default output of anova.gam 
> >> using package gam instead of mgcv.
> >>
> >> I wonder if this procedure is correct because one of my models 
> >> appears to be linear. In fact mgcv estimates df to be 
> exactly 1.0 so 
> >> I could have stopped there. However I inadvertently repeated the 
> >> procedure outlined above. I would have thought in this case the 
> >> anova.gam comparing the smooth and the linear fit would 
> for sure have 
> >> been not significant.
> >> To my surprise, P was 6.18e-09!
> >>
> >> Am I doing something wrong when I attempt to confirm the non- 
> >> parametric part a smoother is significant? Here is my example case 
> >> where the relationship does appear to be linear:
> >>
> >> library(mgcv)
> >>
> >>> This is mgcv 1.3-7
> >>>
> >> Temp <- c(-1.38, -1.12, -0.88, -0.62, -0.38, -0.12, 0.12, 
> 0.38, 0.62, 
> >> 0.88, 1.12,
> >>             1.38, 1.62, 1.88, 2.12, 2.38, 2.62, 2.88, 3.12, 3.38, 
> >> 3.62, 3.88,
> >>             4.12, 4.38, 4.62, 4.88, 5.12, 5.38, 5.62, 5.88, 6.12, 
> >> 6.38, 6.62, 6.88,
> >>             7.12, 8.38, 13.62)
> >> N.sets <- c(2, 6, 3, 9, 26, 15, 34, 21, 30, 18, 28, 27, 
> 27, 29, 31, 
> >> 22, 26, 24, 23,
> >>              15, 25, 24, 27, 19, 26, 24, 22, 13, 10, 2, 5, 
> 3, 1, 1, 
> >> 1, 1, 1) wm.sed <- c(0.000000000, 0.016129032, 0.000000000, 
> >> 0.062046512, 0.396459596, 0.189082949,
> >>              0.054757925, 0.142810440, 0.168005168, 0.180804428, 
> >> 0.111439628, 0.128799505,
> >>              0.193707937, 0.105921610, 0.103497845, 0.028591837, 
> >> 0.217894389, 0.020535469,
> >>              0.080389068, 0.105234450, 0.070213450, 0.050771363, 
> >> 0.042074434, 0.102348837,
> >>              0.049748344, 0.019100478, 0.005203125, 0.101711864, 
> >> 0.000000000, 0.000000000,
> >>              0.014808824, 0.000000000, 0.222000000, 0.167000000, 
> >> 0.000000000, 0.000000000,
> >>              0.000000000)
> >>
> >> sed.gam <- gam(wm.sed~s(Temp),weight=N.sets)
> >> summary.gam(sed.gam)
> >>
> >>> Family: gaussian
> >>> Link function: identity
> >>>
> >>> Formula:
> >>> wm.sed ~ s(Temp)
> >>>
> >>> Parametric coefficients:
> >>>             Estimate Std. Error t value Pr(>|t|)
> >>> (Intercept)  0.08403    0.01347   6.241 3.73e-07 ***
> >>> ---
> >>> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> >>>
> >>> Approximate significance of smooth terms:
> >>>         edf Est.rank     F  p-value
> >>> s(Temp)   1        1 13.95 0.000666 ***
> >>> ---
> >>> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> >>>
> >>> R-sq.(adj) =  0.554   Deviance explained = 28.5%
> >>> GCV score = 0.09904   Scale est. = 0.093686  n = 37
> >>>
> >>
> >> # testing non-linear contribution
> >> sed.lin <- gam(wm.sed~Temp,weight=N.sets)
> >> summary.gam(sed.lin)
> >>
> >>> Family: gaussian
> >>> Link function: identity
> >>>
> >>> Formula:
> >>> wm.sed ~ Temp
> >>>
> >>> Parametric coefficients:
> >>>              Estimate Std. Error t value Pr(>|t|)
> >>> (Intercept)  0.162879   0.019847   8.207 1.14e-09 ***
> >>> Temp        -0.023792   0.006369  -3.736 0.000666 ***
> >>> ---
> >>> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> >>>
> >>>
> >>> R-sq.(adj) =  0.554   Deviance explained = 28.5%
> >>> GCV score = 0.09904   Scale est. = 0.093686  n = 37
> >>>
> >> anova.gam(sed.lin, sed.gam, test="F")
> >>
> >>> Analysis of Deviance Table
> >>>
> >>> Model 1: wm.sed ~ Temp
> >>> Model 2: wm.sed ~ s(Temp)
> >>>    Resid. Df Resid. Dev         Df  Deviance      F   Pr(>F)
> >>> 1 3.5000e+01      3.279
> >>> 2 3.5000e+01      3.279 5.5554e-10 2.353e-11 0.4521 6.18e-09 ***
> >>>
> >>
> >>
> >> Thanks in advance,
> >>
> >>
> >> Denis Chabot
> >>
> >> ______________________________________________
> >> R-help at stat.math.ethz.ch mailing list 
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide!
> >> http://www.R-project.org/posting-guide.html
> >>
> >
> >
>