[R] p-level in packages mgcv and gam

Thu Sep 29 21:55:35 CEST 2005

OK, I think I understand better but still have two points to clarify.

The first one is about the number of df. I think those who replied on  
this objected to the way I chose df, not the fact that I would run a  
model with 7.4 df per se. If I read you correctly, I artificially  
reduce my p-value by using the estimated df found in a mgcv gam model  
into another model where I fix df. This is fine, I am quite willing  
not to run a second model with a fixed df and instead tell my readers  
that my model is "marginally significant" with a p-value of 0.03.

This being said, do you know of guidelines for choosing df? A  
colleague told me he does not go above 10% of the number of points.  
Should I be concerned when mgcv estimates 7.4 df for 34 points? Note  
that for this particular model P < 1e-16, and P is also very small if  
I fix df to either 4 or 7.

My second point is the difference between models fitted by packages  
gam and mgcv. Sure, some of you have said "different algorithms". And  
when I specify dfs, shouldn't P-values be very similar for the 2  
packages? If not, what does it say of the confidence we can have in  
the models?

I draw your attention to this exampl: I obtained P-values of 0.50 and  
0.03 with packages gam and mgcv respectively:

 > library(gam)
Loading required package: splines
 > data(kyphosis)
 > kyp1 <- gam(Kyphosis ~ s(Number, 3), family=binomial, data=kyphosis)
 > summary.gam(kyp1)

Call: gam(formula = Kyphosis ~ s(Number, 3), family = binomial, data  
= kyphosis)
Deviance Residuals:
     Min      1Q  Median      3Q     Max
-1.3646 -0.6233 -0.4853 -0.3133  2.0965

(Dispersion Parameter for binomial family taken to be 1)

     Null Deviance: 83.2345 on 80 degrees of freedom
Residual Deviance: 71.9973 on 76.9999 degrees of freedom
AIC: 79.9976

Number of Local Scoring Iterations: 7

DF for Terms and Chi-squares for Nonparametric Effects

              Df Npar Df Npar Chisq  P(Chi)
(Intercept)   1
s(Number, 3)  1       2    1.37149 0.50375

 > detach(package:gam)
 > library(mgcv)
This is mgcv 1.3-7
 > kyp2 <- gam(Kyphosis ~ s(Number, k=4, fx=T),  family=binomial,  
data=kyphosis)
 > summary.gam(kyp2)

Family: binomial
Link function: logit

Formula:
Kyphosis ~ s(Number, k = 4, fx = T)

Parametric coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  -1.5504     0.3342   -4.64 3.49e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
           edf Est.rank Chi.sq p-value
s(Number)   3        3  8.898  0.0307 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  0.101   Deviance explained = 12.5%
UBRE score = 0.075202  Scale est. = 1         n = 81
 > kyp2$deviance
[1] 72.85893
 > kyp2$null.deviance
[1] 83.23447
 > kyp2$df.null
[1] 80
 > kyp2$df.residual
[1] 77

How can we explain this huge difference?

Denis

> Le 05-09-29 à 06:00, r-help-request at stat.math.ethz.ch a écrit :
>
> De : Henric Nilsson <henric.nilsson at statisticon.se>
> Date : 29 septembre 2005 03:55:19 HAE
> À : ym at climpact.com
> Cc : r-help at stat.math.ethz.ch
> Objet : Rép : [R] p-level in packages mgcv and gam
> Répondre à : Henric Nilsson <henric.nilsson at statisticon.se>
>
>
> Yves Magliulo said the following on 2005-09-28 17:05:
>
>> hi,
>> i'll try to help you, i send a mail about this subject last  
>> week... and
>> i did not have any response...
>> I'm using gam from package mgcv. 1)
>> How to interpret the significance of smooth terms is hard for me to
>> understand perfectly : using UBRE, you fix df. p-value are  
>> estimated by chi-sq distribution using GCV, the best df are  
>> estimated by GAM. (that's what i want) and
>> p-values
>
>
> This is not correct. The df are estimated in both cases (i.e. UBRE  
> and GCV), but the scale parameter is fixed in the UBRE case. Hence,  
> by default UBRE is used for family = binomial or poisson since the  
> scale parameter is assumed to be 1. Similarly, GCV is the default  
> for family = gaussian since we most often want the scale (usually  
> denoted sigma^2) to be estimated.
>
>
>> are estimated by an F distribution But in that case they said "use at
>> your own risk" in ?summary.gam
>
>
> The warning applies in both cases. The p-values are conditional on  
> the smoothing parameters, and the uncertainty of the smooths is not  
> taken into account when computing the p-values.
>
>
>> so you can also look at the chi.sq : but i don't know how to choose a
>
>
> No...
>
>
>> criterion like for p-values... for me, chi.sq show the best  
>> predictor in
>> a model, but it's hard to reject one with it.
>
>
> Which version of mgcv do you use? The confusion probably stems from  
> earlier versions of mgcv (< 1.3-5): the summary and anova methods  
> used to have a column denoted Chi.sq even when the displayed  
> statistic was computed as F. Recent versions of mgcv has
>
> > summary(b)
>
> Family: gaussian
> Link function: identity
>
> Formula:
> y ~ s(x0) + s(x1) + s(x2) + s(x3)
>
> Parametric coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept)   7.9150     0.1049   75.44   <2e-16 ***
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Approximate significance of smooth terms:
>         edf Est.rank      F  p-value
> s(x0) 5.173    9.000  3.785 0.000137 ***
> s(x1) 2.357    9.000 34.631  < 2e-16 ***
> s(x2) 8.517    9.000 84.694  < 2e-16 ***
> s(x3) 1.000    1.000  0.444 0.505797
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> R-sq.(adj) =  0.726   Deviance explained = 73.7%
> GCV score =  4.611   Scale est. = 4.4029    n = 400
>
>
> If we assume that the scale is known and fixed at 4.4029, we get
>
> > summary(b, dispersion = 4.4029)
>
> Family: gaussian
> Link function: identity
>
> Formula:
> y ~ s(x0) + s(x1) + s(x2) + s(x3)
>
> Parametric coefficients:
>             Estimate Std. Error z value Pr(>|z|)
> (Intercept)   7.9150     0.1049   75.44   <2e-16 ***
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Approximate significance of smooth terms:
>         edf Est.rank  Chi.sq p-value
> s(x0) 5.173    9.000  34.067 8.7e-05 ***
> s(x1) 2.357    9.000 311.679 < 2e-16 ***
> s(x2) 8.517    9.000 762.255 < 2e-16 ***
> s(x3) 1.000    1.000   0.444   0.505
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> R-sq.(adj) =  0.726   Deviance explained = 73.7%
> GCV score =  4.611   Scale est. = 4.4029    n = 400
>
> Note that t/F changed into z/Chi.sq.
>
>
>> so as far as i m concerned, i use GCV methods, and fix a 5% on the  
>> null
>> hypothesis (pvalue) to select significant predictor. after, i look  
>> at my
>> smooth, and if the parametrization look fine to me, i validate.
>> generaly, for p-values smaller than 0.001, you can be confident. over
>> 0.001, you have to check. 2)
>> for difference between package gam and mgcv, i sent a mail about this
>
>
> The underlying algorithms are very different.
>
>
> HTH,
> Henric
>
>
> De : "Liaw, Andy" <andy_liaw at merck.com>
> Date : 28 septembre 2005 14:01:25 HAE
> À : 'Denis Chabot' <chabotd at globetrotter.net>, Peter Dalgaard  
> <p.dalgaard at biostat.ku.dk>
> Cc : Thomas Lumley <tlumley at u.washington.edu>, R list <r- 
> help at stat.math.ethz.ch>
> Objet : RE: [R] p-level in packages mgcv and gam
>
> Just change the df in what Thomas described to the degree of  
> polynomial, and
> everything he said still applies.  Any good book on regression that  
> covers
> polynomial regression ought to point this out.
>
> Andy
>
>
>> From: Denis Chabot
>>
>> But what about another analogy, that of polynomials? You may not be
>> sure what degree polynomial to use, and you have not decided before
>> analysing your data. You fit different polynomials to your data,
>> checking if added degrees increase r2 sufficiently by doing F-tests.
>>
>> I thought it was the same thing with GAMs. You can fit a
>> model with 4
>> df, and in some cases it is of interest to see if this is a better
>> fit than a linear fit. But why can't you also check if 7df is better
>> than 4df? And if you used mgcv first and it tells you that 7df is
>> better than 4df, why bother repeating the comparison 7df
>> against 4df,
>> why not just take the p-value for the model with 7df (fixed)?
>>
>> Denis
>

> De : Peter Dalgaard <p.dalgaard at biostat.ku.dk>
> Date : 28 septembre 2005 12:04:58 HAE
> À : Thomas Lumley <tlumley at u.washington.edu>
> Cc : Denis Chabot <chabotd at globetrotter.net>, R list <r- 
> help at stat.math.ethz.ch>
> Objet : Rép : [R] p-level in packages mgcv and gam
>
> Thomas Lumley <tlumley at u.washington.edu> writes:
>>
>> Bob, on the other hand, chooses the amount of smoothing depending on
>> the data. When a 4 df smooth fits best he ends up with the same model
>> as Alice and the same p-value.  When some other df fits best he ends
>> up with a different model and a *smaller* p-value than Alice.
>
> This doesn't actually follow, unless the p-value (directly or
> indirectly) found its way into the definition of "best fit". It does
> show the danger, though.
>
> -- 
>    O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
>   c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
>  (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45)  
> 35327918
> ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45)  
> 35327907
>

> De : Thomas Lumley <tlumley at u.washington.edu>
> Date : 26 septembre 2005 12:54:43 HAE
> À : Denis Chabot <chabotd at globetrotter.net>
> Cc : r-help at stat.math.ethz.ch
> Objet : Rép : [R] p-level in packages mgcv and gam
>
> On Mon, 26 Sep 2005, Denis Chabot wrote:
>
> But the mgcv manual warns that p-level for the smooth can be
> underestimated when df are estimated by the model. Most of the time
> my p-levels are so small that even doubling them would not result in
> a value close to the P=0.05 threshold, but I have one case with  
> P=0.033.
>
> I thought, probably naively, that running a second model with fixed
> df, using the value of df found in the first model. I could not
> achieve this with mgcv: its gam function does not seem to accept
> fractional values of df (in my case 8.377).
>
> No, this won't work.  The problem is the usual one with model  
> selection: the p-value is calculated as if the df had been fixed,  
> when really it was estimated.
>
> It is likely to be quite hard to get an honest p-value out of  
> something that does adaptive smoothing.
>
>     -thomas
> De : Thomas Lumley <tlumley at u.washington.edu>
> Date : 28 septembre 2005 11:33:27 HAE
> À : Denis Chabot <chabotd at globetrotter.net>
> Cc : R list <r-help at stat.math.ethz.ch>
> Objet : Rép : [R] p-level in packages mgcv and gam
>
> On Wed, 28 Sep 2005, Denis Chabot wrote:
>
> I only got one reply to my message:
>
> No, this won't work.  The problem is the usual one with model
> selection: the p-value is calculated as if the df had been fixed,
> when really it was estimated.
>
> It is likely to be quite hard to get an honest p-value out of
> something that does adaptive smoothing.
>
>     -thomas
>
> I do not understand this: it seems that a lot of people chose df=4
> for no particular reason, but p-levels are correct. If instead I
> choose df=8 because a previous model has estimated this to be an
> optimal df, P-levels are no good because df are estimated?
>
> Yes. I know this sounds strange initially, but it really does make  
> sense if you think about it carefully.
>
> Suppose that Alice and Bob are kyphosis researchers, and that Alice  
> always chooses 4df for smoothing Age.  We would all agree that her  
> p-values are correct [in fact we wouldn't, but that is a separate  
> issue]
>
> Bob, on the other hand, chooses the amount of smoothing depending  
> on the data. When a 4 df smooth fits best he ends up with the same  
> model as Alice and the same p-value.  When some other df fits best  
> he ends up with a different model and a *smaller* p-value than Alice.
>
> In particular, this is still true under the null hypothesis that  
> Age has no effect [If Alice and Bob are interested in p-values, the  
> null hypothesis must be plausible.]
>
> If Bob's p-values are always less than or equal to Alice's p-values  
> under the null hypothesis, and Alice's p-values are less than 0.05  
> 5% of the time, then Bob's p-values are less than 0.05 more than 5%  
> of the time.
>
>
>     -thomas
>
>
> Furthermore, shouldn't packages gam and mgcv give similar results
> when the same data and df are used? I tried this:
>
> library(gam)
> data(kyphosis)
> kyp1 <- gam(Kyphosis ~ s(Age, 4), family=binomial, data=kyphosis)
> kyp2 <- gam(Kyphosis ~ s(Number, 4), family=binomial, data=kyphosis)
> kyp3 <- gam(Kyphosis ~ s(Start, 4), family=binomial, data=kyphosis)
> anova.gam(kyp1)
> anova.gam(kyp2)
> anova.gam(kyp3)
>
> detach(package:gam)
> library(mgcv)
> kyp4 <- gam(Kyphosis ~ s(Age, k=4, fx=T),  family=binomial,
> data=kyphosis)
> kyp5 <- gam(Kyphosis ~ s(Number, k=4, fx=T),  family=binomial,
> data=kyphosis)
> kyp6 <- gam(Kyphosis ~ s(Start, k=4, fx=T),  family=binomial,
> data=kyphosis)
> anova.gam(kyp4)
> anova.gam(kyp5)
> anova.gam(kyp6)
>
>
> P levels for these models, by pair
>
> kyp1 vs kyp4: p= 0.083 and 0.068 respectively (not too bad)
> kyp2 vs kyp5: p= 0.445 and 0.03 (wow!)
> kyp3 vs kyp6: p= 0.053 and 0.008 (wow again)
>
> Also if you plot all these you find that the mgcv plots are smoother
> than the gam plots, even the same df are used all the time.
>
> I am really confused now!
>
> Denis
>

>
> Thomas Lumley            Assoc. Professor, Biostatistics
> tlumley at u.washington.edu    University of Washington, SeattleDe :  
> Thomas Lumley <tlumley at u.washington.edu>
> Date : 28 septembre 2005 14:35:26 HAE
> À : Denis Chabot <chabotd at globetrotter.net>
> Cc : Peter Dalgaard <p.dalgaard at biostat.ku.dk>, R list <r- 
> help at stat.math.ethz.ch>
> Objet : Rép : [R] p-level in packages mgcv and gam
>
> On Wed, 28 Sep 2005, Denis Chabot wrote:
>
> But what about another analogy, that of polynomials? You may not be  
> sure what degree polynomial to use, and you have not decided before  
> analysing your data. You fit different polynomials to your data,  
> checking if added degrees increase r2 sufficiently by doing F-tests.
>
> Yes, you can. And this procedure gives you incorrect p-values.
>
>  They may not be very incorrect -- it depends on how much model  
> selection you do, and how strongly the feature you are selecting on  
> is related to the one you are testing.
>
> For example, using step() to choose a polynomial in x even when x  
> is unrelated to y and z inflates the Type I error rate by giving a  
> biased estimate of the residual mean squared error:
>
> once<-function(){
>   y<-rnorm(50);x<-runif(50);z<-rep(0:1,25)
>   summary(step(lm(y~z),
>         scope=list(lower=~z,upper=~z+x+I(x^2)+I(x^3)+I(x^4)),
>         trace=0))$coef["z",4]
>  }
> p<-replicate(1000,once())
> mean(p<0.05)
> [1] 0.072
>
> which is significantly higher than you would expect for an honest  
> level 0.05 test.
>
>     -thomas