[R] p-level in packages mgcv and gam

Wed Sep 28 17:17:37 CEST 2005

Hi Yves,
On 05-09-28 at 11:05, Yves Magliulo wrote:

> Hi,
>
> I'll try to help you. I sent a mail about this subject last week  
> and did not get any response.
Sorry, I did not see your message last week.
>
> I'm using gam from package mgcv.
>
> 1)
> I find it hard to fully understand how to interpret the  
> significance of smooth terms:
> using UBRE, you fix df, and p-values are estimated from a chi-squared  
> distribution;
> using GCV, the best df are estimated by gam (that's what I want),  
> and p-values are estimated from an F distribution. But in that case  
> ?summary.gam says "use at your own risk".
>
> So you can also look at chi.sq, but I don't know how to choose a  
> criterion for it as I would for p-values. For me, chi.sq shows the  
> best predictor in a model, but it's hard to reject one with it.
>
> So as far as I'm concerned, I use the GCV method and set a 5%  
> threshold on the p-value (null hypothesis) to select significant  
> predictors. Afterwards, I look at my smooth, and if the  
> parametrization looks fine to me, I validate.
>
> Generally, for p-values smaller than 0.001, you can be confident.  
> Above 0.001, you have to check.
>
I think I follow you, but how do you "validate"? My fit goes very  
nicely in the middle of the data points and appears fine. In most  
cases p is way smaller than 0.001. I have one case that is bimodal in  
shape and more noisy, and p is only 0.03. How do I validate it, how  
do I check?
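
To make the question concrete, here is a minimal sketch of the kind of checks I could imagine running in mgcv. Everything in it is an assumption for illustration: the data are simulated (a noisy bimodal curve), the names dat, fit and fit0 are made up, and the choice of checks is only one possibility, not an established validation procedure.

```r
library(mgcv)  # recommended package, shipped with R

set.seed(1)
n <- 200
x <- runif(n)
# Hypothetical noisy, bimodal signal (simulated, not my real data)
y <- dnorm(x, 0.3, 0.08) / 5 + dnorm(x, 0.75, 0.08) / 5 + rnorm(n, sd = 1)
dat <- data.frame(x = x, y = y)

# Smooth of x, with the smoothing parameter chosen by GCV
fit  <- gam(y ~ s(x), data = dat, method = "GCV.Cp")
# Intercept-only model, i.e. the same model without the smooth term
fit0 <- gam(y ~ 1, data = dat)

# Estimated df and approximate p-value of the smooth term
print(summary(fit)$s.table)

# Residual diagnostics and convergence information
gam.check(fit)

# Compare the models with and without the smooth
print(anova(fit0, fit, test = "F"))  # approximate F test
print(AIC(fit0, fit))                # lower AIC indicates the preferred model
```

The p-values in the summary and in the anova table are both approximate, so I would read them together with the residual plots and the AIC comparison rather than rely on any single number.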

> 2)
> As for the difference between packages gam and mgcv, I sent a mail  
> about this one year ago; here's the response:
>
> "
> - package gam is based very closely on the GAM approach presented in
> Hastie and Tibshirani's  "Generalized Additive Models" book.  
> Estimation is
> by back-fitting and model selection is based on step-wise regression
> methods based on approximate distributional results. A particular  
> strength
> of this approach is that local regression smoothers (`lo()' terms)  
> can be
> included in GAM models.
>
> - gam in package mgcv represents GAMs using penalized regression  
> splines.
> Estimation is by direct penalized likelihood maximization with
> integrated smoothness estimation via GCV or related criteria (there is
> also an alternative `gamm' function based on a mixed model approach).
> Strengths of this approach are that s() terms can be functions  
> of more
> than one variable and that tensor product smooths are available via  
> te()
> terms - these are useful when different degrees of smoothness are
> appropriate relative to different arguments of a smooth.
>
> (...)
>
> Basically, if you want integrated smoothness selection, an underlying
> parametric representation, or want smooth interactions in your models
> then mgcv is probably worth a try (but I would say that). If you  
> want to
> use local regression smoothers and/or prefer the stepwise selection
> approach then package gam is for you.
> "
>
It is hard to evaluate the explanations based on the algorithm used  
to fit the data, but it seems to me that the answer, in terms of  
significance of the smooth, should be at least very similar.  
Otherwise, what do you do when an author cites one package? You  
wonder if the fit would have been significant using the other package?

> I think the difference in p-values between gam in package gam and  
> gam in package mgcv arises because you don't have the same number of  
> iteration steps: gam in mgcv chooses the number of steps itself,  
> whereas with gam in package gam you have to choose it.
>
> Hope it helps, and that someone gives us more details...
>
> Yves
Again, I can see that the p-values could differ a bit considering the  
differences between the 2 packages. But when the differences are huge  
and result in contradictory conclusions, I have a problem. Like you I  
hope more help is forthcoming.

Denis