[R] p-level in packages mgcv and gam

Mon Sep 26 18:25:04 CEST 2005

Hi,

I am fairly new to GAM and started using package mgcv. I like the  
fact that optimal smoothing is automatically used (i.e. df are not  
determined a priori but calculated by the gam procedure).

But the mgcv manual warns that the p-level for a smooth can be
underestimated when the df are estimated as part of the model fit. Most
of the time my p-levels are so small that even doubling them would not
bring them close to the P=0.05 threshold, but I have one case with P=0.033.

I thought, probably naively, of running a second model with fixed
df, using the value of df found in the first model. I could not
achieve this with mgcv: its gam function does not seem to accept
fractional values of df (in my case 8.377).

So I used the gam package and fixed df to 8.377. The P-value I  
obtained was slightly larger than with mgcv (0.03655 instead of  
0.03328), but it is still < 0.05.
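
For concreteness, here is a minimal sketch of the two-step idea within mgcv alone, on simulated data. The data frame dat and the variables x and y are placeholders, not the real data, and the numbers it produces have nothing to do with the P-values quoted above. It assumes that fx = TRUE is the way to fix the df in mgcv, in which case the smooth gets k - 1 df, so the estimated df has to be rounded to a whole number. The gam-package fit with df = 8.377 is not reproduced here.

```r
# Illustrative sketch only: simulated data, hypothetical variable names.
library(mgcv)

set.seed(1)
n   <- 200
dat <- data.frame(x = runif(n))
dat$y <- sin(2 * pi * dat$x) + rnorm(n, sd = 0.5)

# Step 1: let mgcv choose the smoothness; the edf is estimated from the data.
fit_auto <- gam(y ~ s(x), data = dat)
edf_auto <- sum(fit_auto$edf[-1])   # edf of the smooth (intercept dropped)
p_auto   <- summary(fit_auto)$s.table[1, "p-value"]

# Step 2: refit with the df fixed in advance. With fx = TRUE the smooth is an
# unpenalized regression spline with k - 1 df, so only whole numbers are
# possible; here the estimated edf is rounded up.
k_fixed   <- max(3, ceiling(edf_auto) + 1)
fit_fixed <- gam(y ~ s(x, k = k_fixed, fx = TRUE), data = dat)
p_fixed   <- summary(fit_fixed)$s.table[1, "p-value"]

# Compare the two p-values side by side.
print(c(edf = edf_auto, p_auto = p_auto, p_fixed = p_fixed))
```

Rounding the df up rather than down is an arbitrary choice in this sketch; it simply gives the fixed-df smooth at least as much flexibility as the automatically chosen one.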

Was this a correct way to get around the "underestimated P-level"?

Furthermore, although the gam.check function of the mgcv package  
suggests to me that the gaussian family (and identity link) are  
adequate for my data, I must say the instructions in R help for  
"family" and in Hastie, T. and Tibshirani, R. (1990) Generalized  
Additive Models are too technical for me. If someone knows a
reference that explains how to choose the family and link, i.e. what
checks to run on the data before choosing, I would really appreciate it.

Thanks in advance,

Denis Chabot