[R] p-level in packages mgcv and gam

Thomas Lumley tlumley at u.washington.edu
Wed Sep 28 20:35:26 CEST 2005


On Wed, 28 Sep 2005, Denis Chabot wrote:

> But what about another analogy, that of polynomials? You may not be sure what 
> degree polynomial to use, and you have not decided before analysing your 
> data. You fit different polynomials to your data, checking if added degrees 
> increase r2 sufficiently by doing F-tests.

Yes, you can. And this procedure gives you incorrect p-values.

They may not be very incorrect -- it depends on how much model selection 
you do, and how strongly the feature you are selecting on is related to 
the one you are testing.

For example, using step() to choose a polynomial in x, even when x is 
unrelated to both y and z, inflates the Type I error rate of the test for 
z, because the selected model gives a downwardly biased estimate of the 
residual mean squared error:

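# One simulated data set: y is pure noise, unrelated to x and z.
once<-function(){
   y<-rnorm(50);x<-runif(50);z<-rep(0:1,25)
   # Select polynomial terms in x by AIC, always keeping z in the model
   fit<-step(lm(y~z),
         scope=list(lower=~z,upper=~z+x+I(x^2)+I(x^3)+I(x^4)),
         trace=0)
   # p-value for z in the selected model
   coef(summary(fit))["z","Pr(>|t|)"]
  }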
> p<-replicate(1000,once())
> mean(p<0.05)
[1] 0.072

which is significantly higher than you would expect for an honest level 
0.05 test.
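
The "significantly higher" claim can be checked roughly as follows. This is 
only a sketch: it assumes the reported 0.072 corresponds to 72 rejections in 
1000 replicates, and the names rejections, n_sim, check and once_fixed are 
made up for illustration. It also runs a baseline with no model selection, 
where the rejection rate should land near the nominal 0.05.

```r
# Assumption: 0.072 above means 72 rejections out of 1000 replicates.
rejections <- 72
n_sim <- 1000

# Monte Carlo standard error of a rejection rate if the true level is 0.05
se <- sqrt(0.05 * 0.95 / n_sim)
z_score <- (rejections / n_sim - 0.05) / se
z_score

# Exact binomial test of H0: true rejection rate = 0.05
check <- binom.test(rejections, n_sim, p = 0.05)
check$p.value
check$conf.int

# Baseline for comparison: no selection, z tested in the fixed model y ~ z
set.seed(1)  # arbitrary seed, only so the run is reproducible
once_fixed <- function() {
  y <- rnorm(50); z <- rep(0:1, 25)
  coef(summary(lm(y ~ z)))["z", "Pr(>|t|)"]
}
p_fixed <- replicate(1000, once_fixed())
mean(p_fixed < 0.05)
```

The observed rate sits a bit more than three Monte Carlo standard errors 
above 0.05, so simulation noise alone is an unlikely explanation.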

 	-thomas



