[R] p-level in packages mgcv and gam
Thomas Lumley
tlumley at u.washington.edu
Wed Sep 28 20:35:26 CEST 2005
On Wed, 28 Sep 2005, Denis Chabot wrote:
> But what about another analogy, that of polynomials? You may not be sure what
> degree polynomial to use, and you have not decided before analysing your
> data. You fit different polynomials to your data, checking if added degrees
> increase R^2 sufficiently by doing F-tests.
Yes, you can, but this procedure gives you incorrect p-values.
They may not be very incorrect -- it depends on how much model selection
you do, and how strongly the feature you are selecting on is related to
the one you are testing.
For example, using step() to choose a polynomial in x, even when x is
unrelated to both y and z, inflates the Type I error rate of the test for z,
because it gives a downward-biased estimate of the residual mean squared error:
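once <- function() {
  # y, x and z are generated independently, so z has no real effect on y
  y <- rnorm(50); x <- runif(50); z <- rep(0:1, 25)
  # choose polynomial terms in x by AIC, always keeping z,
  # then return the p-value (Pr(>|t|)) for z
  summary(step(lm(y ~ z),
               scope = list(lower = ~z, upper = ~z + x + I(x^2) + I(x^3) + I(x^4)),
               trace = 0))$coef["z", 4]
}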
> p<-replicate(1000,once())
> mean(p<0.05)
[1] 0.072
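which is significantly higher than you would expect for an honest level
0.05 test: with 1000 replicates the Monte Carlo standard error of a 0.05
rejection rate is about 0.007, so 0.072 is roughly three standard errors too high.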
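The same simulation can be adapted to follow the procedure in the question more literally: add powers of x one at a time and stop at the first added term whose F-test (from anova()) is not significant, then report the p-value for z. This is only a sketch; the function name once_ftest and its maxdeg and alpha arguments are hypothetical, not from the original post, and the amount of inflation it shows will depend on those settings, so no particular rejection rate is claimed here.

```r
## once_ftest is a hypothetical helper, not from the original post.
## y, x and z are generated independently, so an honest test of z
## should reject about 5% of the time.
once_ftest <- function(maxdeg = 4, alpha = 0.05) {
  dat <- data.frame(y = rnorm(50), x = runif(50), z = rep(0:1, 25))
  rhs <- "z"
  fit <- lm(y ~ z, data = dat)
  for (d in seq_len(maxdeg)) {
    ## add the next power of x to the current right-hand side
    rhs <- paste0(rhs, " + I(x^", d, ")")
    bigger <- lm(as.formula(paste("y ~", rhs)), data = dat)
    ## sequential F-test for the added term; stop at the first
    ## non-significant one and keep the previous model
    if (anova(fit, bigger)[2, "Pr(>F)"] >= alpha) break
    fit <- bigger
  }
  ## p-value for z in the selected model
  summary(fit)$coef["z", 4]
}

p <- replicate(1000, once_ftest())
mean(p < 0.05)
```

Comparing mean(p < 0.05) with 0.05, as above, shows whether choosing the degree by F-tests before testing z distorts the nominal level.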
-thomas