[R] About stepwise regression problem

Frank Harrell f.harrell at vanderbilt.edu
Fri Oct 7 14:53:37 CEST 2011


Removing variables because of high P-values is not a valid procedure.  Use of
AIC or BIC is just a restatement of P-values.  AIC can be quite useful if
you have posited a very small number of fully pre-specified models (e.g., 2
or 3) and want to choose between them.  Stepwise variable selection without
shrinkage is invalid.
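Here is the standard calculation behind the AIC/BIC point. It compares two nested models that differ by a single parameter. It is a sketch that assumes maximum-likelihood fits and the usual large-sample chi-square approximation for the likelihood-ratio statistic. The log-likelihoods \ell_0 (smaller model) and \ell_1 (larger model) are notation introduced here, and n = 105 is taken from the quoted model summary.

```latex
\begin{aligned}
\mathrm{AIC} &= -2\ell + 2p, \qquad \mathrm{BIC} = -2\ell + p \log n \\
\mathrm{AIC}_1 < \mathrm{AIC}_0 &\iff \mathrm{LR} = 2(\ell_1 - \ell_0) > 2,
  \qquad \mathrm{LR} \sim \chi^2_1 \text{ under } H_0 \\
\alpha_{\mathrm{AIC}} &= \Pr(\chi^2_1 > 2) \approx 0.157 \\
\alpha_{\mathrm{BIC}} &= \Pr(\chi^2_1 > \log n), \quad n = 105:\;
  \log 105 \approx 4.654,\; \alpha_{\mathrm{BIC}} \approx 0.031
\end{aligned}
```

Keeping or dropping a single term by AIC therefore amounts to a likelihood-ratio test at roughly the 0.157 level. BIC amounts to the same test at a level that shrinks as n grows.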
Frank

pigpigmeow wrote:
> 
> Chris,
> I'm not using lmer; I just use gam with a mix of smooth and linear terms,
> and the summary of the model shows:
> Family: gaussian 
> Link function: log 
> 
> Formula: 
> newNO2 ~ pressure + s(maxtemp, bs = "cr") + s(avetemp, bs = "cr") + 
>     s(mintemp, bs = "cr") + RH + s(solar, bs = "cr") + s(windspeed, 
>     bs = "cr") + s(transport, bs = "cr") 
> 
> Parametric coefficients: 
>             Estimate Std. Error t value Pr(>|t|)     
> (Intercept) 2.721513   0.049108  55.419   <2e-16 *** 
> pressure    0.028988   0.019434   1.492    0.140     
> RH          0.005228   0.009763   0.535    0.594     
> --- 
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
> 
> Approximate significance of smooth terms: 
>                edf Ref.df     F p-value   
> s(maxtemp)   6.346  7.276 1.223 0.29991   
> s(avetemp)   1.000  1.000 0.226 0.63562   
> s(mintemp)   1.908  2.396 1.066 0.35871   
> s(solar)     3.797  4.490 2.164 0.07359 . 
> s(windspeed) 5.305  6.341 2.346 0.03648 * 
> s(transport) 7.234  7.984 2.807 0.00884 ** 
> --- 
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
> 
> R-sq.(adj) =  0.307   Deviance explained = 49.1% 
> GCV score = 61.136  Scale est. = 44.49     n = 105 
> 
> In the parametric coefficients part, I see Pr(>|t|), which I take to mean
> the probability of exceeding the t value. Is that probability the p-value?
> In the "Approximate significance of smooth terms" part, the p-value column
> shows the probability of exceeding the F value.
> 
> I have the following questions:
> 1. If I drop the term with the largest p-value, whether it is a smooth term
> or a linear term, is that a correct way to perform stepwise regression?
> 2. Is my model
> noxd <- gam(newNOX ~ pressure + maxtemp + s(avetemp, bs = "cr") +
>     s(mintemp, bs = "cr") + s(RH, bs = "cr") + s(solar, bs = "cr") +
>     s(windspeed, bs = "cr") + s(transport, bs = "cr"),
>     family = gaussian(link = "log"), data = groupD, method = "REML")
> a generalized additive mixed model?
> 3. What is the difference if I use other criteria such as AIC or BIC?
> 
> Anyway, thank you all!
> 
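On the quoted question about Pr(>|t|): yes, for a parametric coefficient it is the two-sided p-value. The calculation below is a sketch. It assumes, as mgcv's summary does when the scale parameter is estimated, a t reference distribution whose degrees of freedom \nu are the residual degrees of freedom: n minus the total effective degrees of freedom (edf). The numbers come from the quoted summary. The 3 parametric edf are the intercept, pressure, and RH.

```latex
\begin{aligned}
t &= \hat\beta / \mathrm{SE}(\hat\beta), \qquad
\Pr(>|t|) = 2\,\Pr(T_\nu \ge |t|), \qquad
\nu = n - \textstyle\sum \mathrm{edf} \\
\text{pressure: } t &= 0.028988 / 0.019434 \approx 1.492 \\
\textstyle\sum \mathrm{edf} &= 3 + (6.346 + 1.000 + 1.908 + 3.797 + 5.305 + 7.234)
  = 28.590, \qquad \nu \approx 105 - 28.59 = 76.4
\end{aligned}
```

The smooth-term p-values are likewise upper-tail probabilities, Pr(F >= observed F), with Ref.df as the numerator degrees of freedom. Because that F reference distribution is only approximate, those p-values are rougher than the parametric ones.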


-----
Frank Harrell
Department of Biostatistics, Vanderbilt University


