[R] Logistic Regression: variable selection based on p value?

Erik Iverson iverson at biostat.wisc.edu
Thu Dec 4 14:42:45 CET 2008


Puff -

There are many strategies and ideas on this topic, and a large 
literature.  A great introduction, which points to many of the 
interesting references, is Frank Harrell's book, "Regression Modeling 
Strategies".  I would highly recommend it.



pufftissue pufftissue wrote:
> Hi,
> 
> When I use logistic regression, each variable has a p value associated with
> it.  Do I only include the variables that have a statistically significant p
> value (<0.05), or are there situations when I should include variables when
> their p values are high?  I had heard that if a variable has a high p value
> but it's not the terminal variable, keep it; otherwise, take it out.  Not
> sure if it's right or even why this is the case.  What about if my p values
> are terrible but this combo of variables yields the highest AUC and
> calibration?  What prevails in this case?
> 
> Thank you!
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.


