[R] FW: logistic regression

Frank E Harrell Jr f.harrell at vanderbilt.edu
Tue Oct 7 05:21:43 CEST 2008

Kjetil Halvorsen wrote:
> Hola!
> If the original questioner wants a guide as to which variables to measure
> IN THE FUTURE, when using his model in practice, then I think he will
> be unhappy with any advice which forces him to measure each of the 44
> variables when probably a small subset will do!  What is wrong with first
> using, let us say, penalized likelihood, maybe with CV to choose the degree
> of smoothing, and SECONDLY using stepwise (maybe stepAIC from MASS) with
> the predicted values from the first-step model to get a good
> few-variables approximation which can be used in practice? If my memory
> isn't too bad, that idea is from Harrell's book.

No, I never recommended that.  The problem is the extremely low
probability that the method will find the "right" variables.  The
process is unstable, and predicted values do not validate well.  If you
want parsimony then a unified approach based on an L1 penalty (lasso and
derivatives) is worth a look.  These methods select variables and
penalize the coefficients of the variables that remain.  The coefficients
will be different from what they would have been had the remaining
variables been put into an unpenalized model, i.e., the method penalizes
for the context of not knowing the right variables in advance.
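
To make the "select and penalize" point concrete, here is a minimal
sketch of L1-penalized logistic regression fitted by proximal gradient
descent.  Everything in it is an illustrative assumption and not part of
the original thread: the simulated data (6 predictors, only the first two
with real effects), the penalty value 0.05, the step size, and the helper
names soft_threshold and fit_l1_logistic.  It is written in plain Python
so that it runs without any packages; in R one would more likely use an
existing lasso implementation such as the glmnet package.

```python
import math
import random


def soft_threshold(z, t):
    """Shrink z toward zero by t; return exactly 0.0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def sigmoid(u):
    """Numerically stable logistic function."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def fit_l1_logistic(X, y, lam, step=0.5, n_iter=500):
    """Minimize mean negative log-likelihood + lam * sum(|b_j|).

    The intercept b0 is left unpenalized.  Returns (b0, b).
    """
    n, p = len(X), len(X[0])
    b0, b = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Residuals (fitted probability minus outcome) at the current fit.
        resid = [
            sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, row))) - yi
            for row, yi in zip(X, y)
        ]
        g0 = sum(resid) / n
        g = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(p)]
        # Gradient step, then soft-thresholding applies the L1 penalty.
        b0 -= step * g0
        b = [soft_threshold(bj - step * gj, step * lam) for bj, gj in zip(b, g)]
    return b0, b


# Simulated data (an assumption for illustration only).
random.seed(1)
n, p = 300, 6
true_beta = [1.5, -1.0, 0.0, 0.0, 0.0, 0.0]
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [
    1 if random.random() < sigmoid(sum(t * x for t, x in zip(true_beta, row))) else 0
    for row in X
]

_, b_unpen = fit_l1_logistic(X, y, lam=0.0)   # ordinary maximum likelihood
_, b_lasso = fit_l1_logistic(X, y, lam=0.05)  # L1-penalized fit

print("unpenalized:", [round(v, 3) for v in b_unpen])
print("lasso      :", [round(v, 3) for v in b_lasso])
```

Coefficients that the penalty drives to exactly zero are dropped from the
model, and the ones that survive are shrunk relative to the unpenalized
fit.  That shrinkage is what distinguishes this from refitting the
selected variables in an unpenalized model.  In practice the penalty
would be chosen by cross-validation, not fixed in advance as it is here.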

The penalized likelihood step you proposed is a good one but the 
unpenalized stepwise method in the second step runs into problems.


> Kjetil
> On Mon, Sep 29, 2008 at 9:50 PM, Frank E Harrell Jr
> <f.harrell at vanderbilt.edu> wrote:
>     Greg Snow wrote:
>             -----Original Message-----
>             From: r-help-bounces at r-project.org
>             [mailto:r-help-bounces at r-project.org] On Behalf Of Frank E
>             Harrell Jr
>             Sent: Saturday, September 27, 2008 7:15 PM
>             To: Darin Brooks
>             Cc: dieter.menne at menne-biomed.de;
>             r-help at stat.math.ethz.ch;
>             ted.harding at manchester.ac.uk
>             Subject: Re: [R] FW: logistic regression
>             Darin Brooks wrote:
>                 Glad you were amused.
>                 I assume that "booking this as a fortune" means that
>                 this was an idiotic way to model the data?
>             Dieter was nominating this for the "fortunes" package in R.
>             (Thanks, Dieter.)
>                 MARS?  Boosted Regression Trees?  Any of these a better
>                 choice to extract significant predictors (from a list of
>                 about 44) for a measured dependent variable?
>             Or use a data reduction method (principal components, variable
>             clustering, etc.) or redundancy analysis (to remove individual
>             predictors before examining associations with Y), or fit the
>             full model using penalized maximum likelihood estimation.
>             Lasso and lasso-like methods are also worth pursuing.
>         Frank (and any others who want to share an opinion):
>         What are your thoughts on model averaging as part of the above list?
>     Model averaging has good performance but no advantage over fitting a
>     single complex model using penalized maximum likelihood estimation.
>     Frank
>         --
>         Gregory (Greg) L. Snow Ph.D.
>         Statistical Data Center
>         Intermountain Healthcare
>         greg.snow at imail.org
>         801.408.8111
>     -- 
>     Frank E Harrell Jr   Professor and Chair           School of Medicine
>                         Department of Biostatistics   Vanderbilt University

Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University
