[R] Akaike's information criterion

Thu Sep 13 17:19:33 CEST 2001

Model uncertainty of all types needs to be taken into
account, especially if you are going to do formal
statistical inference, but often even if you only want
prediction.  Using AIC to select from among a small set
of competing models, or to select a single "tuning
constant" such as an overall shrinkage or penalty factor,
does not cause many problems.  What you have suggested is
different: when you entertain many models and
transformations, it is possible to be misled by
unrecognized model uncertainty.  The formula for AIC in
many ways assumes that the model specification was
non-stochastic, that is, fixed in advance rather than
chosen by looking at the same data.
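
To make the point about "entertaining many models" concrete, here is a
minimal simulation sketch in Python (standard library only). It is an
illustration under stated assumptions, not a method from the references
below: the response is pure noise, every candidate predictor is also pure
noise, and AIC is computed for a Gaussian linear model up to an additive
constant as n*log(RSS/n) + 2k. The function names and the default sample
sizes are made up for this example.

```python
import math
import random


def aic_null(y):
    """AIC (up to an additive constant) of the intercept-only Gaussian model."""
    n = len(y)
    mean_y = sum(y) / n
    rss = sum((yi - mean_y) ** 2 for yi in y)
    k = 2  # parameters counted: intercept and error variance
    return n * math.log(rss / n) + 2 * k


def aic_simple_regression(x, y):
    """AIC (up to the same constant) of the Gaussian model y = a + b*x + error."""
    n = len(y)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    k = 3  # parameters counted: intercept, slope and error variance
    return n * math.log(rss / n) + 2 * k


def fraction_fooled(n_candidates, n_obs=50, n_sims=200, seed=1):
    """Share of simulated data sets in which the best of `n_candidates`
    pure-noise predictors has a lower AIC than the intercept-only model."""
    rng = random.Random(seed)
    fooled = 0
    for _ in range(n_sims):
        y = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        best_aic = min(
            aic_simple_regression([rng.gauss(0.0, 1.0) for _ in range(n_obs)], y)
            for _ in range(n_candidates)
        )
        if best_aic < aic_null(y):
            fooled += 1
    return fooled / n_sims


if __name__ == "__main__":
    # No candidate is related to y, so every "win" over the null model
    # is an artifact of searching over candidates.
    for m in (1, 5, 20):
        print(m, "candidate(s):", fraction_fooled(m))
```

With a single candidate, AIC prefers the useless predictor only
occasionally (asymptotically about as often as a 1-df chi-square exceeds
2). As the number of candidates screened grows, the best of them beats
the null model in most simulated data sets, even though the AIC formula
applied to the winner takes no account of the search.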

See

@ARTICLE{far92cos,
  author = {Faraway, J. J.},
  year = 1992,
  title = {The cost of data analysis},
  journal = {J Comp Graphical Stat},
  volume = 1,
  pages = {213-229},
  annote = {bootstrap; validation; predictive accuracy; modeling strategy;
           regression diagnostics; model uncertainty}
}
and

@ARTICLE{cha95mod,
  author = {Chatfield, C.},
  year = 1995,
  title = {Model uncertainty, data mining and statistical inference
          (with discussion)},
  journal = JRSSA,
  volume = 158,
  pages = {419-466},
  annote = {bias by selecting model because it fits the data well;
           bias in standard errors; P. 420: ... need for a better
           balance in the literature and in statistical teaching
           between {\em techniques} and problem solving
           {\em strategies}. P. 421: It is `well known' to be
           `logically unsound and practically misleading' (Zhang,
           1992) to make inferences as if a model is known to be true
           when it has, in fact, been selected from the {\em same}
           data to be used for estimation purposes. However, although
           statisticians may admit this privately (Breiman (1992)
           calls it a `quiet scandal'), they (we) continue to ignore
           the difficulties because it is not clear what else could or
           should be done. P. 421: Estimation errors for regression
           coefficients are usually smaller than errors from failing
           to take into account model specification. P. 422:
           Statisticians must stop pretending that model uncertainty
           does not exist and begin to find ways of coping with it.
           P. 426: It is indeed strange that we often admit model
           uncertainty by searching for a best model but then ignore
           this uncertainty by making inferences and predictions as if
           certain that the best fitting model is actually true.
           P. 427: The analyst needs to assess the model selection
           {\em process} and not just the best fitting model. P. 432:
           The use of subset selection methods is well known to
           introduce alarming biases. P. 433: ... the AIC can be
           highly biased in data-driven model selection situations.
           P. 434: Prediction intervals will generally be too narrow.
           In the discussion, Jamal R. M. Ameen states that a model
           should be (a) satisfactory in performance relative to the
           stated objective, (b) logically sound, (c) representative,
           (d) questionable and subject to on-line interrogation,
           (e) able to accommodate external or expert information and
           (f) able to convey information.}
}

Frank Harrell

Thomas Dick wrote:
> 
> Hello all,
> 
> I hope you don't mind my off-topic question. I want to use the Akaike criterion
> for variable selection in a regression model. Does anyone know some basic
> literature on that topic?
> 
> I am especially interested in answers to the following questions:
> 1. Does the criterion have to be modified (and if so, how) if I also estimate
> the transformations of the variables?
> 
> 2. How should the criterion be used if the model includes dummy variables (for
> categorical data)?
> 
> 3. Does the criterion have only one minimum, or may there be several local
> minima?
> 
> Thank you in advance
> Thomas
> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> Send "info", "help", or "[un]subscribe"
> (in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
> _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

-- 
Frank E Harrell Jr              Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
U. Virginia School of Medicine  http://hesweb1.med.virginia.edu/biostat