[Rd] aic() component in GLM-family objects

Ben Bolker bbolker @ending from gm@il@com
Sun Jun 17 17:40:38 CEST 2018


FWIW p. 206 of the White Book gives the following for
names(binomial()): family, names, link, inverse, deriv, initialize,
variance, deviance, weight.

  So $aic wasn't there In The Beginning.  I haven't done any more
archaeology to try to figure out when/by whom it was first introduced
...

 Section 6.3.3, on extending families, doesn't give any other relevant info.

A patch for src/library/stats/man/family.Rd below: please check what
I've said about $aic and $mu.eta to make sure they're correct!  I can
submit this to the r bug list if preferred.

----
Index: family.Rd
===================================================================
--- family.Rd    (revision 74904)
+++ family.Rd    (working copy)
@@ -31,7 +31,7 @@
 \arguments{
   \item{link}{a specification for the model link function.  This can be
     a name/expression, a literal character string, a length-one character
-    vector or an object of class
+    vector, or an object of class
     \code{"\link[=make.link]{link-glm}"} (such as generated by
     \code{\link{make.link}}) provided it is not specified
     \emph{via} one of the standard names given next.
@@ -45,7 +45,7 @@
     the \code{Gamma} family the links \code{inverse}, \code{identity}
      and \code{log};
     the \code{poisson} family the links \code{log}, \code{identity},
-    and \code{sqrt} and the \code{inverse.gaussian} family the links
+    and \code{sqrt}; and the \code{inverse.gaussian} family the links
     \code{1/mu^2}, \code{inverse}, \code{identity}
     and \code{log}.

@@ -105,8 +105,8 @@
 \note{
   The \code{link} and \code{variance} arguments have rather awkward
   semantics for back-compatibility.  The recommended way is to supply
-  them is as quoted character strings, but they can also be supplied
-  unquoted (as names or expressions).  In addition, they can also be
+  them as quoted character strings, but they can also be supplied
+  unquoted (as names or expressions).  Additionally, they can be
   supplied as a length-one character vector giving the name of one of
   the options, or as a list (for \code{link}, of class
   \code{"link-glm"}).  The restrictions apply only to links given as
@@ -130,10 +130,18 @@
   \item{dev.resids}{function giving the deviance residuals as a function
     of \code{(y, mu, wt)}.}
   \item{aic}{function giving the AIC value if appropriate (but \code{NA}
-    for the quasi- families).  See \code{\link{logLik}} for the assumptions
-      made about the dispersion parameter.}
-  \item{mu.eta}{function: derivative \code{function(eta)}
-    \eqn{d\mu/d\eta}.}
+    for the quasi- families).  More precisely, this function
+    returns \eqn{-2 L + 2 s}, where \eqn{L} is the log-likelihood and \eqn{s}
+    is the number of estimated scale parameters; the penalty term for the
+    location parameters is added elsewhere.
+    See \code{\link{logLik}} for the assumptions
+    made about the dispersion parameter.}
+  \item{mu.eta}{function: derivative of the inverse-link function
+    with respect to the linear predictor. If the inverse-link
+    function is \eqn{\mu=g^{-1}(\eta)}{mu=ginv(eta)}
+    where \eqn{eta}{\eta} is the value
+    of the linear predictor, then this function returns
+    \eqn{d(g^{-1})/d\eta=d\mu/d\eta}{d(ginv(eta))/d(eta)=d(mu)/d(eta)}.}
   \item{initialize}{expression.  This needs to set up whatever data
     objects are needed for the family as well as \code{n} (needed for
     AIC in the binomial family) and \code{mustart} (see \code{\link{glm}}).}
@@ -224,8 +232,8 @@
 ## which case use an offset of 0 in the corresponding formula
 ## to get the null deviance right.

-## Binomial with identity link: often not a good idea.
-\dontrun{binomial(link = make.link("identity"))}
+## Binomial with identity link: often computationally and
conceptually difficult
+\dontrun{binomial(link = "identity")}

 ## tests of quasi
 x <- rnorm(100)
@@ -236,7 +244,7 @@
 glm(y ~ x, family = quasi(variance = "mu^2", link = "log"))
 \dontrun{glm(y ~ x, family = quasi(variance = "mu^3", link = "log")) # fails}
 y <- rbinom(100, 1, plogis(x))
-# needs to set a starting value for the next fit
+# need to set a starting value for the next fit
 glm(y ~ x, family = quasi(variance = "mu(1-mu)", link = "logit"),
start = c(0,1))
 }
 \keyword{models}

On Mon, Jun 4, 2018 at 10:46 AM, Martin Maechler
<maechler using stat.math.ethz.ch> wrote:
>>>>>> Ben Bolker
>>>>>>     on Sun, 3 Jun 2018 17:33:18 -0400 writes:
>
>     > Is it generally known/has it been previously discussed here that the
>     > $aic() component in GLM-family objects (e.g. results of binomial(),
>     > poisson(), etc.) does not as implemented actually return the AIC, but
>     > rather -2*log-likelihood + 2*(model_has_scale_parameter) ?
>
> This rings a faint bell from the last millennium with me,
> and the following "fortune"  may contain the answer implicitly :
>
> ------------------------------------------------------------
>   > if(!require("fortunes")) install.packages("fortunes")
>   > fortune("bug compatib")
>
>   For quite a while, bug-for-bug compatibility with S-PLUS v 3.x was considered
>   important to allow people to port their packages between systems.
>      -- Peter Dalgaard
>         R-help (February 2009)
>   >
> ------------------------------------------------------------
>
> Ideally, readers who still have access to a version of S-PLUS / S+
> or who have read and internalized or (even co-written !)
> "The white book", notably Ch.6, may be able to shed a historic
> light on this.
>
> I note that the white book's Appendix B with function help
> pages, has a page ?family.object,  accessible here
>    https://sites.oxy.edu/lengyel/M150/Sueselbeck/helpfiles/family.object.html
>
> which does *not* mention a  <fam>$dev.resid() component, but instead
> allows to use  <fam>$residuals(*, residuals=TRUE)
> get the
> "
>   vector of deviance residual, whose weighted sum of
>   squares is the deviance
> "
>
> Given the above, and also the ?glm entry
>
> || Author(s):
> ||
> ||      The original R implementation of ‘glm’ was written by Simon Davies
> ||      working for Ross Ihaka at the University of Auckland, but has
> ||      since been extensively re-written by members of the R Core team.
> ||
> ||      The design was inspired by the S function of the same name
> ||      described in Hastie & Pregibon (1992).
>
> actually suggest that it may be hard nowadays to find the
> original "design specs" that Simon and Ross had used at the
> time, and also that they only were _inspired_ by the white book
> chapter 6 (= Hastie & Pregibon (1992)).
>
>
>
>     > Can anyone in this forum gauge how a documentation patch
>     > would be received?
>
> It depends on further answers to your questions (i.e, this
> thread), but I'd currently say  "gratefully".
> I'd expect it would be a patch mainly to
>   src/library/stats/man/family.Rd
>
> Note that help(AIC) has a non-small 'Details' section, but
> indeed it does not mention the family(*)$aic function.
>
>     > This behaviour does not seem to be documented in ?family (or anywhere
>     > else I can find), which says:
>
>     > aic: function giving the AIC value if appropriate (but ‘NA’ for
>     > the quasi- families).  See ‘logLik’ for the assumptions made
>     > about the dispersion parameter.
>
>
>     > For a demonstration that e.g. binomial()$aic() is really -2*log L and
>     > not the AIC, see:
>
>     > https://github.com/wch/r-source/blob/trunk/src/library/stats/R/family.R#L317
>
>     > This document
>     > <https://github.com/lme4/lme4/blob/master/misc/notes/deviance.rmd>
>     > explicates the details a bit more ('L' denotes log-likelihood):
>
>     > * family()$aic computes $-2L$, which glm.fit translates to an AIC by
>     >   adding $2k$ and storing it in model$aic
>     > * logLik.default retrieves model$aic and converts it back to a
>     >   log-likelihood
>     > * stats:::AIC.default retrieves the log-likelihood and converts it
>     >   back to an AIC (!)
>     > * family()$dev.resid() computes the squared deviance residuals
>     > * stats:::residuals.glm retrieves these values and takes the signed
>     >   square root
>
>     > cheers
>     > Ben Bolker



More information about the R-devel mailing list