[Rd] New vcov(*, complete=TRUE) etc -- coef(<lm>) vs coef(<aov>)

Tue Nov 7 22:48:07 CET 2017

>>>>> Martin Maechler <maechler at stat.math.ethz.ch>
>>>>>     on Thu, 2 Nov 2017 21:59:00 +0100 writes:

>>>>> Fox, John <jfox at mcmaster.ca>
>>>>>     on Thu, 14 Sep 2017 13:46:44 +0000 writes:

    >> Dear Martin, I made three points which likely got lost
    >> because of the way I presented them:

    >> (1) Singularity is an unusual situation and should be
    >> made more prominent. It typically reflects a problem with
    >> the data or the specification of the model. That's not to
    >> say that it *never* makes sense to allow singular fits
    >> (as in the situations you mentions).

    >> I'd favour setting singular.ok=FALSE as the default, but
    >> in the absence of that a warning or at least a note. A
    >> compromise would be to have a singular.ok option() that
    >> would be FALSE out of the box.

    >> Any changes would have to be made very carefully so as
    >> not to create chaos.

    > I for one, am too reluctant to want to change the default
    > there.

    >> That goes for the points below as well.

    >> (2) coef() and vcov() behave inconsistently, which can be
    >> problematic because one often uses them together in code.

    > indeed; and I had agreed on that.  As of today, in R-devel
    > only they now behave compatibly.  NEWS entry

    >     • The “default” ("lm" etc) methods of vcov() have
    > gained new optional argument complete = TRUE which makes
    > the vcov() methods more consistent with the coef() methods
    > in the case of singular designs.  The former behavior is
    > now achieved by vcov(*, complete=FALSE).

    >> (3) As you noticed in your second message, lm() has a
    >> singular.ok argument and glm() doesn't.

    > and that has been amended even earlier (a bit more than a
    > month ago) in R-devel svn rev 73380 with NEWS entry

    >     • glm() and glm.fit get the same singular.ok=TRUE
    > argument that lm() has had forever.  As a consequence, in
    > glm(*, method = <your_own>), user specified methods need
    > to accept a singular.ok argument as well.

    >> I'll take a look at the code for glm() with an eye
    >> towards creating a patch, but I'm a bit reluctant to mess
    >> with the code for something as important as glm().

    > and as a matter of fact you did send me +- the R code part
    > of that change.

    > My current plan is to also add the 'complete = TRUE'
    > option to the "basic" coef() methods, such that you also
    > have consistent coef(*, complete=FALSE) and vcov(*,
    > complete=FALSE) behaviors.

and indeed I had added the above a bit later.

However, to my surprise, I have now found that we have a
coef.aov() method -- completely undocumented which behaves *differently*:

where as the default coef() method which is called for lm(..)
results gives *all* coefficients, and gives  NA  for "aliased" ones,
the aov method *drops* the  NA  coefficients  and has done so
"forever"  (I've checked R version 1.1.1 of April 14, 2000).

vcov() on the other hand has not had a special "aov" method, but
treats aov() and lm() results the same... which means that in
R-devel the vcov() method for an aov() object  uses
'complete=TRUE' and gives NA rows and columns for the aliased coefficients,
whereas  coef.aov()  removes all the NAs  and  gives only the
"non-aliased" coefficients.   Consequently, in R-devel,
vcov(<aov>) and coef(<aov>)  are *now* incoherent, whereas these
two *were* coherent before the change.

I propose to
1. continue the strategy to keep coef() back-compatible  and
2. to *document* the "surprising" behavior of coef.aov()
3. introduce a  vcov.aov()  with complete=FALSE  default
   behavior which is compatile to the coef.aov() one [where I'd
   also introduce the no-change  'complete=FALSE' argument].

Hmm... again, this has been more work and more implications than
originally optimistically assumed..

Opinions, caveats, other feedback -- are very welcome!

Martin