[Rd] New vcov(*, complete=TRUE) etc -- coef(<lm>) vs coef(<aov>)

Thu Nov 9 11:27:01 CET 2017

>>>>> Fox, John <jfox at mcmaster.ca>
>>>>>     on Tue, 7 Nov 2017 22:09:03 +0000 writes:

    > Dear Martin, I think that your plan makes sense. It's too
    > bad that aov() behaved differently in this respect from
    > lm(), and thus created more work, but it's not a bad
    > thing that the difference is now explicit and documented.

    > I expect that other problems like this will surface,
    > particularly with contributed packages (and I know that
    > you're aware that this has already happened with the car
    > package). That is, packages that made provision for
    > aliased coefficients based on the old behaviour of coef()
    > and vcov() will now have to adapt to the new, more
    > consistent behaviour.

    > Best, John

Thank you John for the confirmation (and see below).

    >> -----Original Message-----
    >> >>>>> Martin Maechler <maechler at stat.math.ethz.ch>
    >> >>>>>     on Thu, 2 Nov 2017 21:59:00 +0100 writes:
    >> 
    >> >>>>> Fox, John <jfox at mcmaster.ca>
    >> >>>>>     on Thu, 14 Sep 2017 13:46:44 +0000 writes:
    >> 
    >> >> Dear Martin, I made three points which likely got lost
    >> >> because of the way I presented them:
    >> 
    >> >> (1) Singularity is an unusual situation and should be
    >> >> made more prominent. It typically reflects a problem with
    >> >> the data or the specification of the model. That's not to
    >> >> say that it *never* makes sense to allow singular fits
    >> >> (as in the situations you mention).
    >> 
    >> >> I'd favour setting singular.ok=FALSE as the default, but
    >> >> in the absence of that a warning or at least a note. A
    >> >> compromise would be to have a singular.ok option() that
    >> >> would be FALSE out of the box.
    >> 
    >> >> Any changes would have to be made very carefully so as
    >> >> not to create chaos.
    >> 
    >> > I for one, am too reluctant to want to change the default
    >> > there.
    >> 
    >> >> That goes for the points below as well.
    >> 
    >> >> (2) coef() and vcov() behave inconsistently, which can be
    >> >> problematic because one often uses them together in code.
    >> 
    >> > indeed; and I had agreed on that.  As of today, in R-devel
    >> > only, they now behave compatibly.  NEWS entry
    >> 
    >> >     • The “default” ("lm" etc) methods of vcov() have
    >> > gained new optional argument complete = TRUE which makes
    >> > the vcov() methods more consistent with the coef() methods
    >> > in the case of singular designs.  The former behavior is
    >> > now achieved by vcov(*, complete=FALSE).
    >> 
    >> 
    >> >> (3) As you noticed in your second message, lm() has a
    >> >> singular.ok argument and glm() doesn't.
    >> 
    >> > and that has been amended even earlier (a bit more than a
    >> > month ago) in R-devel svn rev 73380 with NEWS entry
    >> 
    >> >     • glm() and glm.fit get the same singular.ok=TRUE
    >> > argument that lm() has had forever.  As a consequence, in
    >> > glm(*, method = <your_own>), user specified methods need
    >> > to accept a singular.ok argument as well.
    >> 
    >> >> I'll take a look at the code for glm() with an eye
    >> >> towards creating a patch, but I'm a bit reluctant to mess
    >> >> with the code for something as important as glm().
    >> 
    >> > and as a matter of fact you did send me +- the R code part
    >> > of that change.
    >> 
    >> > My current plan is to also add the 'complete = TRUE'
    >> > option to the "basic" coef() methods, such that you also
    >> > have consistent coef(*, complete=FALSE) and vcov(*,
    >> > complete=FALSE) behaviors.
    >> 
    >> and indeed I had added the above a bit later.
    >> 
    >> However, to my surprise, I have now found that we have a
    >> coef.aov() method -- completely undocumented which behaves *differently*:
    >> 
    >> whereas the default coef() method which is called for lm(..) results gives *all*
    >> coefficients, and gives  NA  for "aliased" ones, the aov method *drops* the  NA
    >> coefficients  and has done so "forever"  (I've checked R version 1.1.1 of April 14,
    >> 2000).
    >> 
    >> vcov() on the other hand has not had a special "aov" method, but treats aov()
    >> and lm() results the same... which means that in R-devel the vcov() method for
    >> an aov() object  uses 'complete=TRUE' and gives NA rows and columns for the
    >> aliased coefficients, whereas  coef.aov()  removes all the NAs  and  gives only
    >> the "non-aliased" coefficients.   Consequently, in R-devel,
    >> vcov(<aov>) and coef(<aov>)  are *now* incoherent, whereas these two
    >> *were* coherent before the change.

    >> I propose to

    >> 1. continue the strategy to keep coef() back-compatible and
    >> 2. to *document* the "surprising" behavior of coef.aov() 
    >> 3. introduce a  vcov.aov()  with complete=FALSE  default
    >> behavior which is compatible with the coef.aov() one [where I'd
    >> also introduce the no-change  'complete=FALSE' argument].

I have now committed the above proposal to R-devel,
svn rev 73692.

This reverts the default behavior of  vcov(<aov>)  in R-devel to
the R <= 3.4.x behavior, so any effect in package-space should be
beneficial rather than harmful.
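
To check the resulting behavior, here is a minimal sketch. It assumes
an R version that already contains the changes described in this
thread; the data and the object names (d, fl, fa) are made up purely
for illustration, and the comments state what I would expect rather
than captured output.

```r
## Illustrative only: a design with one exactly aliased coefficient.
set.seed(1)
d <- data.frame(x1 = rnorm(10), y = rnorm(10))
d$x2 <- 2 * d$x1                  # exactly collinear with x1, so x2 is aliased

fl <- lm (y ~ x1 + x2, data = d)
fa <- aov(y ~ x1 + x2, data = d)

## lm: complete = TRUE is the default, so the aliased coefficient stays as NA
coef(fl)                          # expected: 3 entries, the one for x2 is NA
dim(vcov(fl))                     # expected: 3 x 3, NA row and column for x2
coef(fl, complete = FALSE)        # expected: 2 entries, x2 dropped
dim(vcov(fl, complete = FALSE))   # expected: 2 x 2

## aov: complete = FALSE is the default for both methods,
## so coef() and vcov() agree on the set of coefficients
coef(fa)                          # expected: 2 entries, no NA
dim(vcov(fa))                     # expected: 2 x 2
coef(fa, complete = TRUE)         # expected: 3 entries, NA for x2
```

Code that combines coef() and vcov() and must work for both classes
can pass 'complete' explicitly instead of relying on either default.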

Martin Maechler
ETH Zurich