[Rd] Some "lm" methods give wrong results when applied to "mlm" objects

Tue Apr 4 10:24:18 CEST 2017

I had a look at some influence measures, and it seems to me that currently several methods handle multiple lm (mlm) objects wrongly in R. In some cases there are separate "mlm" methods, but usually "mlm" objects are handled by the same methods as univariate "lm" methods, and in some cases this fails.

There are two general patterns of problems in influence measures:

1) The univariate methods assume that overall standard deviation (sd) is of length one, but for "mlm" models we have a multivariate response with a multicolumn residual matrix. The functions also get correctly the sd vector corresponding to the columns, but it is not applied to these, but recycled for rows. This influences rstandard.lm and cooks.distance.lm. For instance, in cooks.distance.lm we have ((res/(sd * (1 - hat)))^2 * hat)/p, where res is a n x m matrix, sd is a m-vector and hat is a n-vector).  Both of these functions are very easily fixed.

2) Another problem is that several functions are based on lm.influence function, and it seems that it returns elements sigma and coefficients that are only based on the first variable (first column of the residual matrix wt.res) and give wrong results for other variables. This will influence functions dfbeta.lm (coefficients), dfbetas.lm (coefficients, sigma), dffits (sigma), rstudent.lm (sigma) and covratio (sigma). lm.influence finds these elements in compiled code and this is harder to fix. MASS (the book & the package) avoid using compiled code in their (univariate) studentized residuals, and instead use a clever short-cut.

In addition to these, there are a couple of other cases which seem to fail with "mlm" models: 

confint.lm gives empty result, because the length of the results is defined by names(coef(object)) which is NULL because "mlm" objects return a matrix of coefficients instead of a vector with names.

dummy.coef fails because "mlm" objects do not have xlevels item.

extractAIC.lm returns only one value instead of a vector, and edf is misleading. Separate deviance.mlm returns a vector of deviances, and logLik.lm returns "'logLik.lm' does not support multiple responses". Probably extractAIC.lm should work like logLik.lm.

Several methods already handle "mlm" methods by returning message "xxxx  is not yet implemented for multivariate lm()" which of course is a natural and correct solution to the problems.

Cheers, Jari Oksanen