[R] influence measures for multivariate linear models

Michael Friendly friendly at yorku.ca
Wed Sep 15 00:10:55 CEST 2010


  I'm following up on a question I posted 8/10/2010, but my newsreader 
has lost this thread.

> Barrett & Ling, JASA, 1992, v.87(417), pp184-191 define general 
> classes of influence measures for multivariate
> regression models, including analogs of Cook's D, Andrews & Pregibon 
> COVRATIO, etc.  As in univariate
> response models, these are based on leverage and residuals based on 
> omitting one (or more) observations at
> a time and refitting, although, in the univariate case, the 
> computations can be optimized, as they are in
> stats::influence() and related methods.
>
> I'm interested in exploring the multivariate extension in R.  I tried 
> the following, and was surprised to find that
> R returned a result rather than an error -- presumably because mlm 
> objects are not trapped before they
> get to lm.influence()
>
I've done a bit more testing, comparing what I get from R lm functions 
with some direct calculations in both
R and SAS, and now conclude that there are bugs in lm.influence and the 
coef(update()) methods I tried
before to get leave-one-out coefficients for multivariate response 
models fit with lm().
By bugs, I mean that results returned are silently wrong, or at best, 
misleading.

My test case: a subset of the Rohwer data anayzed by Barrett & Ling (& 
others)

 > library(heplots)
 > data(Rohwer, package="heplots")
 > Rohwer1 <- Rohwer[Rohwer$SES=='Hi',]
 > rohwer.mod <- lm(cbind(SAT, PPVT, Raven) ~ n + s + ns + na + ss, 
data=Rohwer1)
 > (B <- coef(rohwer.mod))
                    SAT       PPVT      Raven
(Intercept) -28.467471 39.6970896 13.2438356
n             3.257132  0.0672825  0.0593474
s             2.996583  0.3699843  0.4924440
ns           -5.859063 -0.3743802 -0.1640219
na            5.666223  1.5230090  0.1189805
ss           -0.622653  0.4101567 -0.1211564

Attempt to get the leave-one-out coefficients, B(-i),  for the first 2 
cases from influence(): *wrong* -- influence() should
probably issue
stop('Not defined for mlm objects'), or the documentation should be made 
clearer for what happens in this
case.

 > head(influence(rohwer.mod, do.coef=TRUE)$coefficients, 2)
          [,1]      [,2]      [,3]     [,4]       [,5]       [,6]
[1,] -5.56830 0.0510135 -0.486096  0.61811  0.0585384 -0.1339219
[2,]  2.30754 0.1092886  0.388349 -0.40717 -0.0600899  0.0477736

The correct result, for the first case is

 > # direct calculation
 > X <- cbind(1, as.matrix(Rohwer1[,6:10]))
 > Y <- as.matrix(Rohwer1[,3:5])
 > keep <- 2:n
 > (B1 <- solve(t(X[keep,]) %*% X[keep,]) %*% t(X[keep,]) %*% Y[keep,])
           SAT       PPVT      Raven
    -22.899170 41.7757793 13.5057156
n    3.206119  0.0482388  0.0569482
s    3.482679  0.5514477  0.5153053
ns  -6.477172 -0.6051252 -0.1930919
na   5.607684  1.5011562  0.1162274
ss  -0.488731  0.4601508 -0.1148580

OK, ?influence tells me that the 'coefficients' returned are actually 
B-B(-i), and I can see that
it gives those for the first response variable (SAT)

 > B-B1
                    SAT       PPVT       Raven
(Intercept) -5.5683009 -2.0786897 -0.26187994
n            0.0510135  0.0190437  0.00239919
s           -0.4860961 -0.1814634 -0.02286134
ns           0.6181096  0.2307451  0.02907000
na           0.0585384  0.0218528  0.00275309
ss          -0.1339219 -0.0499941 -0.00629841

For SAT, same as re-fitting a univariate model:

 > rohwer.inf1 <- influence(lm(SAT ~ n + s + ns + na + ss, 
data=Rohwer1), do.coef=TRUE)
 > colnames(rohwer.inf1$coefficients) <- paste("SAT", 
colnames(rohwer.inf1$coefficients), sep=":")
 > head(rohwer.inf1$coefficients, 2)
    SAT:(Intercept)     SAT:n     SAT:s   SAT:ns     SAT:na     SAT:ss
38        -5.56830 0.0510135 -0.486096  0.61811  0.0585384 -0.1339219
39         2.30754 0.1092886  0.388349 -0.40717 -0.0600899  0.0477736
 >

But I'm looking for a method I can generalize.  I also tried the 
following,  using subset= in
update() and lm(), giving a different wrong answer. (Am I using the 
subset= argument
correctly?)

 > coef(update(rohwer.mod, subset=1:n !=1, data=Rohwer1))
                    SAT      PPVT      Raven
(Intercept) -28.428163 51.762326 11.0052852
n             0.912497 -0.316177  0.1294643
s             5.478358  0.280319  0.6639180
ns           -6.119679 -1.828516 -0.2995350
na            5.383479  2.096793  0.1940462
ss           -0.345764  0.448199 -0.0725977
Warning message:
In 1:n : numerical expression has 32 elements: only the first used

So, how can I calculate leave-one-out coefficients (and other 
quantities) efficiently
for mlm-s?

-Michael



-- 
Michael Friendly     Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University      Voice: 416 736-5115 x66249 Fax: 416 736-5814
4700 Keele Street    Web:   http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA



More information about the R-help mailing list