[R] merge coefficients from a glmlist of models

Tue Oct 28 16:47:05 CET 2014

In the vcdExtra package, I have a function glmlist to collect a set of 
glm() models as a "glmlist" object,
and other functions that generate & fit such a collection of models.

This is my working example, fitting a set of models to the Donner data

# install.packages("vcdExtra", repos="http://R-Forge.R-project.org")# 
needs devel version
data("Donner", package="vcdExtra")
# make survived a factor
Donner$survived <- factor(Donner$survived, labels=c("no", "yes"))
Donner$family.size <- ave(as.numeric(Donner$family), Donner$family, 
FUN=length)
# collapse small families into "Other"
fam <- Donner$family
levels(fam)[c(3,4,6,7,9)] <- "Other"
# reorder, putting Other last
fam = factor(fam,levels(fam)[c(1, 2, 4:6, 3)])
Donner$family <- fam

# fit models
donner.mod0 <- glm(survived ~ age, data=Donner, family=binomial)
donner.mod1 <- glm(survived ~ age + sex, data=Donner, family=binomial)
donner.mod2 <- glm(survived ~ age * sex , data=Donner, family=binomial)
donner.mod3 <- glm(survived ~ poly(age,2) + sex, data=Donner, 
family=binomial)
donner.mod4 <- glm(survived ~ poly(age,2) * sex, data=Donner, 
family=binomial)
mods <- glmlist(donner.mod1, donner.mod2, donner.mod3, donner.mod4)

I'd like to write other methods for handling a glmlist, similar to the 
way stats::anova.glmlist works, e.g.,

 > library(vcdExtra)
 > mods <- glmlist(donner.mod1, donner.mod2, donner.mod3, donner.mod4)
 >
 > anova(mods, test="Chisq")
Analysis of Deviance Table

Model 1: survived ~ age + sex
Model 2: survived ~ age * sex
Model 3: survived ~ poly(age, 2) + sex
Model 4: survived ~ poly(age, 2) * sex
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 87 111.128
2 86 110.727 1 0.4003 0.52692
3 86 106.731 0 3.9958
4 84 97.799 2 8.9321 0.01149 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

# gives same result as anova() with an explicit list of models:

 > anova(donner.mod1, donner.mod2, donner.mod3, donner.mod4, test="Chisq")
Analysis of Deviance Table

Model 1: survived ~ age + sex
Model 2: survived ~ age * sex
Model 3: survived ~ poly(age, 2) + sex
Model 4: survived ~ poly(age, 2) * sex
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 87 111.128
2 86 110.727 1 0.4003 0.52692
3 86 106.731 0 3.9958
4 84 97.799 2 8.9321 0.01149 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Then, using the function vcdExtra::Summarise, I can define a 
Summarise.glmlist method that is essentially

sumry <- lapply(mods, Summarise)
do.call(rbind, sumry)

 > Summarise(mods)# not yet added to the package
Likelihood summary table:
AIC BIC LR Chisq Df Pr(>Chisq)
donner.mod1 117.13 124.63 111.128 87 0.04159 *
donner.mod2 118.73 128.73 110.727 86 0.03755 *
donner.mod3 114.73 124.73 106.731 86 0.06439 .
donner.mod4 109.80 124.80 97.799 84 0.14408
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Similarly, I can define a residuals.glmlist method, using using cbind() 
to collect the residuals from all
models.
But I'm stumped on a coef() method, because the coefficients fitted in 
various models
differ.

 > coefs <- lapply(mods, "coef")
 > coefs
$donner.mod1
(Intercept) age sexMale
1.59915455 -0.03379836 -1.20678665

$donner.mod2
(Intercept) age sexMale age:sexMale
1.85514867 -0.04565225 -1.62177307 0.01957257

$donner.mod3
(Intercept) poly(age, 2)1 poly(age, 2)2 sexMale
0.8792031 -7.9366059 -6.6929413 -1.3745016

$donner.mod4
(Intercept) poly(age, 2)1 poly(age, 2)2 sexMale
0.7621901 -26.9688970 -30.5626032 -1.0995718
poly(age, 2)1:sexMale poly(age, 2)2:sexMale
22.7210591 28.8975876

The result I want is a data.frame with columns corresponding to the 
models, and rows corresponding
to the unique coefficient names, with NA filled in where a term is missing.

-- 
Michael Friendly     Email: friendly AT yorku DOT ca
Professor, Psychology Dept. & Chair, Quantitative Methods
York University      Voice: 416 736-2100 x66249 Fax: 416 736-5814
4700 Keele Street    Web:http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA