# [R] specification for glmmPQL

Andrew R. Criswell r-stats at arcriswell.com
Wed Sep 14 17:36:31 CEST 2005

```  Dear Prof. Bates and Group:

I hope it is not to late to revisit this thread. My concern is with the
difference in standard errors estimated from data that is arranged as
grouped (data.1) and ungrouped (data.2). With the grouped data set, the
effect of treatment is highly significant; with the data ungrouped, is
is only marginally significant. My empirical findings depend on the
choice of how to construct the data frame. Which is correct?

Best wishes,
Andrew

> summary(fm.5 <- lmer(cbind(response, 100 - response) ~ expt +
+                      (1 | subject), data = data.1, family = binomial,
+                      method = "AGQ"))

Generalized linear mixed model fit using AGQ
Formula: cbind(response, 100 - response) ~ expt + (1 | subject)
Data: data.1
AIC      BIC    logLik deviance
2437.298 2443.161 -1214.649 2429.298
Random effects:
Groups        Name    Variance    Std.Dev.
subject (Intercept)    0.026600     0.16309
# of obs: 32, groups: subject, 8

Estimated scale (compare to 1)  8.669802

Fixed effects:
Estimate Std. Error z value  Pr(>|z|)
(Intercept)   0.3082489  0.0081604  37.774 < 2.2e-16 ***
expttreatment 0.2160440  0.0115933  18.635 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
(Intr)
expttretmnt -0.704
>
> summary(fm.6 <- lmer(response ~ expt + (1 | subject), data = data.2,
+                      family = binomial, method = "AGQ"))
Generalized linear mixed model fit using AGQ
Formula: response ~ expt + (1 | subject)
Data: data.2
AIC      BIC    logLik deviance
4298.023 4322.306 -2145.011 4290.023
Random effects:
Groups        Name    Variance    Std.Dev.
subject (Intercept)    0.015878     0.12601
# of obs: 3200, groups: subject, 8

Estimated scale (compare to 1)  1.007666

Fixed effects:
Estimate Std. Error z value  Pr(>|z|)
(Intercept)    0.30813    0.08075  3.8159 0.0001357 ***
expttreatment  0.21350    0.11473  1.8609 0.0627583 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
(Intr)
expttretmnt -0.704

Douglas Bates wrote:

>On 9/4/05, Andrew R. Criswell <r-stats at arcriswell.com> wrote:
>
>
>>Hello Dr. Bates and group,
>>
>>I understand, the attached data file did not accompany my original
>>message. I have listed below the code used to create that file.
>>
>>data.1 <- data.frame(subject  = factor(rep(c("one", "two", "three", "four",
>>                                             "five", "six", "seven",
>>"eight"),
>>                                           each = 4),
>>                                       levels = c("one", "two", "three",
>>                                                  "four", "five", "six",
>>                                                  "seven", "eight")),
>>                     day      = factor(rep(c("one", "two", "three", "four"),
>>                                           times = 8),
>>                                       levels = c("one", "two", "three",
>>                                                  "four")),
>>                     expt     = rep(c("control", "treatment"), each = 16),
>>                     response = c(58, 63, 57, 54, 63, 59, 61, 53, 52, 62,
>>                                  46, 55, 59, 63, 58, 59, 62, 59, 64, 53,
>>                                  63, 75, 62, 64, 53, 58, 62, 53, 64, 72,
>>                                  65, 74))
>>
>>mtrx.1 <- matrix(apply(data.1[, -4], 2, function(x)
>>                 rep(x, 100 - data.1\$response)), ncol = 3, byrow = F)
>>mtrx.2 <- matrix(apply(data.1[, -4], 2, function(x)
>>                 rep(x, data.1\$response)), ncol = 3, byrow = F)
>>
>>data.2 <- data.frame(subject  = factor(c(mtrx.1[,1], mtrx.2[,1]),
>>                                       levels = c("one", "two", "three",
>>                                                  "four", "five", "six",
>>                                                  "seven", "eight")),
>>                     day      = factor(c(mtrx.1[,2], mtrx.2[,2]),
>>                                       levels = c("one", "two", "three",
>>                                                  "four")),
>>                     expt     = factor(c(mtrx.1[,3], mtrx.2[,3]),
>>                                       levels = c("control", "treatment")),
>>                     response = factor(c(rep("yes", nrow(mtrx.1)),
>>                                         rep("no", nrow(mtrx.2))),
>>                                       levels = c("yes", "no")))
>>
>>#-------------------------------------------------------------------------------#
>>
>>
>
>Thanks for sending the data.
>
>In your first message you said that you got completely different
>results from glmmPQL when fitting the two models.  When I fit these
>models with glmmPQL I got quite similar parameter estimates.  The
>reported log-likelihood or AIC or BIC values are quite different but
>these values apply to a different model (the list weighted linear
>mixed model used in the PQL algorithm) and should not be used for a
>glmm model in any case.
>
>The fm4 results from lmer in the lme4 package (actually lmer is now in
>the Matrix package but that is only temporary) confirm those from
>glmmPQL.  The model fm3 when fit by lmer provides different standard
>errors but that is because the weights are not being appropriately
>adjusted in lmer.  We will fix that.
>
>In general I think it is safest to use the long form of the data as in
>
>Here are the results from lmer applied to the long form.  The results
>to those from the PQL method because AGQ is a more accurate
>approximation to the log-likelihood of the GLMM model.  In this case
>the differences are minor.
>
>The log-likelihood reported here is an approximation to the
>log-likelihood of the GLMM model.
>
>
>
>>(fm.4 <- lmer(response ~ expt + (1|subject), data.2, binomial))
>>
>>
>Generalized linear mixed model fit using PQL
>Formula: response ~ expt + (1 | subject)
>   Data: data.2
>      AIC     BIC    logLik deviance
> 4298.026 4322.31 -2145.013 4290.026
>Random effects:
>     Groups        Name    Variance    Std.Dev.
>    subject (Intercept)    0.015835     0.12584
># of obs: 3200, groups: subject, 8
>
>Estimated scale (compare to 1)  0.9990621
>
>Fixed effects:
>              Estimate Std. Error z value  Pr(>|z|)
>(Intercept)    0.30764    0.08075  3.8098 0.0001391
>expttreatment  0.21319    0.11473  1.8582 0.0631454
>
>
>>(fm.4a <- lmer(response ~ expt + (1|subject), data.2, binomial, method = "AGQ"))
>>
>>
>Generalized linear mixed model fit using AGQ
>Formula: response ~ expt + (1 | subject)
>   Data: data.2
>      AIC      BIC    logLik deviance
> 4298.023 4322.306 -2145.011 4290.023
>Random effects:
>     Groups        Name    Variance    Std.Dev.
>    subject (Intercept)    0.015855     0.12592
># of obs: 3200, groups: subject, 8
>
>Estimated scale (compare to 1)  1.007675
>
>Fixed effects:
>              Estimate Std. Error z value  Pr(>|z|)
>(Intercept)    0.30811    0.08075  3.8156 0.0001358
>expttreatment  0.21352    0.11473  1.8611 0.0627322
>
>
>
>
>>Douglas Bates wrote:
>>
>>
>>
>>>On 9/4/05, Andrew R. Criswell <r-stats at arcriswell.com> wrote:
>>>
>>>
>>>
>>>>Hello All,
>>>>
>>>>I have a question regarding how glmmPQL should be specified. Which of
>>>>these two is correct?
>>>>
>>>>summary(fm.3 <- glmmPQL(cbind(response, 100 - response) ~ expt,
>>>>                       data = data.1, random = ~ 1 | subject,
>>>>                       family = binomial))
>>>>
>>>>summary(fm.4 <- glmmPQL(response ~ expt, data = data.2,
>>>>                       random = ~ 1 | subject, family = binomial))
>>>>
>>>>One might think it makes no difference, but it does.
>>>>
>>>>I have an experiment in which 8 individuals were subjected to two types
>>>>of treatment, 100 times per day for 4 consecutive days. The response
>>>>given is binary--yes or no--for each treatment.
>>>>
>>>>I constructed two types of data sets. On Rfile-01.Rdata (attached here)
>>>>are data frames, data.1 and data.2. The information is identical but the
>>>>data are arranged differently between these two data frames. Data frame,
>>>>data.1, groups frequencies by subject, day and treatment. Data frame,
>>>>data.2, is ungrouped.
>>>>
>>>>
>>>>
>>>I don't think your attached .Rdata file made it through the various
>>>filters on the lists or on my receiving email.  Could you send me a
>>>copy in a separate email message?
>>>
>>>
>>>
>>>
>>>>Consistency of these data frames is substantiated by computing these
>>>>tables:
>>>>
>>>>ftable(xtabs(response ~ expt + subject + day,
>>>>            data = data.1))
>>>>ftable(xtabs(as.numeric(response) - 1 ~ expt + subject + day,
>>>>            data = data.2))
>>>>
>>>>If I ignore the repeated measurement aspect of the data, I get, using
>>>>glm, identical results (but for deviance and df).
>>>>
>>>>summary(fm.1 <- glm(cbind(response, 100 - response) ~ expt,
>>>>                   data = data.1, family = binomial))
>>>>
>>>>summary(fm.2 <- glm(response ~ expt, data = data.2,
>>>>                   family = binomial))
>>>>
>>>>However, if I estimate these two equations as a mixed model using
>>>>glmPQL, I get completely different results between the two
>>>>specifications, fm.3 and fm.4. Which one is right? The example which
>>>>accompanies help(glmmPQL) suggests fm.4.
>>>>
>>>>summary(fm.3 <- glmmPQL(cbind(response, 100 - response) ~ expt,
>>>>                       data = data.1, random = ~ 1 | subject,
>>>>                       family = binomial))
>>>>
>>>>summary(fm.4 <- glmmPQL(response ~ expt, data = data.2,
>>>>                       random = ~ 1 | subject, family = binomial))
>>>>
>>>>Thank you,
>>>>Andrew
>>>>
>>>>
>>>>
>>>>
>>>>______________________________________________
>>>>R-help at stat.math.ethz.ch mailing list
>>>>https://stat.ethz.ch/mailman/listinfo/r-help
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>
>
>
>

```