[R] GLMM - Am I trying the impossible?

Thu Aug 18 12:21:26 CEST 2005

It is not supported to call anova() on a glmmPQL fit.

For the glmmPQL fit you show, you have very large parameter estimates for 
a logistic and have partial separation (as you comment on for the control 
group): in that case PQL is not a reasonable method.

Try

fit <- glm(dead ~ Parasite * Bacteria, data=fish.data, family=binomial)
summary(fit)
anova(fit, test="Chisq")
fitted(fit)

and you have fitted values of zero (up to numerical tolerances).

This *is* discussed in MASS, around pp.198-9.

So there is little point in adding random efects for that model.  Now try

fit2 <- glmmPQL(dead ~ Parasite + Bacteria, random=~1|Tank,
                 family=binomial, data=fish.data)
summary(fit2)

Fixed effects: dead ~ Bacteria + Parasite
                 Value Std.Error  DF   t-value p-value
(Intercept) -4.243838 0.7519194 150 -5.644007  0.0000
Parasiteyes  1.264102 0.4205313   7  3.005964  0.0198
Bacteriayes  2.850640 0.7147180   7  3.988483  0.0053

which is pretty similar to the lmer fit you show.

I don't know what anova is doing for your lmer fit, but I do know that it 
should not be working with sums of squares as is being reported.

On Thu, 18 Aug 2005, Pedro J. Aphalo wrote:

> Dear all,
>
> I have tried to calculate a GLMM fit with lmer (lme4) and glmmPQL
> (MASS), I also used glm for comparison.

I think you missed what glm was trying to tell you.

> I am getting very different results from different functions, and I
> suspect that the problem is with our dataset rather than the functions,
> but I would appreciate help in deciding whether my suspicions are right.
> If indeed we are attempting the wrong type of analysis, some guidance
> about what would be the right thing to do would be greatly appreciated.
>
> The details:
> The data:
> The data are from the end-point of a survival experiment with fish. The
> design of the experiment is a 2 x 2 factorial, with each factor
> (Bacteria and Parasite) at two levels (yes and no). There were 16 fish
> in each tank, and the treatment was applied to the whole tank. There
> were in all 10 tanks (160 fish), with 2 tanks for controls (no/no), 2
> tanks for (Parasite:yes/Bacteria:no) and 3 tanks for each of the other 2
> treatments. A dead fish was considered a success, and a binomial family
> with the default logit link was used in the fits. No fish died in the
> control treatment (Is this the problem?).
>
> Overall "probabilities" as dead/total for the four treatments were:
> Paras Bact  Prob
> no    no    0
> yes   no    0.0625
> no    yes   0.208
> yes   yes   0.458
>
> We are interested in testing main effects and interaction, but the
> interaction is of special interest to us.
>
> The data for "dead" are coded as 0/1 with 1 indicating a dead fish, and
> the file has one row per fish.
>
> Some results:
>
> lme4  (ver 0.98-1, R 2.1.1, Windows XP)
> ~~~~
>
> > fish1.lmerPQL <- lmer(dead ~ Parasite * Bacteria + (1|Tank),
> data=fish.data, family=binomial)
> Error in lmer(dead ~ Parasite * Bacteria + (1 | Tank), data = fish.data,  :
>         Unable to invert singular factor of downdated X'X
> In addition: Warning message:
> Leading minor of size 4 of downdated X'X is indefinite
> >
>
> without the interaction:
> > fish3.lmerPQL <- lmer(dead ~ Parasite + Bacteria + (1|Tank),
> data=fish.data, family=binomial)
> > anova(fish3.lmerPQL)
> Analysis of Variance Table
>          Df  Sum Sq Mean Sq   Denom F value Pr(>F)
> Parasite  1   0.012   0.012 157.000  0.0103 0.9192
> Bacteria  1   0.058   0.058 157.000  0.0524 0.8192
> > summary(fish3.lmerPQL)
> Generalized linear mixed model fit using PQL
> Formula: dead ~ Parasite + Bacteria + (1 | Tank)
>    Data: fish.data
>  Family: binomial(logit link)
>       AIC      BIC    logLik deviance
>  141.3818 156.7577 -65.69091 131.3818
> Random effects:
>      Groups        Name    Variance    Std.Dev.
>        Tank (Intercept)       5e-10  2.2361e-05
> # of obs: 160, groups: Tank, 10
>
> Estimated scale (compare to 1)  0.9318747
>
> Fixed effects:
>             Estimate Std. Error z value  Pr(>|z|)
> (Intercept) -4.24380    0.79924 -5.3098 1.098e-07 ***
> Parasiteyes  1.26407    0.44694  2.8283 0.0046801 **
> Bacteriayes  2.85062    0.75970  3.7523 0.0001752 ***
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> Correlation of Fixed Effects:
>             (Intr) Prstys
> Parasiteyes -0.429
> Bacteriayes -0.898  0.093
>
> Very different P-values from anova and summary.
>
> MASS: (ver 7.2-17, R 2.1.1, Windows XP)
> ~~~~~
>
> > summary(fish1.glmmpql)
> Linear mixed-effects model fit by maximum likelihood
>  Data: fish.data
>        AIC      BIC   logLik
>   1236.652 1255.103 -612.326
>
> Random effects:
>  Formula: ~1 | Tank
>         (Intercept)  Residual
> StdDev:  0.02001341 0.8944214
>
> Variance function:
>  Structure: fixed weights
>  Formula: ~invwt
> Fixed effects: dead ~ Parasite * Bacteria
>                             Value Std.Error  DF     t-value p-value
> (Intercept)             -18.56607  1044.451 150 -0.01777591  0.9858
> Parasiteyes              15.85802  1044.451   6  0.01518311  0.9884
> Bacteriayes              17.23107  1044.451   6  0.01649772  0.9874
> Parasiteyes:Bacteriayes -14.69007  1044.452   6 -0.01406487  0.9892
>  Correlation:
>                         (Intr) Prstys Bctrys
> Parasiteyes             -1
> Bacteriayes             -1      1
> Parasiteyes:Bacteriayes  1     -1     -1
>
> Standardized Within-Group Residuals:
>           Min            Q1           Med            Q3           Max
> -1.028634e+00 -5.734674e-01 -2.886770e-01 -4.224474e-14  4.330155e+00
>
> Number of Observations: 160
> Number of Groups: 10
> > anova(fish1.glmmpql)
>                   numDF denDF   F-value p-value
> (Intercept)           1   150 17.452150  <.0001
> Parasite              1     6  4.136142  0.0882
> Bacteria              1     6 12.740212  0.0118
> Parasite:Bacteria     1     6  0.000198  0.9892
> >
>
> > anova(glmmPQL(dead~Bacteria*Parasite, random=~1|Tank,
> family=binomial, data=fish.data))
> iteration 1
>                   numDF denDF   F-value p-value
> (Intercept)           1   150 17.452150  <.0001
> Bacteria              1     6  8.980833  0.0241
> Parasite              1     6  7.895521  0.0308
> Bacteria:Parasite     1     6  0.000198  0.9892
> >
>
> Now anova indicates significance, but summary gives huge P-values.
>
> I have looked in MASS, ISwR, Fox's and Crawley's book, for hints, but I
> have probably missed the right spots/books. Hints about what to read
> will be also appreciated.
>
> Many thanks in advance, and sorry about the long message.
>
> The report on this research is obviously not yet published, but if the
> dataframe would be of help, it can be found as saved from R at:
> http://people.cc.jyu.fi/~aphalo/R_fish/fish.Rda. (Use load to read it
> into R).
>
> Pedro.
> -- 
> ==================================================================
> Pedro J. Aphalo
> Department of Biological and Environmental Science
> University of Jyväskylä
> P.O. Box 35, 40351 JYVÄSKYLÄ, Finland
> Phone  +358 14 260 2339
> Mobile +358 50 3721504
> Fax    +358 14 260 2321
> mailto:pedro.aphalo at cc.jyu.fi
> http://www.jyu.fi/~aphalo/                       ,,,^..^,,,
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595