[R] Likelihood ratio test between glm and glmer fits

Sat Jul 19 23:51:42 CEST 2008

This particular case with a random intercept model can be handled by
glmmML, by bootstrapping the p-value.

Best, Göran

On Thu, Jul 17, 2008 at 1:29 PM, Douglas Bates <bates at stat.wisc.edu> wrote:
> On Thu, Jul 17, 2008 at 2:50 AM, Rune Haubo <rhbc at imm.dtu.dk> wrote:
>> 2008/7/16 Dimitris Rizopoulos <Dimitris.Rizopoulos at med.kuleuven.be>:
>>> well, for computing the p-value you need to use pchisq() and dchisq() (check
>>> ?dchisq for more info). For model fits with a logLik method you can directly
>>> use the following simple function:
>>>
>>> lrt <- function (obj1, obj2) {
>>>    L0 <- logLik(obj1)
>>>    L1 <- logLik(obj2)
>>>    L01 <- as.vector(- 2 * (L0 - L1))
>>>    df <- attr(L1, "df") - attr(L0, "df")
>>>    list(L01 = L01, df = df,
>>>        "p-value" = pchisq(L01, df, lower.tail = FALSE))
>>> }
>>>
>>> library(lme4)
>>> gm0 <- glm(cbind(incidence, size - incidence) ~ period,
>>>              family = binomial, data = cbpp)
>>> gm1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
>>>              family = binomial, data = cbpp)
>>>
>>> lrt(gm0, gm1)
>>
>> Yes, that seems quite natural, but then try to compare with the deviance:
>>
>> logLik(gm0)
>> logLik(gm1)
>>
>> (d0 <- deviance(gm0))
>> (d1 <- deviance(gm1))
>> (LR <- d0 - d1)
>> pchisq(LR, 1, lower = FALSE)
>>
>> Obviously the deviance in glm is *not* twice the negative
>> log-likelihood as it is in glmer. The question remains which of these
>> two quantities is appropriate for comparison. I am not sure exactly
>> how the deviance and/or log-likelihood are calculated in glmer, but my
>> feeling is that one should trust the deviance rather than the
>> log-likelihoods for these purposes. This is supported by the following
>> comparison: Ad an arbitrary random effect with a close-to-zero
>> variance and note the deviance:
>>
>> tmp <- rep(1:4, each = nrow(cbpp)/4)
>> gm2 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | tmp),
>>             family = binomial, data = cbpp)
>> (d2 <- deviance(gm2))
>>
>> This deviance is very close to that obtained from the glm model.
>>
>> I have included the mixed-models mailing list in the hope that someone
>> could explain how the deviance is computed in glmer and why deviances,
>> but not likelihoods are comparable to glm-fits.
>
> In that example I think the problem may be that I have not yet written
> the code to adjust the deviance of the glmer fit for the null
> deviance.
>
>>> However, there are some issues regarding this likelihood ratio test.
>>>
>>> 1) The null hypothesis is on the boundary of the parameter space, i.e., you
>>> test whether the variance for the random effect is zero. For this case the
>>> assumed chi-squared distribution for the LRT may *not* be totally
>>> appropriate and may produce conservative p-values. There is some theory
>>> regarding this issue, which has shown that the reference distribution for
>>> the LRT in this case is a mixture of a chi-squared(df = 0) and
>>> chi-squared(df = 1). Another option is to use simulation-based approach
>>> where you can approximate the reference distribution of the LRT under the
>>> null using simulation. You may check below for an illustration of this
>>> procedure (not-tested):
>>>
>>> X <- model.matrix(gm0)
>>> coefs <- coef(gm0)
>>> pr <- plogis(c(X %*% coefs))
>>> n <- length(pr)
>>> new.dat <- cbpp
>>> Tobs <- lrt(gm0, gm1)$L01
>>> B <- 200
>>> out.T <- numeric(B)
>>> for (b in 1:B) {
>>>    y <- rbinom(n, cbpp$size, pr)
>>>    new.dat$incidence <- y
>>>    fit0 <- glm(formula(gm0), family = binomial, data = new.dat)
>>>    fit1 <- glmer(formula(gm1), family = binomial, data = new.dat)
>>>    out.T[b] <- lrt(fit0, fit1)$L01
>>> }
>>> # estimate p-value
>>> (sum(out.T >= Tobs) + 1) / (B + 1)
>>>
>>>
>>> 2) For the glmer fit you have to note that you work with an approximation to
>>> the log-likelihood (obtained using numerical integration) and not the actual
>>> log-likelihood.
>>>
>>> I hope it helps.
>>>
>>> Best,
>>> Dimitris
>>>
>>> ----
>>> Dimitris Rizopoulos
>>> Biostatistical Centre
>>> School of Public Health
>>> Catholic University of Leuven
>>>
>>> Address: Kapucijnenvoer 35, Leuven, Belgium
>>> Tel: +32/(0)16/336899
>>> Fax: +32/(0)16/337015
>>> Web: http://med.kuleuven.be/biostat/
>>>     http://www.student.kuleuven.be/~m0390867/dimitris.htm
>>>
>>>
>>> Quoting COREY SPARKS <corey.sparks at UTSA.EDU>:
>>>
>>>> Dear list,
>>>> I am fitting a logistic multi-level regression model and need to  test the
>>>> difference between the ordinary logistic regression from a  glm() fit and
>>>> the mixed effects fit from glmer(), basically I want  to do a likelihood
>>>> ratio test between the two fits.
>>>>
>>>>
>>>> The data are like this:
>>>> My outcome is a (1,0) for health status, I have several (1,0) dummy
>>>>  variables RURAL, SMOKE, DRINK, EMPLOYED, highereduc, INDIG, male,
>>>>  divorced, SINGLE, chronic, vigor_d and moderat_d and AGE is  continuous (20
>>>> to 100).
>>>> My higher level is called munid and has 581 levels.
>>>> The data have 45243 observations.
>>>>
>>>> Here are my program statements:
>>>>
>>>> #GLM fit
>>>>
>>>> ph.fit.2<-glm(poorhealth~RURAL+SMOKE+DRINK+EMPLOYED+highereduc+INDIG+AGE+male+divorced+SINGLE+chronic+vigor_d+moderat_d,family=binomial(),
>>>>  data=mx.merge)
>>>> #GLMER fit
>>>>
>>>> ph.fit.3<-glmer(poorhealth~RURAL+SMOKE+DRINK+EMPLOYED+INSURANCE+highereduc+INDIG+AGE+male+divorced+SINGLE+chronic+vigor_d+moderat_d+(1|munid),family=binomial(),
>>>>  data=mx.merge)
>>>>
>>>> I cannot find a method in R that will do the LR test between a glm  and a
>>>> glmer fit, so I try to do it using the liklihoods from both  models
>>>>
>>>> #form the likelihood ratio test between the glm and glmer fits
>>>> x2<--2*(logLik(ph.fit.2)-logLik(ph.fit.3))
>>>>
>>>>>    ML
>>>>
>>>> 79.60454
>>>> attr(,"nobs")
>>>>    n
>>>> 45243
>>>> attr(,"nall")
>>>>    n
>>>> 45243
>>>> attr(,"df")
>>>> [1] 14
>>>> attr(,"REML")
>>>> [1] FALSE
>>>> attr(,"class")
>>>> [1] "logLik"
>>>>
>>>> #Get the associated p-value
>>>> dchisq(x2,14)
>>>>         ML
>>>>>
>>>>> 5.94849e-15
>>>>
>>>> Which looks like an improvement in model fit to me.  Am I seeing  this
>>>> correctly or are the two models even able to be compared? they  are both
>>>> estimated via maximum likelihood, so they should be, I think.
>>>> Any help would be appreciated.
>>>>
>>>> Corey
>>>>
>>>> Corey S. Sparks, Ph.D.
>>>>
>>>> Assistant Professor
>>>> Department of Demography and Organization Studies
>>>> University of Texas San Antonio
>>>> One UTSA Circle
>>>> San Antonio, TX 78249
>>>> email:corey.sparks at utsa.edu
>>>> web: https://rowdyspace.utsa.edu/users/ozd504/www/index.htm
>>>>
>>>>
>>>>        [[alternative HTML version deleted]]
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>>
>>>
>>>
>>>
>>> Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>>
>> --
>> Rune Haubo Bojesen Christensen
>>
>> Master Student, M.Sc. Eng.
>> Phone: (+45) 30 26 45 54
>> Mail: rhbc at imm.dtu.dk, rune.haubo at gmail.com
>>
>> DTU Informatics, Section for Statistics
>> Technical University of Denmark, Build.321, DK-2800 Kgs. Lyngby, Denmark
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Göran Broström