[R] Question regarding lmer with binary response

Douglas Bates dmbates at gmail.com
Sun Sep 4 20:34:53 CEST 2005

On 9/4/05, Bernd Weiss <bernd.weiss at uni-koeln.de> wrote:
> Dear all, dear Prof. Bates,
> my dependent variable (school absenteeism, truancy[1]) is a binary
> response for which I am trying to compute an unconditional mixed
> effects model. I've got observations (monday, wednesday and friday)
> nested in individuals (ID2), which were nested in classes (KID2) and
> schools (SID), i.e. a 4-level mixed effects model.
> In short, I was trying without success. I got no sensible results
> using lmer as well as using glmmPQL. I played around with the control
> parameters and the methods (PQL, Laplace) in lmer without any effect.
> I would really appreciate if someone could have a look into my data
> and tell me what's going wrong here.
> My R script and data can be found at:
> http://www.metaanalyse.de/tmp/rhelp.R
> http://www.metaanalyse.de/tmp/rhelp.txt
> TIA,
> Bernd

Thanks for making the data and your script available.  That helps a
lot when investigating cases like these.

As you say, you have 3 binary responses per student and that is just
not enough information to fit a model like a generalized linear mixed
model.  Most of the students had 3 positive responses and 0 negative. 
In fact, out of the 6708 students, only 444 missed any days at all. 
Only 186 out of the 302 classes had any missing data.  It is just not
possible to fit a four level mixed effects model to such sparse data.

Consider only the pattern within students.  I did some very messy
manipulations to look at the unique patterns of absent:present
observations with the results shown below.  (Challenge to the reader:
Can you come up with a relatively clean method of calculating the
number of students with each of the absent:present patterns?)

  A:P Freq  Pct
  0:1  413    0
  0:2  161    0
  0:3 5690    0
  1:2  258   33
  1:1   10   50
  2:1   65   67
  1:0   19  100
  2:0   10  100
  3:0   82  100
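
One possible way to tabulate such patterns is sketched here.  It is
not the messy manipulation used to produce the table.  The toy data
frame and the column name `absent` (1 = absent, 0 = present) are
assumptions made for illustration; only ID2 comes from the original
post, and the sketch assumes that rows with a missing response have
already been dropped.

```r
# Toy data: one row per observation.  `absent` is a hypothetical 0/1
# column (1 = absent); `ID2` identifies the student as in the post.
dat <- data.frame(
  ID2    = c(1, 1, 1, 2, 2, 2, 3, 3, 4),
  absent = c(0, 0, 0, 1, 0, 0, 1, 1, 1)
)

# Per-student counts of absent and present observations
n_absent  <- tapply(dat$absent, dat$ID2, sum)
n_present <- tapply(1 - dat$absent, dat$ID2, sum)

# Label each student with an "A:P" pattern and count the students
# that share each pattern
pattern <- paste(n_absent, n_present, sep = ":")
freq <- table(pattern)
print(freq)
```

With the real data, the same three steps would be applied to the
actual response column in place of `absent`.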

The important point to understand is that students who are present at
all observations or who are absent at all observations contribute very
little information to such a model.  The model fitting ends up giving
them a very large positive or negative random effect and they
contribute no other information.  The most information comes from the
students who are present some of the time and absent some of the time
and those are 333 students out of 6708.
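
The counts quoted in this message can be checked directly from the
table of patterns.  The sketch below simply re-enters the frequencies
by hand; the object names are illustrative and not taken from the
original script.

```r
# Frequencies copied from the A:P table in this message
patterns <- data.frame(
  absent  = c(0, 0, 0, 1, 1, 2, 1, 2, 3),
  present = c(1, 2, 3, 2, 1, 1, 0, 0, 0),
  freq    = c(413, 161, 5690, 258, 10, 65, 19, 10, 82)
)

# All students
total <- sum(patterns$freq)

# Students absent at least once
any_absent <- sum(patterns$freq[patterns$absent > 0])

# Students with both absent and present observations, i.e. the
# informative ones
mixed <- sum(patterns$freq[patterns$absent > 0 & patterns$present > 0])

# Percentage of observations absent for each pattern (the Pct column)
pct <- round(100 * patterns$absent / (patterns$absent + patterns$present))

print(c(total = total, any_absent = any_absent, mixed = mixed))
```

This gives 6708 students in total, 444 with at least one absence, and
333 with a mixed pattern, in agreement with the figures above.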
