[R] Account for a factor variability in a logistic GLMM in lme4

Martin Maechler maechler at stat.math.ethz.ch
Tue Sep 4 11:28:12 CEST 2018


>>>>> Jim Lemon 
>>>>>     on Tue, 4 Sep 2018 08:36:22 +1000 writes:

    > Hi Pedro,
    > I have encountered similar situations in a number of areas. Great care
    > is taken to record significant events of low probability, but not the
    > non-occurrence of those events. Sometimes this is due to a problem
    > with the definition of non-occurrence. To use your example, how close
    > does an animal have to approach the crossing to be counted as not
    > crossing? Perhaps it was just a failure to record the species of
    > animals that didn't cross. In that case you have a problem, because
    > the probability of crossing within species cannot be estimated from
    > the data you describe.

    > Jim

Indeed!
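
If (and only if) every visit can be treated as an observation
opportunity for each of the 5 species, the missing zeros could be made
explicit by expanding the records to one row per structure x visit x
species. The following is a minimal sketch of that idea, not a
recommendation for these particular data: the records are simulated, the
column names `visit` and `species` are hypothetical, and whether an
unrecorded species really counts as a "did not cross" is exactly the
design question raised above.

```r
# Simulated stand-in for the field records: one row per recorded crossing
# (the "ones" only). Column names `visit` and `species` are hypothetical.
set.seed(1)
species_levels <- paste0("sp", 1:5)
crossings <- unique(data.frame(
  structure.id = sample(1:30, 400, replace = TRUE),
  visit        = sample(1:30, 400, replace = TRUE),
  species      = sample(species_levels, 400, replace = TRUE),
  stringsAsFactors = FALSE
))
crossings$cross.01 <- 1L

# Every structure x visit x species combination.
# ASSUMPTION: each of the 5 species could have been observed on every visit.
full <- expand.grid(structure.id = 1:30, visit = 1:30,
                    species = species_levels, stringsAsFactors = FALSE)

# Left join on the shared columns: combinations without a recorded
# crossing come back as NA and are then set to an explicit zero.
full <- merge(full, crossings, all.x = TRUE)
full$cross.01[is.na(full$cross.01)] <- 0L

# Both outcomes now occur within every species.
table(full$species, full$cross.01)

# After merging in the visit-level predictors, a species term becomes
# estimable, e.g. (not run here):
# glmer(cross.01 ~ stream.01 + width.m + grass.per +
#         (1 | structure.id) + (1 | species),
#       data = full, family = binomial)
```

With only 5 species, a random-effect variance for species is estimated
from very few levels, so entering species as a fixed effect may be the
more stable choice.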

For those among us too young to remember:

The 1986 Space Shuttle Challenger catastrophe was caused in part by
that very mistake: considering only the '1's and ignoring the
'0's in the data (as visualised and shown to the decision-making experts).

See, e.g.,
  https://priceonomics.com/the-space-shuttle-challenger-explosion-and-the-o/

  (couldn't easily find a more academic / reliable source which
   *does* include the graphics)
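
The effect is easy to reproduce. The sketch below uses simulated data
only (these are not the Challenger records; the temperature range and the
coefficients are arbitrary assumptions): with the zeros included, a
logistic regression recovers the temperature effect, whereas the subset
of '1's has a constant response and carries no information about it.

```r
# Simulated illustration only; the numbers below are arbitrary assumptions.
set.seed(42)
n    <- 200
temp <- runif(n, 50, 85)                        # hypothetical temperatures
fail <- rbinom(n, 1, plogis(10 - 0.17 * temp))  # failures rarer when warm
d    <- data.frame(temp, fail)

# Using zeros and ones: the slope for temp is clearly negative.
fit_all <- glm(fail ~ temp, family = binomial, data = d)
coef(fit_all)[["temp"]]

# Using the ones only: the response has a single value, so there is
# nothing to contrast and the effect of temp cannot be estimated.
ones <- d[d$fail == 1, ]
unique(ones$fail)
```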

Martin Maechler
ETH Zurich

    > On Tue, Sep 4, 2018 at 12:43 AM Pedro Vaz <zasvaz using gmail.com> wrote:
    >> 
    >> We did a field study in which we tried to understand which factors
    >> significantly explain the probability of a group of animals (5 species in
    >> total) crossing through 30 wildlife road-crossing structures. The response
    >> variable is binomial (yes=crossed; no = did not cross) and was recorded by
    >> animal species. We did about 30 visits to each crossing structure (our
    >> random factor) in which we recorded the binomial response by each animal
    >> species and the values of a few predictors.
    >> 
    >> So, I have this (simplified for better understanding) mixed effects model:
    >> library (lme4)
    >> 
    >> Mymodel <- glmer(cross.01 ~ stream.01 + width.m + grass.per + (1|structure.id),
    >> data = Mydata, family = binomial)
    >> 
    >> stream is a factor with 2 levels; width.m is continuous; grass.per is a
    >> percentage
    >> 
    >> This is the model in which I assessed crossings by all species combined
    >> (i.e., cross.01 = 1 when an animal of any species crossed, cross.01 = 0
    >> when no animal crossed). However, we did one model per species and those
    >> species-specific models highlight that different species exhibit different
    >> relationships between crossings and explanatory variables.
    >> 
    >> My problem: This means that my model above suffers from an additional
    >> source of variation related to the species level without accounting for it.
    >> However I cannot recalibrate the above model adding the species level as
    >> random factor because, in my binomial response, the zero means no species
    >> crossed (all zeros would have "NA" or, say, "none" for species) and so that
    >> additional source of variation is only present when the response was 1.
    >> Just to confirm this, I did add species as a random factor:
    >> 
    >> (1 | structure.id) + (1 | species)
    >> 
    >> As expected, the message is "Error: Response is constant"
    >> 
    >> How can I account for the species variability in my model in lme4?
    >> 
    >> A few more details:
    >> - I had 5 mammal species crossing through the 30 road-crossing structures.
    >> On 134 occasions (i.e., 134 of my records on individual
    >> crossing-structures), no animal crossed (so, @Dimitris Rizopoulos, no, I
    >> didn't have the species of the animals which did not cross. A "no cross"
    >> was a "zero" for that visit to the crossing-structure). On 498 occasions,
    >> at least one animal of a given species crossed the structure (these were my
    >> "ones" in my logistic response)
    >> - A side comment: This is to respond to a reviewer in a paper of mine,
    >> i.e., I did and presented species-specific and "all combined species"
    >> models in the draft reviewed but now the reviewer is asking me to control
    >> for the species variability in the "combined species model". He asked me to
    >> include a random factor but I realized that is not possible since all my
    >> zeros would have "none" for the species that crossed. So, is it possible to
    >> control for the species variability in my model in lme4 in another way? I
    >> know that in nlme this is not that difficult, because variance structures
    >> can be fitted there...
    >> - Every time an animal crossed, the binary response was "one" and I
    >> recorded the animal species as well. Thus, I have variability between
    >> species in the "ones" but not in my "zeros" of my logistic model.
    >> 
    >> ______________________________________________
    >> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
    >> https://stat.ethz.ch/mailman/listinfo/r-help
    >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
    >> and provide commented, minimal, self-contained, reproducible code.




