[R] Predicted values from glm() when linear predictor is NA.

Thu Jul 28 02:26:28 CEST 2022

I have a data frame with a numeric ("TrtTime") and a categorical
("Lifestage") predictor.

Level "L1" of Lifestage occurs only with a single value of TrtTime,
namely 12, so it is not possible to estimate a TrtTime "slope"
when Lifestage is "L1".
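
The aliasing can be seen in a small made-up example.  This is only a
sketch: the data frame `toy` and its variables `x` and `g` are
hypothetical stand-ins for demoDat, TrtTime and Lifestage, with level
"C" of `g` observed at a single value of `x`.

```r
# Hypothetical toy data: level "C" occurs only with x = 2.
toy <- data.frame(
  x = c(1, 2, 3, 1, 2, 3, 2, 2),
  g = factor(c("A", "A", "A", "B", "B", "B", "C", "C"))
)

# Design matrix for the interaction model x * g.
X <- model.matrix(~ x * g, data = toy)

ncol(X)      # 6 columns in the design matrix
qr(X)$rank   # 5: the matrix is rank deficient by one

# The interaction column for "C" is an exact multiple of the
# indicator column for "C", so its coefficient cannot be estimated.
all(X[, "x:gC"] == 2 * X[, "gC"])   # TRUE
```

With one linearly dependent column, a fit to such a design would be
expected to report NA for exactly one coefficient.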

Indeed, when I fitted the model

    fit <- glm(cbind(Dead,Alive) ~ TrtTime*Lifestage, family=binomial,
               data=demoDat)

I got:

    > as.matrix(coef(fit))
                                      [,1]
    (Intercept)                -0.91718302
    TrtTime                     0.88846195
    LifestageEgg + L1         -45.36420974
    LifestageL1                14.27570572
    LifestageL1 + L2           -0.30332697
    LifestageL3                -3.58672631
    TrtTime:LifestageEgg + L1   8.10482459
    TrtTime:LifestageL1                 NA
    TrtTime:LifestageL1 + L2    0.05662651
    TrtTime:LifestageL3         1.66743472

That is, TrtTime:LifestageL1 is NA, as expected.

I would have thought that fitted or predicted values corresponding to
Lifestage = "L1" would thereby be NA, but this is not the case:

    > predict(fit)[demoDat$Lifestage=="L1"]
          26       65      131
    24.02007 24.02007 24.02007

    > fitted(fit)[demoDat$Lifestage=="L1"]
     26  65 131
      1   1   1

That is, the predicted values on the scale of the linear predictor are
large and positive, rather than being NA.
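
As a check on where the value 24.02007 might come from, the following
sketch recomputes the linear predictor by hand from the coefficients
printed above, under the assumption that the NA coefficient simply
contributes nothing to the sum.  The variable names are invented for
illustration.

```r
# Coefficients copied from the coef(fit) output quoted above.
b_intercept <- -0.91718302   # (Intercept)
b_trttime   <-  0.88846195   # TrtTime
b_l1        <- 14.27570572   # LifestageL1

# Assumption: the NA coefficient TrtTime:LifestageL1 is left out of
# the sum, i.e. it is treated as if it were zero.
eta <- b_intercept + b_trttime * 12 + b_l1
eta           # approximately 24.02007

# Inverse logit of eta, i.e. the implied fitted probability.
plogis(eta)   # prints as 1
```

The hand calculation agrees with the predict() output to the digits
shown, which is consistent with the NA term being dropped rather than
propagated.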

What this amounts to, it seems to me, is saying that if the linear
predictor in a Binomial glm is NA, then "success" is a certainty.
This strikes me as being a dubious proposition.  My gut feeling is that
misleading results could be produced.
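
A made-up example may help to separate the two issues.  In the sketch
below the data frame `toy` and all of its variables are hypothetical
(they are not demoDat); level "C" of `g` occurs at a single value of
`x`, so the `x:gC` coefficient is NA, yet the fitted probability for
that level is not forced to 1.

```r
# Hypothetical toy data: level "C" occurs only with x = 2.
toy <- data.frame(
  x     = c(1, 2, 3, 1, 2, 3, 2, 2),
  g     = factor(c("A", "A", "A", "B", "B", "B", "C", "C")),
  dead  = c(2, 5, 8, 1, 4, 9, 6, 7),
  alive = c(8, 5, 2, 9, 6, 1, 4, 3)
)

fit_toy <- glm(cbind(dead, alive) ~ x * g, family = binomial,
               data = toy)

# The aliased interaction coefficient is reported as NA ...
is.na(coef(fit_toy)[["x:gC"]])        # TRUE

# ... but no predicted value is NA.
anyNA(predict(fit_toy))               # FALSE

# For level "C" the fitted probability is approximately the pooled
# observed proportion (6 + 7) / 20 = 0.65, not 1.
p_c <- unname(fitted(fit_toy)[toy$g == "C"])
p_c
```

In this toy case, at least, an NA coefficient does not by itself imply
a fitted probability of 1; the fitted values at the observed design
points are determined by the remaining coefficients.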

Can anyone explain to me a rationale for this behaviour pattern?
Is there some justification for it that I am not currently seeing?
Any other comments?  (Please omit comments to the effect of "You are as
thick as two short planks!". :-) )

I have attached the example data set in a file "demoDat.txt", should
anyone want to experiment with it.  The file was created using dput() so
you should access it (if you wish to do so) via something like

    demoDat <- dget("demoDat.txt")

Thanks for any enlightenment.

cheers,

Rolf Turner

-- 
Honorary Research Fellow
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276

-------------- next part --------------
Name: demoDat.txt
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20220728/dd771e34/attachment.txt>