[Rd] predict.glm returns different results for the same model

Joris Meys jorismeys at gmail.com
Fri Apr 27 16:23:01 CEST 2018


Hi Hadley,

This is related to how the terms are constructed. If you look at terms(m1)
versus terms(m2), you see that in the case of m1 the knots are added to the
attribute predvars. Contrary, when using splines::ns() this doesn't happen.
Compare:

mf <- model.frame(claim ~ ns(wind, df = 5), data = dat)
mt <-  terms(mf)

with

mf2 <- model.frame(claim ~ splines::ns(wind, df = 5), data = dat)
mt2 <-  terms(mf)

The culprit is actually in makepredictcall.ns, specifically:

    if (as.character(call)[1L] != "ns")
        return(call)

Obviously the call starting with predict::ns() will cause the function to
return the call unaltered. In the case of ns() it processes the knot
information. And that difference causes the difference in predictions.

So it is a bug imho. Especially since I don't understand the logic. The
variable has class 'ns', so makepredictcall() dispatches correctly. That
extra check in the function makepredictcall.ns seems unnecessary to me and
might be simply removed.

Cheers
Joris

On Fri, Apr 27, 2018 at 3:25 PM, Hadley Wickham <h.wickham at gmail.com> wrote:

> Hi all,
>
> Very surprising (to me!) and mystifying result from predict.glm(): the
> predictions vary depending on whether or not I use ns() or
> splines::ns(). Reprex follows:
>
> library(splines)
>
> set.seed(12345)
> dat <- data.frame(claim = rbinom(1000, 1, 0.5))
> mns <- c(3.4, 3.6)
> sds <- c(0.24, 0.35)
> dat$wind <- exp(rnorm(nrow(dat), mean = mns[dat$claim + 1], sd =
> sds[dat$claim + 1]))
> dat <- dat[order(dat$wind), ]
>
> m1 <- glm(claim ~ ns(wind, df = 5), data = dat, family = binomial)
> m2 <- glm(claim ~ splines::ns(wind, df = 5), data = dat, family = binomial)
>
> # The model coefficients are the same
> unname(coef(m1))
> #> [1]  0.5194712 -0.8687737 -0.6803954  4.0838947  2.3908674  4.1564128
> unname(coef(m2))
> #> [1]  0.5194712 -0.8687737 -0.6803954  4.0838947  2.3908674  4.1564128
>
> # But the predictions are not!
> newdf <- data.frame(wind = seq(min(dat$wind), max(dat$wind), length = 5))
> unname(predict(m1, newdata = newdf))
> #> [1] 0.51947119 0.03208719 2.82548847 3.90883496 4.06743266
> unname(predict(m2, newdata = newdf))
> #> [1]  0.5194712 -0.5666554 -0.1731268  2.8134844  3.9295814
>
> Is this a bug?
>
> (Motivating issue from this ggplot2 bug report:
> https://github.com/tidyverse/ggplot2/issues/2426)
>
> Thanks!
>
> Hadley
>
>
>
> --
> http://hadley.nz
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>



-- 
Joris Meys
Statistical consultant

Department of Data Analysis and Mathematical Modelling
Ghent University
Coupure Links 653, B-9000 Gent (Belgium)
<https://maps.google.com/?q=Coupure+links+653,%C2%A0B-9000+Gent,%C2%A0Belgium&entry=gmail&source=g>

-----------
Biowiskundedagen 2017-2018
http://www.biowiskundedagen.ugent.be/

-------------------------------
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

	[[alternative HTML version deleted]]



More information about the R-devel mailing list