[R] GLM Starting Values

Gabor Grothendieck ggrothendieck at gmail.com
Fri Jul 23 02:51:45 CEST 2010


On Thu, Jul 22, 2010 at 4:56 PM, Tyler Williamson <tswillia at ucalgary.ca> wrote:
> Hello,
>
> Suppose one is interested in fitting a GLM with a log link to binomial data.  How does R choose starting values for the estimation procedure, assuming I don't supply them?
>

Assuming prior weights are not specified, the binomial family's
initialize expression uses this for a one-column response:

	mustart <- (y + 0.5) / 2

and this for a two-column (successes, failures) response, where y is
first replaced by the proportion of successes:

	n <- y[, 1] + y[, 2]
	y <- y[, 1] / n
	mustart <- (n * y + 0.5) / (n + 1)
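One way to check these formulas is to evaluate the family's initialize
expression by hand. This is a minimal sketch that assumes, as in the R
versions I have looked at, that binomial()$initialize only needs y, nobs
and weights to be defined where it is evaluated (glm.fit normally sets
these up itself). The helper default_mustart is hypothetical, not part
of R.

```r
# Hypothetical helper: evaluate binomial()$initialize in a scratch
# environment and return the default mustart it computes.
# Assumes the expression only reads y, nobs and weights.
default_mustart <- function(y, weights = rep(1, NROW(y))) {
  env <- new.env()
  env$y <- y
  env$nobs <- NROW(y)
  env$weights <- weights
  eval(binomial()$initialize, envir = env)
  env$mustart
}

# One-column 0/1 response: should equal (y + 0.5) / 2
y1 <- c(0, 1, 1, 0, 1)
m1 <- default_mustart(y1)
print(m1)

# Two-column (successes, failures) response:
# should equal (successes + 0.5) / (n + 1) with n = successes + failures
y2 <- cbind(succ = c(3, 0, 5), fail = c(2, 4, 0))
n2 <- y2[, 1] + y2[, 2]
m2 <- default_mustart(y2)
print(m2)
```

Since n * (successes / n) is just the number of successes, the
two-column formula is (successes + 0.5) / (n + 1), which reduces to the
one-column formula when n = 1.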

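etastart is then the link function evaluated at mustart, i.e.
family$linkfun(mustart): log(mustart) for the log link in the question,
or qlogis(mustart) for the default logit link.  For example, given this
data: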

set.seed(123)
f1 <- factor(sample(c("a", "b"), 100, replace = TRUE))
f2 <- factor(sample(c("x", "y"), 100, replace = TRUE))
y <- sample(c(0, 1), 100, replace = TRUE)

# compare the deviance traces printed by these two fits; they should match:

# default mustart and etastart
fm <- glm(y ~ f1 + f2, family = "binomial", control = list(trace = TRUE))

# specify mustart and etastart to equal defaults
mustart <- (y + 0.5) / 2
fm <- glm(y ~ f1 + f2, family = "binomial",
	mustart = mustart, etastart = qlogis(mustart),
	control = list(trace = TRUE))
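The question asked about a log link, while the example above uses the
default logit link. A minimal sketch of the same comparison for
binomial(link = "log"), regenerating the same simulated data: here the
default etastart is log(mustart), so supplying it explicitly should
reproduce the default fit. The names fm_default and fm_start are only
illustrative. Note that a log-binomial fit needs every fitted
probability to stay strictly below 1, so on other data it may fail to
converge or need better starting values.

```r
# Same simulated data as above
set.seed(123)
f1 <- factor(sample(c("a", "b"), 100, replace = TRUE))
f2 <- factor(sample(c("x", "y"), 100, replace = TRUE))
y <- sample(c(0, 1), 100, replace = TRUE)

# Default starting values with a log link
fm_default <- glm(y ~ f1 + f2, family = binomial(link = "log"))

# Supply the same defaults explicitly: mustart from the one-column
# formula, etastart from the log link applied to it
mustart <- (y + 0.5) / 2
fm_start <- glm(y ~ f1 + f2, family = binomial(link = "log"),
                mustart = mustart, etastart = log(mustart))

# Both fits start from the same eta, so the estimates should agree
print(cbind(default = coef(fm_default), explicit = coef(fm_start)))
```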


