[R] simulate() and glm fits

William Valdar valdar at well.ox.ac.uk
Thu Aug 2 16:59:29 CEST 2007


Dear All,

I have been trying to simulate data from a fitted glm using the simulate() 
function (version details at the bottom). This works for lm() fits and 
even for lmer() fits (in lme4). However, for glm() fits its output does 
not make sense to me -- am I missing something or is this a bug?

Consider the following count data, modelled as gaussian, poisson and 
binomial responses:

counts <- data.frame(
         count  = c(8, 3, 0, 3, 0, 9, 0, 11, 4, 7, 0, 0, 0, 4, 3, 6, 3,
                 15, 11, 9),
         batch = c(3.1, 3.1, 3.1, 3.3, 3.3, 3.3, 3.2, 3.12, 3.8, 3.11,
                 3.4, 3.4, 3.4, 3.4, 3.5, 3.5, 3.5, 3.5, 3.6, 3.6),
         weight = c(324.4, 372.5, 352.7, 379.6, 388.1, 431, 448.4, 377.3,
                 376.5, 358.4, 356, 351.4, 350.8, 332.1, 334.5, 392, 370.5,
                 409.7, 375, 318.5))

fit.gaus <- lm(sqrt(count) ~ weight, data=counts)
simulate(fit.gaus)[[1]]
# [1]  2.3280287 -0.7097561  2.5370403  2.3935569 -0.7918554  2.8043848
# [7]  1.7306636  4.2524854  0.3390859  1.4725342  1.1209236  0.1805066
# [13]  3.6348710  1.9740871  2.0315499  2.9240702  3.9282456  1.2174952
# [19]  0.1513106  0.6562989

fit.poisson <- glm(count ~ weight,  family=poisson, data=counts)
simulate(fit.poisson)[[1]]
# [1]  2.11172412  5.68833210  4.32514109  4.93455140  8.14232584 5.05559585
# [7]  3.15849145  7.10833484 -0.24817021  4.32475164 -0.08093695 5.06006941
# [13] 7.38880304  6.47238760  9.53732036  2.59060450  5.10484831 3.21278865
# [19] -0.25775316  4.40702454

fit.binom <- glm(0!=count ~ weight,  family=binomial, data=counts)
simulate(fit.binom)[[1]]
#  [1]  0.47568732  1.26957295  0.38295877 -0.61108990 -1.03040311 0.73164463
#  [7] -0.08117270  0.61211072  0.05559046  0.12979599 -0.29521387 1.76792413
# [13]  0.52791230 -1.10505180 -1.61753057  0.89083075  0.70100867 0.00962979
# [19] -0.43537313  0.38288809

For simulate() to have a consistent definition, shouldn't Poisson and 
binomial produce only positive integer responses? Below is a hacked 
together function that does what I would expect (note that my binomial 
works for only binary response glms):

simulate.glm.hack <- function(fit, nsim=1, seed=NULL)
{
     if (!is.null(seed)) set.seed(seed)
     n <- length(fitted(fit))
     theta <- model.matrix(fit$terms, data=fit$data) %*% coef(fit)

     ymat <- matrix(NA, nrow=n, ncol=nsim)
     for (s in 1:nsim)
     {
         if ("poisson"==fit$family$family)
         {
             ymat[,s] <- rpois(n, fit$family$linkinv(theta))
         }
         else if ("binomial"==fit$family$family)
         {
             ymat[,s] <- rbinom(n, 1, fit$family$linkinv(theta))
         }
     }
     as.data.frame(ymat)
}

# Eg:

simulate.glm.hack(fit.poisson)[[1]]
#  [1] 1 6 4 7 5 9 7 3 3 4 2 4 3 7 4 4 7 2 4 6
simulate.glm.hack(fit.binom)[[1]]
#  [1] 1 0 0 1 1 0 0 1 1 0 1 1 0 1 0 1 1 1 1 1

If I am missing something please direct me to the relevant documentation.

Cheers,

Will Valdar

   my version details...

> sessionInfo()
R version 2.5.1 (2007-06-27)
i386-pc-mingw32

locale:
LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United 
Kingdom.1252;LC_MONETARY=English_United 
Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252

attached base packages:
[1] "stats"     "graphics"  "grDevices" "utils"     "datasets"  "methods"
[7] "base"

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Dr William Valdar               ++44 (0)1865 287 589
Wellcome Trust Centre           valdar at well.ox.ac.uk
for Human Genetics, Oxford      www.well.ox.ac.uk/~valdar



More information about the R-help mailing list