[R] simulate() and glm fits

William Valdar valdar at well.ox.ac.uk
Thu Aug 2 20:19:07 CEST 2007


This is true. It does not state simulate exists for anything other than 
lm. Nonetheless, taken together, the help page items Description 
("Simulate one or more response vectors from the theoretical distribution 
corresponding to a fitted model object"), Details ("This is a generic 
function with a method for lm objects. Consult the individual modeling 
functions for details on how to use this function") and See Also ("glm, lm 
for model fitting") lead the unwary user down a hopeful path.

If I write a prettier version of simulate.glm, I'll post it here.

On Thu, 2 Aug 2007, Prof Brian Ripley wrote:
> I only see an "lm" method for simulate, so I think you are getting that by 
> inheritance.  It is inappropriate and you do need to add a "glm" method.
>
> As ?simulate says fairly clearly that there is only an "lm" method in base R, 
> I don't think this is a bug, just user optimism.
>
>
> On Thu, 2 Aug 2007, William Valdar wrote:
>
>> Dear All,
>> 
>> I have been trying to simulate data from a fitted glm using the simulate()
>> function (version details at the bottom). This works for lm() fits and
>> even for lmer() fits (in lme4). However, for glm() fits its output does
>> not make sense to me -- am I missing something or is this a bug?
>> 
>> Consider the following count data, modelled as gaussian, poisson and
>> binomial responses:
>> 
>> counts <- data.frame(
>>         count  = c(8, 3, 0, 3, 0, 9, 0, 11, 4, 7, 0, 0, 0, 4, 3, 6, 3,
>>                 15, 11, 9),
>>         batch = c(3.1, 3.1, 3.1, 3.3, 3.3, 3.3, 3.2, 3.12, 3.8, 3.11,
>>                 3.4, 3.4, 3.4, 3.4, 3.5, 3.5, 3.5, 3.5, 3.6, 3.6),
>>         weight = c(324.4, 372.5, 352.7, 379.6, 388.1, 431, 448.4, 377.3,
>>                 376.5, 358.4, 356, 351.4, 350.8, 332.1, 334.5, 392, 370.5,
>>                 409.7, 375, 318.5))
>> 
>> fit.gaus <- lm(sqrt(count) ~ weight, data=counts)
>> simulate(fit.gaus)[[1]]
>> # [1]  2.3280287 -0.7097561  2.5370403  2.3935569 -0.7918554  2.8043848
>> # [7]  1.7306636  4.2524854  0.3390859  1.4725342  1.1209236  0.1805066
>> # [13]  3.6348710  1.9740871  2.0315499  2.9240702  3.9282456  1.2174952
>> # [19]  0.1513106  0.6562989
>> 
>> fit.poisson <- glm(count ~ weight,  family=poisson, data=counts)
>> simulate(fit.poisson)[[1]]
>> # [1]  2.11172412  5.68833210  4.32514109  4.93455140  8.14232584 
>> 5.05559585
>> # [7]  3.15849145  7.10833484 -0.24817021  4.32475164 -0.08093695 
>> 5.06006941
>> # [13] 7.38880304  6.47238760  9.53732036  2.59060450  5.10484831 
>> 3.21278865
>> # [19] -0.25775316  4.40702454
>> 
>> fit.binom <- glm(0!=count ~ weight,  family=binomial, data=counts)
>> simulate(fit.binom)[[1]]
>> #  [1]  0.47568732  1.26957295  0.38295877 -0.61108990 -1.03040311 
>> 0.73164463
>> #  [7] -0.08117270  0.61211072  0.05559046  0.12979599 -0.29521387 
>> 1.76792413
>> # [13]  0.52791230 -1.10505180 -1.61753057  0.89083075  0.70100867 
>> 0.00962979
>> # [19] -0.43537313  0.38288809
>> 
>> For simulate() to have a consistent definition, shouldn't Poisson and
>> binomial produce only positive integer responses? Below is a hacked
>> together function that does what I would expect (note that my binomial
>> works for only binary response glms):
>> 
>> simulate.glm.hack <- function(fit, nsim=1, seed=NULL)
>> {
>>     if (!is.null(seed)) set.seed(seed)
>>     n <- length(fitted(fit))
>>     theta <- model.matrix(fit$terms, data=fit$data) %*% coef(fit)
>>
>>     ymat <- matrix(NA, nrow=n, ncol=nsim)
>>     for (s in 1:nsim)
>>     {
>>         if ("poisson"==fit$family$family)
>>         {
>>             ymat[,s] <- rpois(n, fit$family$linkinv(theta))
>>         }
>>         else if ("binomial"==fit$family$family)
>>         {
>>             ymat[,s] <- rbinom(n, 1, fit$family$linkinv(theta))
>>         }
>>     }
>>     as.data.frame(ymat)
>> }
>> 
>> # Eg:
>> 
>> simulate.glm.hack(fit.poisson)[[1]]
>> #  [1] 1 6 4 7 5 9 7 3 3 4 2 4 3 7 4 4 7 2 4 6
>> simulate.glm.hack(fit.binom)[[1]]
>> #  [1] 1 0 0 1 1 0 0 1 1 0 1 1 0 1 0 1 1 1 1 1
>> 
>> If I am missing something please direct me to the relevant documentation.
>> 
>> Cheers,
>> 
>> Will Valdar
>>
>>   my version details...
>> 
>>> sessionInfo()
>> R version 2.5.1 (2007-06-27)
>> i386-pc-mingw32
>> 
>> locale:
>> LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United
>> Kingdom.1252;LC_MONETARY=English_United
>> Kingdom.1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252
>> 
>> attached base packages:
>> [1] "stats"     "graphics"  "grDevices" "utils"     "datasets"  "methods"
>> [7] "base"
>> 
>> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
>> Dr William Valdar               ++44 (0)1865 287 589
>> Wellcome Trust Centre           valdar at well.ox.ac.uk
>> for Human Genetics, Oxford      www.well.ox.ac.uk/~valdar
>> 
>> ______________________________________________
>> R-help at stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide 
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>> 
>
> -- 
> Brian D. Ripley,                  ripley at stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
>

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Dr William Valdar               ++44 (0)1865 287 589
Wellcome Trust Centre           valdar at well.ox.ac.uk
for Human Genetics, Oxford      www.well.ox.ac.uk/~valdar



More information about the R-help mailing list