[R] Overdispersed poisson - negative observation

David Firth david.firth at nuffield.oxford.ac.uk
Thu Jan 16 22:06:03 CET 2003


Peter

You are right to suppose that negative observations should not be a 
problem for a quasi-Poisson fit (though negative *fitted* values would 
be, in a log-linear model).

The problem is that the "quasipoisson" function is overly restrictive, 
in requiring non-negativity of the y variable.  (commenting out the 
error trap is not enough to solve the problem).  There are two places 
where "quasipoisson" has problems:

     dev.resids <- function(y, mu, wt) 2 * wt * (y * log(ifelse(y ==
         0, 1, y/mu)) - (y - mu))

fails for negative values of y, while

     initialize <- expression({
         if (any(y < 0)) stop(paste("Negative values not allowed for",
             "the quasiPoisson family"))
         n <- rep(1, nobs)
         mustart <- y + 0.1
     })

would (without the error trap) return negative initial values for 
means, which won't work with a log link.

The first problem is the hard one: the deviance isn't defined when y is 
negative.  But an alternative is to use the Pearson X^2 statistic, 
which is defined for all values of y (and makes more sense anyway for 
quasi-likelhood models -- it's what you use to determine the dispersion 
constant).

The second problem is easy to fix: for example, just take as starting 
values the global mean (corresponding to zero-valued effects in the 
model).  I often find that this is a good starting point anyway -- 
better than the default choice, which typically is not in the model 
space (even when there are no negative values of y).

I give below an amended "quasipoisson" which seems to work, ie it fits 
the model by the quasi-likelihood method with variance proportional to 
mean.  It makes only the two changes just described.  A *slightly* 
unsatisfactory aspect is that quantities reported as "deviance" are in 
fact Pearson X^2 statistics.  But no other choice really makes sense 
there.

David

quasipoisson <- function (link = "log")
## Amended by David Firth, 2003.01.16, at points labelled ###
## to cope with negative y values
##
## Computes Pearson X^2 rather than Poisson deviance
##
## Starting values are all equal to the global mean
{
     linktemp <- substitute(link)
     if (!is.character(linktemp)) {
         linktemp <- deparse(linktemp)
         if (linktemp == "link")
             linktemp <- eval(link)
     }
     if (any(linktemp == c("log", "identity", "sqrt")))
         stats <- make.link(linktemp)
     else stop(paste(linktemp, "link not available for poisson",
         "family; available links are", "\"identity\", \"log\" and 
\"sqrt\""))
     variance <- function(mu) mu
     validmu <- function(mu) all(mu > 0)
     dev.resids <- function(y, mu, wt) wt*(y-mu)^2/mu   ###
     aic <- function(y, n, mu, wt, dev) NA
     initialize <- expression({
         n <- rep(1, nobs)
         mustart <- rep(mean(y), length(y))             ###
     })
     structure(list(family = "quasipoisson", link = linktemp,
         linkfun = stats$linkfun, linkinv = stats$linkinv, variance = 
variance,
         dev.resids = dev.resids, aic = aic, mu.eta = stats$mu.eta,
         initialize = initialize, validmu = validmu, valideta = 
stats$valideta),
         class = "family")
}


On Thursday, Jan 16, 2003, at 15:45 Europe/London, 
peter.fledelius at wgo.royalsun.com wrote:

> Dear R users
>
> I have been looking for functions that can deal with overdispersed 
> poisson
> models. Some (one) of the observations are negative. According to 
> actuarial
> literature (England & Verall, Stochastic Claims Reserving in General
> Insurance , Institute of Actiuaries 2002) this can be handled through 
> the
> use of quasi likelihoods instead of normal likelihoods. The presence of
> negatives is not normal in a poisson model, however, we see them 
> frequently
> in this type of data, and we would like to be able to fit the model 
> anyway.
>
> At the moment R is complaining about negative values and the link 
> function
> = log.
>
> My code looks like this. Do any of you know if this problem can be 
> solved
> in R? Any suggestions are welcomed.
>
> Kind regards,
>
> Peter Fledelius (new R user)
>
> *********** Code ************
> paym   <- c(5012, 3257, 2638,  898, 1734, 2642, 1828,  599,   54,  172,
>              106, 4179, 1111, 5270, 3116, 1817, -103,  673,  535,
>             3410, 5582, 4881, 2268, 2594, 3479,  649,  603,
>             5655, 5900, 4211, 5500, 2159, 2658,  984,
>             1092, 8473, 6271, 6333, 3786,  225,
>             1513, 4932, 5257, 1233, 2917,
>              557, 3463, 6956, 1368,
>             1351, 5596, 6165,
>             3133, 2262,
>             2063)
> alpha   <- factor(c(1,1,1,1,1,1,1,1,1,1,
>              2,2,2,2,2,2,2,2,2,
>              3,3,3,3,3,3,3,3,
>              4,4,4,4,4,4,4,
>              5,5,5,5,5,5,
>              6,6,6,6,6,
>              7,7,7,7,
>              8,8,8,
>              9,9,
>              10))
> beta    <- factor(c(1,2,3,4,5,6,7,8,9,10,
>              1,2,3,4,5,6,7,8,9,
>              1,2,3,4,5,6,7,8,
>              1,2,3,4,5,6,7,
>              1,2,3,4,5,6,
>              1,2,3,4,5,
>              1,2,3,4,
>              1,2,3,
>              1,2,
>              1))
> d.AD <- data.frame(paym, alpha, beta)
> glm.qD93 <- glm(paym ~ alpha + beta, family=quasipoisson())
> glm.qD93
> ************ Code end ***************
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> http://www.stat.math.ethz.ch/mailman/listinfo/r-help
>




More information about the R-help mailing list