[R] Is there dpois equivalent for zero-inflated Poisson?

Martin Maechler maechler at stat.math.ethz.ch
Wed Mar 23 16:27:33 CET 2016


>>>>> Thierry Onkelinx <thierry.onkelinx at inbo.be>
>>>>>     on Tue, 22 Mar 2016 13:58:09 +0100 writes:

    > dpois(0, lambda) == e^(-lambda)
    > The wikipedia formula is
    > ifelse(x == 0, zero + dpois(0, lambda) * (1-zero), dpois(x, lambda) *
    > (1-zero))

    > or

    > ifelse(x == 0, zero + dpois(x, lambda) * (1-zero), dpois(x, lambda) *
    > (1-zero))

    > so we can move the dpois() out of the ifelse()

    > ifelse(x == 0, zero, 0)  + dpois(x, lambda) * (1-zero)

Nice!  Thank you, Thierry.

Even nicer for symmetry reasons (and much faster) is *not* to use ifelse(), 
so we'd get

dzipois <- function(x, lambda, zero)
  (x == 0) * zero +  dpois(x, lambda) * (1-zero)

However, numerically correctly adding the  'log = FALSE'
argument which all good "density" functions have in R --
((and which you *should* use as  log = TRUE  for the
  log-likelihood if you want to become more professional))
is a bit tricky.

The x == 0 case at least needs care for large lambda and/or
small 'zero' (= pi).
All this is related on how to accurately compute  f(L) = log(1 - exp(- L))
which I call   log1mexp(L) in my still-not-submitted paper
available as one of the Rmpfr vignettes at
https://cloud.r-project.org/web/packages/Rmpfr/vignettes/log1mexp-note.pdf

BTW:  There are at least 3-4 R packages which deal with zero-inflated
      poisson in some ways, notably the recommended  'mgcv'
      package which comes bundled with every R.


Martin Maechler,
ETH ZUrich

    > ir. Thierry Onkelinx
    > Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
    > Forest
    > team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
    > Kliniekstraat 25
    > 1070 Anderlecht
    > Belgium

    > To call in the statistician after the experiment is done may be no more
    > than asking him to perform a post-mortem examination: he may be able to say
    > what the experiment died of. ~ Sir Ronald Aylmer Fisher
    > The plural of anecdote is not data. ~ Roger Brinner
    > The combination of some data and an aching desire for an answer does not
    > ensure that a reasonable answer can be extracted from a given body of data.
    > ~ John Tukey

    > 2016-03-22 13:50 GMT+01:00 Matti Viljamaa <mviljamaa at kapsi.fi>:

    >> And why is the first term of ifelse(x == 0, zero, 0) + dpois(x, lambda) /
    >> (1 - zero)
    >> 
    >> ifelse(x == 0, zero, 0)
    >> 
    >> rather than something corresponding to
    >> 
    >> zero+(1-zero)e^{-lambda}
    >> 
    >> https://en.wikipedia.org/wiki/Zero-inflated_model#Zero-inflated_Poisson
    >> 
    >> On 22 Mar 2016, at 14:25, Matti Viljamaa <mviljamaa at kapsi.fi> wrote:
    >> 
    >> Could you clarify what are the parameters and why it’s formulated that way?
    >> 
    >> -Matti
    >> 
    >> On 22 Mar 2016, at 14:17, Thierry Onkelinx <thierry.onkelinx at inbo.be>
    >> wrote:
    >> 
    >> Dear Matti,
    >> 
    >> What about this?
    >> 
    >> dzeroinflpois <- function(x, lambda, zero){
    >> ifelse(x == 0, zero, 0) + dpois(x, lambda) / (1 - zero)
    >> }
    >> plot(x, dzeroinflpois(x, lambda = 10, zero = 0.2), type = "l")
    >> 
    >> 
    >> 
    >> ir. Thierry Onkelinx
    >> Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
    >> Forest
    >> team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
    >> Kliniekstraat 25
    >> 1070 Anderlecht
    >> Belgium
    >> 
    >> To call in the statistician after the experiment is done may be no more
    >> than asking him to perform a post-mortem examination: he may be able to say
    >> what the experiment died of. ~ Sir Ronald Aylmer Fisher
    >> The plural of anecdote is not data. ~ Roger Brinner
    >> The combination of some data and an aching desire for an answer does not
    >> ensure that a reasonable answer can be extracted from a given body of data.
    >> ~ John Tukey
    >> 
    >> 2016-03-22 13:04 GMT+01:00 Matti Viljamaa <mviljamaa at kapsi.fi>:
    >> 
    >>> I’m doing some optimisation that I first did with normal Poisson (only
    >>> parameter theta was estimated), but now I’m doing the same with a
    >>> zero-inflated Poisson model which
    >>> gives me two estimated parameters theta and p (p is also pi in some
    >>> notation).
    >>> 
    >>> My question is, is there something equivalent to dpois that would use
    >>> both of the parameters (or is the p parameter possibly unnecessary)?
    >>> 
    >>> I’m calculating the “fit” of the Poisson model
    >>> 
    >>> i.e. like
    >>> 
    >>> x = c(0,1,2,3,4,5,6)
    >>> y = c(3062,587,284,103,33,4,2)
    >>> fit1 <- sum(y)*dpois(x, est_theta)
    >>> 
    >>> and then comparing fit1 to the real observations.
    >>> [[alternative HTML version deleted]]
    >>> 
    >>> ______________________________________________
    >>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
    >>> https://stat.ethz.ch/mailman/listinfo/r-help
    >>> PLEASE do read the posting guide
    >>> http://www.R-project.org/posting-guide.html
    >>> <http://www.r-project.org/posting-guide.html>
    >>> and provide commented, minimal, self-contained, reproducible code.
    >> 
    >> 
    >> 
    >> 

    > [[alternative HTML version deleted]]

    > ______________________________________________
    > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
    > https://stat.ethz.ch/mailman/listinfo/r-help
    > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
    > and provide commented, minimal, self-contained, reproducible code.



More information about the R-help mailing list