[R] Poisson Maximum Likelihood Estimation

Fri Jan 25 15:21:03 CET 2008

"Paul Sweeting" <mail at paulsweeting.co.uk> wrote in
news:000101c85f4e$c3d25b50$4b7711f0$@co.uk: 

> Hi
> 
> I am trying to carry out some maximum likelihood estimation and I'm
> not making much headway, and I'm hoping that someone will be able to
> point me in the right direction.
> 
> I am modelling mortality statistics.  One way to do this is to model
> the mortality rate (or, more accurately, log of the mortality rate,
> log_m) as (say) a constant plus a proportion of age, plus time, so:
> 
> r_1 <- lm(formula=log_m ~ age + time)
> summary(r_1)
> 
> However, an alternative approach is to use try and estimate the
> number of deaths from the poisson mean mortality rate, and the
> number of people, with the poisson mean being defined in terms of
> age and time (And a constant). Conceptually I can see how this
> should work, in terms of linking the poisson probabilities together
> at each age and optimising the coefficients on age and time to
> maximise the log likelihood, but I cannot work out how to express
> this in R.  Hopefully this outlines what I'm trying to do - any 
> ideas gratefully received. 

The most straightforward approach would be to use glm() with 
family="poisson", a.k.a Poisson regression.  Don't ask me why 
Gamma is capitalized as an R family but poisson is not, I don't know. 
Since the link with family=poisson is log() by default, you model the 
rate (or count) without prior transformation. You can model either the 
rates or the counts. If counts are modeled, you would add 
offset=log(person.years) to the covariate side of the formula to 
adjust for varying intensity of exposure. The help examples inglm() don't 
have much that directly illustrates PR. Here is a link to a nice worked 
example:
<http://www.faculty.sbc.edu/bkirk/Biostatistics/course%20documents%20for%202006/Poisson%20Regression.doc>

Some examples from an epi course at UCB's SPH:
<http://www.medepi.net/selvin/chap11job.txt>

A third approach would be to use count as the dependent variable and 
log(expected) as the offset. This would give you an estimate of the SMR 
for various covariate strata.

-- 
David Winsemius