[R] NB and poisson glm models: three issues

Fri Nov 18 17:46:46 CET 2011

Hi, 

I fit both Poisson and NB (negative binomial) models to some empirical data.

Although the models give sensible parameter estimates, with the NB
models I get three inconsistencies:

- First, the total number of occurrences predicted by the model (i.e.
sum(fitted(fit))) is much greater than the total in the data. I realise that
Poisson and NB models differ in that, for the NB model, the fitted and
observed totals do not need to be equal, but I believe the NB model
over-estimates by too much.

- Sometimes there is a datapoint for which the model predicts 1000 times more
occurrences than would be expected.

- Sometimes the model with an offset predicts sensible results, but if I drop
the offset and use log(variable) as an ordinary covariate instead, some
datapoints are predicted to have many more occurrences than would be expected.
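
A minimal sketch of how such datapoints could be located, so that the second and third points can be checked row by row. It uses the quine data shipped with MASS as a stand-in, not my data, and the helper name largest_fitted and the object fit.demo are made up for illustration:

```r
library(MASS)  # provides glm.nb() and the quine example data

# Hypothetical helper: list the n rows with the largest fitted values,
# next to the observed counts for the same rows.
largest_fitted <- function(fit, n = 5) {
  observed <- model.response(model.frame(fit))
  mu <- fitted(fit)
  idx <- order(mu, decreasing = TRUE)[seq_len(min(n, length(mu)))]
  data.frame(row = idx,
             observed = as.numeric(observed[idx]),
             fitted = as.numeric(mu[idx]),
             # fitted value relative to the mean observed count
             ratio_to_mean = as.numeric(mu[idx]) / mean(observed))
}

# Stand-in NB model on the quine data (days absent from school)
fit.demo <- glm.nb(Days ~ Sex + Age, data = quine)
worst <- largest_fitted(fit.demo, n = 5)
print(worst)
```

A very large ratio_to_mean for a single row would correspond to the kind of datapoint described in the second point.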

I have tried to create an example of the aforementioned problems. However, I
only managed to reproduce my first problem (normally an increase of 20% is
shown). As it happens, the example does not show my third problem, as the
predicted and observed values are equal in this example.

#-----------------------------------------------------------------------------------

# Response variable with "lots" of zeros (I don't want to use hurdle or ZIP
# models...)
response <- rpois(1000, 1) * sample(rep(0:1,1000), size=1000, replace=FALSE)

# Offset, numerical and categorical variables
offset.var <- sample(rep(1:10,1000), size=1000, replace=FALSE)
numerical <- sample(rep(1:1000,1000), size=1000, replace=FALSE)
categorical <- sample(rep(c("A","B","C"),1000), size=1000, replace=FALSE)

# Dataframe
example.data <-data.frame(offset.var,numerical,categorical,response)

# Fit (glm.nb() comes from the MASS package)
library(MASS)
fit.po <- glm(response ~ numerical + categorical + offset(log(offset.var)),
              family = "poisson", data = example.data)
fit.nb <- glm.nb(response ~ numerical + categorical +
                 offset(log(offset.var)), data = example.data)

# Same models with log(offset.var) as a covariate instead of an offset
fit.po.non.off <- glm(response ~ numerical + categorical + log(offset.var),
                      family = "poisson", data = example.data)
fit.nb.non.off <- glm.nb(response ~ numerical + categorical +
                         log(offset.var), data = example.data)

# Comparison of observed and fitted totals
sum(response)
sum(fitted(fit.po))
sum(fitted(fit.nb))
sum(fitted(fit.po.non.off))
sum(fitted(fit.nb.non.off))

#-----------------------------------------------------------------------------------
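
To make the first and third points concrete, here is a minimal sketch on made-up data. The names d, expo, x and y and the seed are assumptions for illustration, not my real variables. As far as I understand, a Poisson GLM with a log link and an intercept reproduces the observed total, while the NB fit is not constrained to do so. Likewise, an offset fixes the coefficient of log(expo) at 1, whereas entering log(expo) as a covariate lets that coefficient be estimated freely:

```r
library(MASS)  # provides glm.nb()

set.seed(1)    # arbitrary seed, only so the sketch is repeatable
n <- 1000
d <- data.frame(
  expo = sample(1:10, n, replace = TRUE),  # hypothetical exposure variable
  x    = runif(n)                          # hypothetical numeric covariate
)
d$y <- rpois(n, 1) * rbinom(n, 1, 0.5)     # counts with many zeros

# Offset: coefficient of log(expo) is fixed at 1
po_off <- glm(y ~ x + offset(log(expo)), family = poisson, data = d)
nb_off <- glm.nb(y ~ x + offset(log(expo)), data = d)

# Covariate: coefficient of log(expo) is estimated from the data
nb_cov <- glm.nb(y ~ x + log(expo), data = d)

totals <- c(observed = sum(d$y),
            poisson  = sum(fitted(po_off)),
            nb       = sum(fitted(nb_off)))
slope <- coef(nb_cov)["log(expo)"]

print(totals)
print(slope)
```

If the estimated slope is far from 1, the offset and non-offset models describe quite different relationships with the exposure variable, so their predictions for individual datapoints can differ a lot.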

Any thoughts??

Many thanks
