[R] Time series count model?

pauljohn@ukans.edu pauljohn at ukans.edu
Tue Nov 20 15:36:41 CET 2001


I fear I need someone to throw a brick at my head to shake loose
the cobwebs. And it might as well be the r friends as anybody!

I am trying to counsel a student who has count data (with many
0's and small nuumbers) that is a time series.  He was fitting
linear models to this data, but the count nature of the data
causes me a lot of concern, and I am looking around for a time
series approach to a model which allows all the bells and
whistles of count models. By chance, I notice that Congdon's
WinBUGS examples have an example that has a count model with
time series issues, but I fear it might be too great of a leap
for us to justify a paradigm switch to Bayesian statistics in
order to fit this one model. 

If you were using R, how would you get a foot hold on this
problem?

I want a NegativeBinomial count model (terminology maybe
ambiguous: that means a Poisson with input (mean) value derived
from a Gamma(Xb,1) process) and the possiblity of zero-inflated
data, which means the likelihood of an observation is multiplied
by 1 or 0 according to a draw from a logistic distribution. (I
realize this is starting to sound junked up, but Scott Long's
book on Regression with Qualitative Depenedent Variables writes
out all the details.)  

Because the theory that inspires this model is a dynamic
process, it has a lagged dependent variable on the right hand
side. So when I say X here, I mean a matrix in which lagged y's
are included as columns. So the individual observation's
likelihood is (I believe this is the Negative Binomial model
with zero inflagion factor)

input = draw from Gamma(exp(Xb),1))
zif = draw from logistic(Xb)

if (zif == 1) then:
p(y |X,b) = Pois( input )

if (zif == 0) then: 
y=0

I suppose it is not necessarily true the zero inflation logit
part depends on the exact same coefficients as the Gamma part,
but lets worry about that later....

We have a copy of Stata sitting around here and the students
have found in there a canned procedure to estimate that model,
it does various specificiation tests, such as a test for whether
the zero inflation part is necessary.  But that does not attend
to the time-series part. I don't do Stata myself, I am learning
R, and would like to see if I can do this in R.

We need to incorporate the possibility of an "error term" that
is influenced by its own lagged values in the usual ARIMA
sense.  Looking back to the justification for the Negative
Binomial in the first place, I remember one justification for
the NB model was a Poisson with heterogeneity.

p(y | X,b) = Pois (exp(Xb + e))

I don't think I ever understood very well why this leads to a NB
model. Maybe that's where I need to study. 

Nevertheless, where can I go if I start with that theory, but
the e are not independent, say they are MA(1)

e_t = g*e_{t-1} + u_t

and u_t is Normal(0,sigma^2).

Should I just write out a big log likelihood function and use
R's optim to fit it?

It seems like I'm missing out on something by going that route,
though.

-- 
Paul E. Johnson                       email: pauljohn at ukans.edu
Dept. of Political Science           
http://lark.cc.ukans.edu/~pauljohn
University of Kansas                  Office: (785) 864-9086
Lawrence, Kansas 66045                FAX: (785) 864-5700
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._



More information about the R-help mailing list