[R] negative binomial regression, unbalanced panel

hl hl.statlist at googlemail.com
Wed Nov 24 17:44:07 CET 2010


I am a student who is doing empirical work for his thesis and trying to 
switch to R. I am familiar with Stata, and at the moment I am trying to 
replicate some of my previous work.

I have a large unbalanced panel data set, observations for different 
countries between 1970 and 2007. My dependent variable is an 
overdispersed count. So far I have used fixed-effects negative binomial 
regression, i.e. assuming constant within-group dispersion. The command 
in Stata is xtnbreg, fe.

How could I replicate this in R?

I have found the package pglm and tried the following:

pglm(T_total ~ Lgdpqt_2 + Lgdpqt_3 + Lgdpqt_4 + lpop + yrsconflict +
         past_T_total + Lpolcat_2 + Lpolcat_3 + Lpolcat_4 + Lgdpgr +
         mob_fixed + wdi_urbanpop + Lopen + Ldurable + factor(year),
     data = df, family = negbin, model = "within",
     index = c("code", "year"))

This takes a very long time and then returns the following:

Maximum Likelihood estimation
Newton-Raphson maximisation, 3 iterations
Return code 3: Last step could not find a value above the current.
Boundary of parameter space?
Consider switching to a more robust optimisation method temporarily.
Log-Likelihood: 112720.7
46  free parameters
Estimates:
                     Estimate  Std. error t value   Pr(> t)
(Intercept)      -177.015528   70.277178 -2.5188   0.01177 *
Lgdpqt_2          -34.386693          NA      NA        NA
Lgdpqt_3          -26.709422          NA      NA        NA
Lgdpqt_4          -53.875809          NA      NA        NA
lpop               34.821642          NA      NA        NA
yrsconflict        -8.693849          NA      NA        NA
past_T_total       -9.558045          NA      NA        NA
Lpolcat_2         -11.601625          NA      NA        NA
Lpolcat_3           2.397754    0.374797  6.3975 1.580e-10 ***
Lpolcat_4         -11.661048          NA      NA        NA
..........
and several warnings.

If I drop the year dummies (is factor(year) more appropriate than a 
list of separate dummy variables?), the estimates match those from 
Stata, but estimation still takes a long time and the warnings persist. 
I suspect the problem lies in how the estimation sample is determined. 
Stata automatically drops groups with all-zero outcomes and groups with 
only one observation, as well as year dummies that are unnecessary (I 
run the same regression for different dependent variables). The 
documentation for pglm mentions that there might be problems with 
unbalanced panels.
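
To make the sample restriction concrete, here is a minimal base R sketch of what I understand Stata to be doing before estimation. The toy data frame and the names n_obs, sum_count, est and year_f are made up for illustration; only code, year and T_total correspond to my data. I have not verified that this reproduces Stata's estimation sample exactly.

```r
# Toy unbalanced panel; column names mirror the post, values are invented.
df <- data.frame(
  code    = c("A", "A", "A", "B", "B", "C", "D", "D"),
  year    = c(1970, 1971, 1972, 1970, 1971, 1970, 1971, 1972),
  T_total = c(0, 2, 5, 0, 0, 3, 1, 0)
)

# Per-row group summaries: observations per country and total count per country.
n_obs     <- ave(df$T_total, df$code, FUN = length)
sum_count <- ave(df$T_total, df$code, FUN = sum)

# Keep countries with at least two observations and at least one non-zero outcome.
est <- df[n_obs > 1 & sum_count > 0, ]

# Rebuild the year factor on the reduced sample, so years that no longer
# occur do not generate dummy variables.
est$year_f <- factor(est$year)

# Number of remaining observations per country.
print(table(as.character(est$code)))
```

In this toy example country B (all zeros) and country C (a single observation) are dropped. The reduced data frame could then be passed to the model in place of df, with year_f replacing factor(year).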

How could I go about doing this? Did I make a mistake using pglm, or is 
it simply unsuited to my task? I think this could possibly be 
formulated as a mixed model. I looked into nlme, which, as far as I 
know, does not support the negative binomial family. I hope this is a 
relevant issue; I could not find anything else on the web or on this 
list, and a similar question on Stack Overflow was left without a 
suitable answer.

On a side note, I'm used to using underscores in variable names, but 
have read that this is not good practice in R and that dots should be 
used instead. What is the reason behind that?


Thanks very much for your help,

hl


