[R] pgmm (Blundell-Bond) sample needed)

Mon Mar 30 18:35:58 CEST 2009

Dear Ivo, dear list,

(see: Message: 70
Date: Thu, 26 Mar 2009 21:39:19 +0000
From: ivowel at gmail.com
Subject: [R] pgmm (Blundell-Bond) sample needed)

I think I finally figured out how to replicate your supersimple GMM
example with pgmm() so as to get the very same results as Stata.
Having no other regressors in the formula initially drove me crazy. This was a case where simpler models are trickier than more
complicated ones!

For the benefit of other GMM people on this list, here's a brief résumé
of our rather long private mail exchange of these days, answering to
some other pgmm()-related posts which have appeared on this list
lately. Sorry for the overlong posting but it might be worth the space.

I will refer to the very good Stata tutorial by David Roodman that Ivo
himself pointed me to, which gives a nice
(and free) theoretical intro as well. Please (the others) find it
here: http://repec.org/nasug2006/howtodoxtabond2.cgdev.pdf. As far as
textbooks are concerned, Arellano's
panel data book (Oxford) is the theoretical reference I would
suggest. 

There have been two separate issues: 
- syntax (how to get the right model)
- small sample behaviour (minimal time dimension to get estimates)

I'll start with this last one, then provide a quick "Rosetta stone" of
pgmm() and Stata commands producing the same results. The established
benchmarks for dynamic panels' GMM are the DPD routines written by Arellano et
al. for Gauss and later Ox, but  Stata is proven to give the same
results, and it is the established general reference for panel
data. Lastly I will add the usual examples found in the literature,
although they are very close relatives of 'example(pgmm)', so as to
show the correspondence between the models.

1) Small samples and N-asymptotics:
GMM needs big N, small T. Else you end up having more instruments than
observations and you get a "singular matrix" error (which, as Ivo
correctly found out, happens in the computation of the optimal
weights' matrix). While this is
probably going to be substituted with a more descriptive error
message, it still explains you the heart of the matter. 
Yet Stata
gives you estimates in this case as well: as I suspected, it is
because it uses a generalized inverse (see Roodman's tutorial,
2.6). This looks theoretically ok. Whether this is meaningful in
applied practice is an issue I will discuss with the package
maintainer. IMHO it is not, apart maybe for illustrative purposes, and
it might well encourage bad habits (see the discussion about (not)
fitting the Grunfeld model by GMM on this list, some weeks ago).

2) fitting the simple models
Simplest possible model: AR(1) with individual effects
  x(i,t)= a*(x(i,t-1)) + bi + c

This is what Ivo asked for in the first place. As the usual example is on data from the Arellano and Bond paper,
available in package 'plm' as

> data(EmplUK)

I'll use log(emp) from this dataset as 'x', for ease of reproducibility. Same data are
available in Stata by 'use
"http://www.stata-press.com/data/r7/abdata.dta"'. The Stata dataset is
identical but for the variable names and the fact that in Stata you
have to generate logs beforehand (ugh!). I'm also adding the
'nomata' option to avoid complications, but this will be unnecessary on most
systems (not on mine...).

The system-GMM estimator (with robust SEs) in Stata is 'xtabond2 n
nL1, gmm(L.(n)) nomata robust' whose R equivalent is:

> sysmod<-pgmm( dynformula( log(emp) ~ 1, list(1)), data=EmplUK, gmm.inst=~log(emp), lag.gmm=c(2,99),  
 + effect="individual", model="onestep", transformation="ld" )
> summary(sysmod, robust=TRUE)

(note that although 'summary(sysmod)' does not report a constant, it's
actually there; this is an issue to be checked).

while the difference-GMM is 'xtabond2 n nL1, gmm(L.(n)) noleveleq
nomata robust', in R:

> diffmod<-pgmm( dynformula( log(emp) ~ 1, list(1)), data=EmplUK, gmm.inst=~log(emp), lag.gmm=c(2,99),  
+  effect="individual", model="onestep", transformation="d" )
> summary(diffmod,robust=TRUE)

The particular model Ivo asked for, using only lags 2-4 as
instruments, is 'xtabond2 x lx, gmm(L.(x),lag(1 3)) robust' in Stata
and only requires to set 'lag.gmm=c(2,4)' in the 'sysmod' above
(notice the difference in the lags specification!).

Note also that, unlike Ivo, I am using robust covariances.

3) fitting the standard examples from the literature.

'example(pgmm)' is a somewhat simplified version of the standard
Arellano-Bond example. For better comparability, here I am replicating
the results from the abest.do Stata script from
http://ideas.repec.org/c/boc/bocode/s435901.html (i.e., the results of
the Arellano and Bond paper done via xtabond2). The same output is also to
be found in Roodman's tutorial, 3.3. 

Here's how to replicate the output of abest.do:
(must execute the preceding lines in the file as well for data transf.)

* Replicate difference GMM runs in Arellano and Bond 1991, Table 4
* Column (a1)
xtabond2 n L(0/1).(l.n w) l(0/2).(k ys) yr198?c cons, gmm(L.n) iv(L(0/1).w l(0/2).(k ys) yr198?c cons) noleveleq noconstant robust nomata

replicated by:
> abmod1<-pgmm(dynformula(log(emp)~log(wage)+log(capital)+log(output),list(2,1,2,2)),
+ data=EmplUK, effect="twoways", model="onestep", 
+ gmm.inst=~log(emp), lag.gmm=list(c(2,99)), transformation="d")
> summary(abmod1,robust=TRUE)

* Column (a2)
xtabond2 n L(0/1).(l.n w) l(0/2).(k ys) yr198?c cons, gmm(L.n)
iv(L(0/1).w l(0/2).(k ys) yr198?c cons) noleveleq noconstant two
nomata

replicated by:
> mymod2<-pgmm(dynformula(log(emp)~log(wage)+log(capital)+log(output),list(2,1,2,2)),
+ data=EmplUK, effect="twoways", model="twosteps", 
+ gmm.inst=~log(emp), lag.gmm=list(c(2,99)), transformation="d")
> summary(mymod2) 

* Column (b)
xtabond2 n L(0/1).(l.n ys w) k yr198?c cons, gmm(L.n) iv(L(0/1).(ys w) k yr198?c cons) noleveleq noconstant two nomata

replicated by:
> abmod3<-pgmm(dynformula(log(emp)~log(wage)+log(capital)+log(output),list(2,1,0,1)),
+ data=EmplUK, effect="twoways", model="twosteps", 
+ gmm.inst=~log(emp), lag.gmm=list(c(2,99)), transformation="d")
> summary(abmod3)

The system versions from bbest.do (ibid.) can be estimated setting
'transformation="ld"' along the lines in 2)

Yves has done a great job, but any GMM estimator is bound to have a
lot of switches. I hope this helps, together with Roodman's paper, in putting instruments and lags
into place. Sure it helped me to better figure out what pgmm() does. 

Thanks to Ivo for motivating me and for ultimately finding most of the answers by himself :^)

Giovanni

Giovanni Millo
Research Dept.,
Assicurazioni Generali SpA
Via Machiavelli 4, 
34132 Trieste (Italy)
tel. +39 040 671184 
fax  +39 040 671160

Ai sensi del D.Lgs. 196/2003 si precisa che le informazi...{{dropped:13}}