[R] Replicating Stata's xtreg clustered SEs in R

Mon Mar 12 14:50:30 CET 2012

Hello David.
Usually I'd ask for a reproducible example (see the posting guide), but
as I routinely check my results against Stata, this time I think I know
what happens already. 

There are two issues here: one is cluster-robust covariance estimation,
which in Stata is done through 'vce(cluster <groupvar>)' (or
'vce(robust)') and which you correctly performed in R/plm by specifying
'method="arellano"' in vcovHC; the second is the small-sample correction,
which is where R/plm and Stata differ slightly.

Stata:
Stata's help is opaque in this respect, or I am not smart enough to read
it. Cameron, Gelbach and Miller (2006), "Robust inference with multi-way
clustering" (http://www.nber.org/papers/t0327) state on page 8 that Stata
weights the residuals by sqrt(G/(G-1)*(N-1)/(N-K)), where G is the number
of clusters, N the total number of obs. and K the number of regressors.
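To get a feel for the size of that factor, here is a minimal sketch in base R. It assumes N is the total number of observations (the convention I take the paper to use) and K the number of regressors; the function name and the panel dimensions are made up for illustration.

```r
# Finite-sample factor c = G/(G-1) * (N-1)/(N-K) as reported for Stata.
# Residuals are scaled by sqrt(c), so the covariance matrix scales by c.
# G = number of clusters, N = total obs. (assumption), K = regressors.
stata_cluster_factor <- function(G, N, K) {
  stopifnot(G > 1, N > K)
  (G / (G - 1)) * ((N - 1) / (N - K))
}

# Hypothetical panel: 10 countries observed for 20 years, 3 regressors
c_small <- stata_cluster_factor(G = 10, N = 200, K = 3)

# Same number of years but 100 countries: the factor moves towards 1
c_large <- stata_cluster_factor(G = 100, N = 2000, K = 3)

# Multipliers applied to the uncorrected standard errors
sqrt(c(few_clusters = c_small, many_clusters = c_large))
```

With few clusters the G/(G-1) term dominates, so the distance from an uncorrected estimate depends mostly on how many countries are in the panel.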

plm:
In 'plm' the standard HC0-HC4 small sample corrections from the
non-clustering literature are used instead: see a survey in Zeileis
(2004) "Econometric Computing with HC and HAC Covariance Matrix
Estimators" here http://www.jstatsoft.org/v11/i10/

so that re: your question, 
1) there is currently in R/plm no _exact_ counterpart to the
small-sample correction used in Stata's xtreg under the option
'vce(cluster <groupvar>)'.

2) HC1 is actually the closest cousin, weighting obs. by a function of
sample size (while in HC0 weights are all =1 and from HC2 on they are
based on the hat matrix), yet this function is a different one. Writing
N for the total number of obs. and G for the number of clusters, it is
N/(N-K) instead of G(N-1)/((G-1)(N-K)).
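As a rough workaround, one can start from the uncorrected (HC0) clustered covariance and multiply it by the factor above. The sketch below does this on the Grunfeld data shipped with plm, not on David's data. Two things are assumptions of mine: N is the total number of observations, and K counts only the slope coefficients of the within model. Which K Stata's xtreg actually uses is something to check against its output, so treat this as an approximation and not a guaranteed replication.

```r
library(plm)

data("Grunfeld", package = "plm")  # example panel shipped with plm
fit <- plm(inv ~ value + capital, data = Grunfeld,
           index = c("firm", "year"), model = "within")

# Cluster-robust covariance by group, without small-sample correction
V_hc0 <- vcovHC(fit, method = "arellano", type = "HC0", cluster = "group")

dims <- pdim(fit)$nT
G <- dims$n              # number of clusters (firms)
N <- dims$N              # total number of observations
K <- length(coef(fit))   # ASSUMPTION: slope coefficients only

# Scaling residuals by sqrt(c) scales the covariance matrix by c
c_stata <- (G / (G - 1)) * ((N - 1) / (N - K))
V_stata <- c_stata * V_hc0

# Standard errors with and without the correction
cbind(HC0 = sqrt(diag(V_hc0)), stata_style = sqrt(diag(V_stata)))
```

The rescaled matrix can be passed on in the usual way, e.g. coeftest(fit, vcov = V_stata) from the lmtest package.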

I have recently been working on a more general version of vcovHC
that includes this correction; it should be out "soon". In the meantime,
if I get around to doing it, I'll send you a quick hack of vcovHC doing
the correction the Stata way: but I can't promise :^)

Best,
Giovanni

Giovanni Millo, PhD
Research Dept.,
Assicurazioni Generali SpA
Via Machiavelli 4,
34132 Trieste (Italy)
tel. +39 040 671184
fax  +39 040 671160

######################## original message #############################
Message: 49
Date: Mon, 12 Mar 2012 00:55:46 -0400
From: "David A. Kim" <david_kim at hms.harvard.edu>
To: r-help at r-project.org
Subject: [R] Replicating Stata's xtreg clustered SEs in R
Message-ID: <CAC=kL6L34JC3bPq=PrAEpUTZJkeO2yWVaVEvLJkb_Vka7V6FSA at mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

I'm trying to replicate a time-series cross-sectional analysis
(countries over years) with SEs clustered by country. The original
analysis was done in Stata 10 with: xtreg [DV] [IVs] fe
cluster(country).

Using plm() in R (cran.r-project.org/web/packages/plm/index.html),
I've replicated the coefficients. I sought to estimate
country-clustered SEs with vcovHC(), and tried a variety of options,
but couldn't exactly replicate the published (i.e., Stata 10's) SEs.
In R, vcovHC(x, method="arellano", type="HC1", cluster="group") came
closest to Stata's SEs (differing at the 3rd decimal place or so).

Does anyone happen to know what method cluster() for Stata's xtreg
uses to calculate clustered SEs for panel data, and/or how this could
be implemented equivalently in R? Any help would be much appreciated.

Many thanks in advance,

David

######################## end original message #########################
