[R] number of matches when using Match()

Sun Apr 23 00:30:08 CEST 2006

> How do you go about deciding how many matches you will use?  With my
> data, my standard errors generally get smaller if I use more
> matches.

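In general, choose the largest number of matches that still gives good,
or at least acceptable, covariate balance (which bounds the bias due to
the observed confounders).  More matches tend to shrink the standard
errors, as you have seen, but each additional match is a worse match,
so balance usually degrades.  See the MatchBalance() function for some
measures of balance, and GenMatch() for automatically maximizing
(observed) covariate balance.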
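
As a rough sketch of that workflow, one could loop over M and compare
what MatchBalance() reports after matching.  The simulated data, the
covariate names x1 to x3, and the use of the smallest after-matching
p-value as a one-number balance summary are all assumptions made for
illustration, not a recommendation for how balance should be judged.

```r
library(Matching)

set.seed(1)
n <- 400
X <- matrix(rnorm(n * 3), ncol = 3)
colnames(X) <- c("x1", "x2", "x3")
# treatment assignment depends on x1, so the raw groups are imbalanced
Tr <- rbinom(n, 1, plogis(0.8 * X[, 1]))
Y  <- rnorm(n)
dat <- data.frame(Tr = Tr, X)

# for each M, match and record the smallest after-matching p-value
min.p <- sapply(1:4, function(M) {
  m.out <- Match(Y = Y, Tr = Tr, X = X, M = M)
  # ks = FALSE skips the bootstrap KS tests to keep the sketch quick
  mb <- MatchBalance(Tr ~ x1 + x2 + x3, data = dat, match.out = m.out,
                     ks = FALSE, print.level = 0)
  mb$AMsmallest.p.value
})
names(min.p) <- paste0("M=", 1:4)
print(min.p)
```

One would then keep the largest M whose balance is still judged
acceptable, looking at the full MatchBalance() output rather than only
this single number.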

How to measure good balance is an open research question.  I will note
that the degree of covariate balance that is usually thought to be
acceptable in the applied literature isn't enough to get reliable
estimates in practice.  We can evaluate this by comparing an
observational estimate (with matching adjustment) with a known
experimental benchmark.  See:

http://sekhon.berkeley.edu/papers/GenMatch.pdf

> Speaking of standard errors, when correcting for heteroscedasticity,
> how many matches do you use (this is the Var.calc option)?  It seems
> to me that it might make sense to use the same number of matches as
> above, but that's just a guess...

These are related but separate issues.  The number of matches is all
about covariate balance (bias reduction), whereas the Var.calc option
(that is how the argument is spelled in Match()) is related to the
heterogeneity of the causal effect.  It could be that the data are
such that one needs 1-to-1 matching to get good covariate balance, but
the causal effect is homogeneous, so Var.calc can be set to 0.
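
To illustrate that the two settings are chosen independently, here is a
small sketch on simulated data (the data and the choice of 3 for the
variance calculation are arbitrary assumptions).  The argument is named
Var.calc in Match().  M is held fixed at 1 while Var.calc changes, so
the matches and the point estimate stay the same and only the reported
standard error differs.

```r
library(Matching)

set.seed(2)
n  <- 400
X  <- matrix(rnorm(n * 3), ncol = 3)
Tr <- rbinom(n, 1, plogis(0.8 * X[, 1]))
Y  <- X[, 1] + rnorm(n)

# same 1-to-1 matching in both calls; only the variance calculation differs
m.homo <- Match(Y = Y, Tr = Tr, X = X, M = 1, Var.calc = 0)
m.het  <- Match(Y = Y, Tr = Tr, X = X, M = 1, Var.calc = 3)

cat("estimates:", m.homo$est, m.het$est, "\n")
cat("SEs:      ", m.homo$se,  m.het$se,  "\n")
```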

> One more question about Match()...
> I am calculating a number of SATT's that all have the same covariates
> (X's) and treatment variables (Tr's).  I would like to take advantage
> of the matching that I do the first time to then quickly calculate the
> SATT for various different Y's?  How can I do that?  It would save
> serious computational time.

The following code expands on your code and will estimate the mean
causal effect and the naive standard errors without a second call to
Match().  Doing this for the Abadie-Imbens SEs instead of the naive
SEs is left as an exercise (take the code from the Matching.R file of
the package).  In a future version of the package, I'll make a
separate function to make all of this transparent by using the
"predict()" setup.

###################
library(Matching)

set.seed(30)
#make up some data
X <- matrix(rnorm(1000*5), ncol=5)
Tr <- c(rep(1,500),rep(0,500))
Y1 <- as.vector(rnorm(1000))
Y2 <- as.vector(rnorm(1000))

satt.Y1 <- Match(Y=Y1, X=X, Tr=Tr, M=1)
summary(satt.Y1, full=TRUE)

cat("****** Estimate Y2 BY Calling Match() \n")
satt.Y2 <- Match(Y=Y2, X=X, Tr=Tr, M=1)
summary(satt.Y2, full=TRUE)

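cat("****** Estimate Without Calling Match() \n")
# reuse the matched pairs and weights found for Y1; the matching depends
# only on X and Tr, not on the outcome
index.treated <- satt.Y1$index.treated
index.control <- satt.Y1$index.control
weights <- satt.Y1$weights
Y <- Y2

# weighted mean of the matched treated-minus-control differences
mest  <- sum((Y[index.treated]-Y[index.control])*weights)/sum(weights)
cat("estimate for Y2:", mest, "\n")

# naive variance: weighted squared deviations of the matched differences
v1  <- Y[index.treated] - Y[index.control]
varest  <- sum( ((v1-mest)^2)*weights)/(sum(weights)*sum(weights))
se.naive  <- sqrt(varest)
cat("naive SE Y2:", se.naive, "\n")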

###############
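
If there are many outcomes, the same arithmetic can be wrapped in a
small function and applied to each column of an outcome matrix.  The
sketch below is only an illustration: reuse_match() is a hypothetical
helper, not part of the Matching package, and the toy index and weight
vectors stand in for the index.treated, index.control and weights
components of a real Match() result.

```r
# hypothetical helper: estimate and naive SE from an existing matching
reuse_match <- function(Y, index.treated, index.control, weights) {
  d   <- Y[index.treated] - Y[index.control]   # matched differences
  est <- sum(d * weights) / sum(weights)       # weighted mean difference
  se.naive <- sqrt(sum((d - est)^2 * weights) / sum(weights)^2)
  c(est = est, se.naive = se.naive)
}

# toy stand-in for a matching: treated unit 3 is tied to two controls,
# so its two rows each carry weight 0.5
set.seed(30)
Ys <- cbind(Y1 = rnorm(10), Y2 = rnorm(10), Y3 = rnorm(10))
index.treated <- c(1, 2, 3, 3)
index.control <- c(6, 7, 8, 9)
weights       <- c(1, 1, 0.5, 0.5)

# one column of results per outcome, no further matching needed
res <- apply(Ys, 2, reuse_match,
             index.treated = index.treated,
             index.control = index.control,
             weights = weights)
print(res)
```

With a real fit, one would pass satt.Y1$index.treated,
satt.Y1$index.control and satt.Y1$weights in place of the toy vectors.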

Cheers,
JS.

=======================================
Jasjeet S. Sekhon                     

Associate Professor             
Survey Research Center          
UC Berkeley                     

http://sekhon.berkeley.edu/
V: 510-642-9974  F: 617-507-5524
=======================================

Brian Quinif writes:
 > To anyone who uses the Match() function in the Matching library...
 > 
 > How do you go about deciding how many matches you will use?  With my
 > data, my standard errors generally get smaller if I use more matches.
 > 
 > Speaking of standard errors, when correcting for heteroscedasticity,
 > how many matches do you use (this is the Var.calc option)?  It seems
 > to me that it might make sense to use the same number of matches as
 > above, but that's just a guess...
 > 
 > One more question about Match()...
 > I am calculating a number of SATT's that all have the same covariates
 > (X's) and treatment variables (Tr's).  I would like to take advantage
 > of the matching that I do the first time to then quickly calculate the
 > SATT for various different Y's?  How can I do that?  It would save
 > serious computational time.
 > 
 > In case I'm not explaining myself well, in the example below, I would
 > like to calculate satt.Y2 without having to perform the matching all
 > over again, since with more data, the process can be very slow.
 > 
 > #make up some data
 > X <- matrix(rnorm(1000*5), ncol=5)
 > Tr <- c(rep(1,500),rep(0,500))
 > Y1 <- as.vector(rnorm(1000))
 > Y2 <- as.vector(rnorm(1000))
 > 
 > satt.Y1 <- Match(Y=Y1, X=X, Tr=Tr, M=1)
 > satt.Y2 <- Match(Y=Y2, X=X, Tr=Tr, M=1)
 > 
 > Thanks,
 > 
 > BQ