[R] exclusion rules for propensity score matching (pattern rec)

Tue Apr 5 16:55:55 CEST 2005

hi,

thanks for the reply to my query about exclusion rules for propensity
score matching.

> Exclusion can be based on the non-overlap regions from the propensity.
> It should not be done in the individual covariate space.

i want a rule inspired by non-overlap in propensity score space, but one
that binds in the space of the Xs. i don't really know how to interpret
the fact that i've excluded, say, people with scores > .87, but i DO know
what it means to say that i've excluded people from country XYZ over age Q
because i can't find good matches for them. if i base my rule on the Xs,
i know which units i can and cannot make inference for, and i can explain
that to other people.
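
to make the starting point concrete, here is a minimal sketch (plain Python, toy data) of flagging treated units whose scores fall outside the range of the control scores, then listing their covariates. the function name, the scores and the three binary covariates are all made up for illustration; a real analysis might well use a stricter overlap definition than the raw min/max.

```python
def off_support_treated(scores, treated):
    """Indices of treated units whose score lies outside the control score range."""
    control = [s for s, t in zip(scores, treated) if not t]
    lo, hi = min(control), max(control)
    return [i for i, (s, t) in enumerate(zip(scores, treated))
            if t and not (lo <= s <= hi)]

# hypothetical data: 4 treated units followed by 4 controls
scores = [0.10, 0.35, 0.60, 0.92, 0.05, 0.40, 0.70, 0.87]
treated = [True, True, True, True, False, False, False, False]
X = [(1, 0, 1), (0, 1, 1), (0, 0, 1), (1, 0, 0),
     (0, 1, 0), (1, 1, 0), (0, 0, 0), (1, 1, 1)]

flagged = off_support_treated(scores, treated)
for i in flagged:
    # these covariate rows are what an X-based rule would have to describe
    print(i, scores[i], X[i])
```

the flagged rows are the raw material for an X-based rule: the question is which simple statement about the Xs picks out these units and few others.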

after posting to the list last night, i thought of using the rgenoud
package (genetic algorithm) to search over the space of exclusion rules
(e.g., var 1 = 1, var 2 = 0, var 3 = 1 or 0, var 4 = 0). the loss function
associated with a rule should be decreasing in the # of tr units w/out
support that it excludes and increasing in the # of tr units w/ support
that it excludes.
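
a rough sketch of what such a rule and loss could look like, in plain Python on toy data. everything here is an assumption for illustration: the rule encoding (one entry per binary covariate, with None meaning "1 or 0"), the function names, the `weight` parameter and the data. the search is a brute-force enumeration standing in for the genetic algorithm, not a use of rgenoud; it is only feasible for a handful of covariates, since k covariates give 3^k candidate rules.

```python
from itertools import product

def rule_matches(rule, x):
    """True if unit x is excluded by the rule.
    Each rule entry is 0 or 1 ('must equal this') or None ('either value')."""
    return all(r is None or r == v for r, v in zip(rule, x))

def rule_loss(rule, X, off_support, weight=1.0):
    """Loss to minimize: off-support treated units the rule fails to exclude,
    plus `weight` times the on-support treated units it does exclude."""
    missed = wrongly_excluded = 0
    for x, off in zip(X, off_support):
        excluded = rule_matches(rule, x)
        if off and not excluded:
            missed += 1
        elif excluded and not off:
            wrongly_excluded += 1
    return missed + weight * wrongly_excluded

def best_rule(X, off_support, weight=1.0):
    """Exhaustive search over all 3^k rules (toy-sized problems only)."""
    k = len(X[0])
    candidates = product((0, 1, None), repeat=k)
    return min(candidates, key=lambda r: rule_loss(r, X, off_support, weight))

# hypothetical treated units with three binary covariates
X = [(1, 0, 1), (1, 0, 0), (0, 1, 1), (0, 0, 1), (1, 1, 0), (0, 1, 0)]
off_support = [True, True, False, False, False, False]

rule = best_rule(X, off_support)
print(rule, rule_loss(rule, X, off_support))
```

raising `weight` above 1 makes the search more reluctant to drop treated units that do have support, which is one knob for tuning the trade-off.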

it might be tricky to get the right loss function, and i know this idea is
kind of nutty, but it's the only automated search method i could think of.

any comments?

alexis

> I tend to look
> at the 10th smallest and largest values of propensity for each of the
> two treatment groups for making the decision.  You will need to exclude
> non-overlap regions whether you use matching or covariate adjustment of
> propensity but covariate adjustment (using e.g. regression splines in
> the logit of propensity) is often a better approach once you've been
> careful about non-overlap.
>
> Frank Harrell

On Tue, 5 Apr 2005, Frank E Harrell Jr wrote:

> adiamond at fas.harvard.edu wrote:
> > Dear R-list,
> >
> > i have 6 different sets of samples.  Each sample has about 5000 observations,
> > with each observation comprised of 150 baseline covariates (X), 125 of which
> > are dichotomous. Roughly 20% of the observations in each sample are "treatment"
> > and the rest are "control" units.
> >
> > i am doing propensity score matching. i have already estimated propensity
> > scores (predicted probabilities) using logistic regression, and in each sample i
> > am going to have to exclude approximately 100 treated observations for which I
> > cannot find matching control observations (because the scores for these treated
> > units are outside the support of the scores for control units).
> >
> > in each sample, i must identify an exclusion rule that is interpretable on the
> > scale of the X's that excludes these unmatchable treated observations and
> > excludes as FEW of the remaining treated observations as possible.
> > (the reason is that i want to be able to explain, in terms of the Xs, who the
> > individuals are that I am making causal inference about.)
> >
> > i've tried some simple stuff over the past few days and nothing's worked.
> > is there an R-package or algorithm, or even estimation strategy that anyone
> > could recommend?
> > (i am really hoping so!)
> >
> > thank you,
> >
> > alexis diamond
> >
>
>
>
> --
> Frank E Harrell Jr   Professor and Chair           School of Medicine
>                       Department of Biostatistics   Vanderbilt University
>