[R] exclusion rules for propensity score matching (pattern rec)

Frank E Harrell Jr f.harrell at vanderbilt.edu
Tue Apr 5 21:42:41 CEST 2005


Alexis J. Diamond wrote:
> hi,
> 
> thanks for the reply to my query about exclusion rules for propensity
> score matching.
> 
> 
>>Exclusion can be based on the non-overlap regions from the propensity.
>>It should not be done in the individual covariate space.
> 
> 
> i want a rule inspired by non-overlap in propensity score space, but that
> binds in the space of the Xs.  because i don't really know how to
> interpret the fact that i've excluded, say, people with scores > .87,
> but i DO know what it means to say that i've excluded people from
> country XYZ over age Q because i can't find good matches for them. if i
> make my rule based on Xs, i know who i can and cannot make inference for,
> and i can explain to other people who are the units that i can and cannot
> make inference for.
> 
> after posting to the list last night, i thought of using the rgenoud
> package (genetic algorithm) to search over the space of exclusion rules
> (e.g., var 1 = 1, var 2 = 0, var 3 = 1 or 0, var 4 = 0); the loss function
> associated with a rule should be increasing in # of tr units w/out support
> excluded and decreasing in # of tr units w/ support excluded.
> 
> it might be tricky to get the right loss function, and i know this idea is
> kind of nutty, but it's the only automated search method i could think of.
> 
> any comments?
> 
> alexis

Using the X space directly will not result in optimum exclusions unless 
you use a distance function, but that will make assumptions.  My advice 
is to use rpart to build a classification rule that approximates the 
exclusion criteria to some desired degree of accuracy, i.e., use rpart 
to predict propensity < lower cutoff and, separately, to predict 
propensity > upper cutoff.  The exclusions themselves are still made on 
the propensity scale; the trees just assist in interpretation.
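
A rough sketch of what I mean, on simulated data.  Everything named 
here (the covariates age, country, x3, the helper functions, the cp and 
minbucket settings) is made up for illustration, and taking the cutoffs 
from the 10th smallest and largest propensities in each group is only 
one possible way to define the overlap region, so treat this as a 
starting point rather than a recipe:

```r
library(rpart)

set.seed(1)
n <- 2000

# Hypothetical covariates standing in for the real Xs (assumption)
d <- data.frame(
  age     = runif(n, 18, 80),
  country = factor(sample(c("A", "B", "C"), n, replace = TRUE)),
  x3      = rbinom(n, 1, 0.5)
)
lp <- -3 + 0.04 * d$age + 1.0 * (d$country == "C") + 0.5 * d$x3
d$treat <- rbinom(n, 1, plogis(lp))

# Propensity scores (predicted probabilities) from logistic regression
ps_fit <- glm(treat ~ age + country + x3, data = d, family = binomial)
d$ps <- fitted(ps_fit)

# Overlap cutoffs: 10th smallest / 10th largest propensity in each
# treatment group, keeping the region common to both groups
kth_smallest <- function(x, k = 10) sort(x)[k]
kth_largest  <- function(x, k = 10) sort(x, decreasing = TRUE)[k]
lower <- max(tapply(d$ps, d$treat, kth_smallest))
upper <- min(tapply(d$ps, d$treat, kth_largest))

# Exclusion indicators defined on the propensity scale
d$below <- factor(d$ps < lower, levels = c(FALSE, TRUE))
d$above <- factor(d$ps > upper, levels = c(FALSE, TRUE))

# Separate classification trees in the X space, one per tail
ctrl <- rpart.control(cp = 0.001, minbucket = 5)
tree_lo <- rpart(below ~ age + country + x3, data = d,
                 method = "class", control = ctrl)
tree_hi <- rpart(above ~ age + country + x3, data = d,
                 method = "class", control = ctrl)

# The printed splits are the interpretable description of who is excluded
print(tree_lo)
print(tree_hi)

# How closely each tree reproduces the propensity-based exclusion
agree_lo <- mean(predict(tree_lo, type = "class") == d$below)
agree_hi <- mean(predict(tree_hi, type = "class") == d$above)
```

Lowering cp or minbucket gives a closer approximation at the cost of a 
longer, harder-to-explain rule, which is the "desired degree of 
accuracy" trade-off.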

Frank

> 
> 
> 
>>I tend to look
>>at the 10th smallest and largest values of propensity for each of the
>>two treatment groups for making the decision.  You will need to exclude
>>non-overlap regions whether you use matching or covariate adjustment of
>>propensity but covariate adjustment (using e.g. regression splines in
>>the logit of propensity) is often a better approach once you've been
>>careful about non-overlap.
>>
>>Frank Harrell
> 
> 
> 
> On Tue, 5 Apr 2005, Frank E Harrell Jr wrote:
> 
> 
>>adiamond at fas.harvard.edu wrote:
>>
>>>Dear R-list,
>>>
>>>i have 6 different sets of samples.  Each sample has about 5000 observations,
>>>with each observation comprised of 150 baseline covariates (X), 125 of which
>>>are dichotomous. Roughly 20% of the observations in each sample are "treatment"
>>>and the rest are "control" units.
>>>
>>>i am doing propensity score matching, i have already estimated propensity
>>>scores (predicted probabilities) using logistic regression, and in each sample i
>>>am going to have to exclude approximately 100 treated observations for which I
>>>cannot find matching control observations (because the scores for these treated
>>>units are outside the support of the scores for control units).
>>>
>>>in each sample, i must identify an exclusion rule that is interpretable on the
>>>scale of the X's that excludes these unmatchable treated observations and
>>>excludes as FEW of the remaining treated observations as possible.
>>>(the reason is that i want to be able to explain, in terms of the Xs, who the
>>>individuals are that I am making causal inference about.)
>>>
>>>i've tried some simple stuff over the past few days and nothing's worked.
>>>is there an R-package or algorithm, or even estimation strategy that anyone
>>>could recommend?
>>>(i am really hoping so!)
>>>
>>>thank you,
>>>
>>>alexis diamond
>>>
>>
>>
>>
>>--
>>Frank E Harrell Jr   Professor and Chair           School of Medicine
>>                      Department of Biostatistics   Vanderbilt University
>>
> 
> 


-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University



