| Type: | Package | 
| Title: | Near-Far Matching | 
| Version: | 1.3 | 
| Date: | 2024-01-22 | 
| Author: | Joseph Rigdon <jrigdon@wakehealth.edu> | 
| Maintainer: | Joseph Rigdon <jrigdon@wakehealth.edu> | 
| Imports: | GenSA, MASS, car, stats | 
| Description: | Near-far matching is a study design technique for preprocessing observational data to mimic a pair-randomized trial. Individuals are matched to be near on measured confounders and far on levels of an instrumental variable. Methods are outlined in further detail in Rigdon, Baiocchi, and Basu (2018) <doi:10.18637/jss.v086.c05>. | 
| License: | GPL-3 | 
| Depends: | nbpMatching | 
| NeedsCompilation: | no | 
| Packaged: | 2024-01-22 14:18:48 UTC; joerigdon | 
| Repository: | CRAN | 
| Date/Publication: | 2024-01-23 13:00:02 UTC | 
Near-Far Matching
Description
Near-far matching is a study design technique for preprocessing observational data to mimic a pair-randomized trial. Individuals are matched to be near on measured confounders and far on levels of an instrumental variable.
Details
| Package: | nearfar | 
| Type: | Package | 
| Version: | 1.3 | 
| Date: | 2024-01-15 | 
| License: | GPL-3 | 
Author(s)
Joseph Rigdon jrigdon@wakehealth.edu
References
Rigdon J, Baiocchi M, Basu S (2018). Near-far matching in R: The nearfar package. Journal of Statistical Software, 86(5), 1-21.
Baiocchi M, Small D, Lorch S, Rosenbaum P (2010). Building a stronger instrument in an observational study of perinatal care for premature infants. Journal of the American Statistical Association, 105(492), 1285-1296.
Baiocchi M, Small D, Yang L, Polsky D, Groeneveld P (2012). Near-far matching: a study design approach to instrumental variables. Health Services and Outcomes Research Methodology, 12(4), 237-253.
Angrist data set for education and wages
Description
A random sample of 1000 observations from the data set used by Angrist and Krueger in their investigation of the impact of education on future wages.
Format
A data frame with 1000 observations on the following 7 variables.
- wage: a numeric vector
- educ: a numeric vector
- qob: a numeric vector
- IV: a numeric vector
- age: a numeric vector
- married: a numeric vector
- race: a numeric vector
Details
This data set is a random sample of 1000 observations from the URL listed below.
Source
https://economics.mit.edu/people/faculty/josh-angrist/angrist-data-archive
References
Angrist JD, Krueger AB (1991). Does Compulsory School Attendance Affect Schooling and Earnings? The Quarterly Journal of Economics, 106(4), 979-1014.
Examples
library(nearfar)
str(angrist)
Matching priority function
Description
Updates a given distance matrix to prioritize specified measured
confounders in a pair match. Used in concert with the
matches function to prioritize specific measured
confounders in a near-far match in the opt_nearfar function.
Usage
calipers(distmat, variable, tolerance = 0.2)
Arguments
| distmat | A distance matrix, e.g., as returned by smahal | 
| variable | Values of the measured confounder to prioritize, e.g., dd$cyl | 
| tolerance | Penalty to apply to mismatched observations; values near 0 penalize mismatches more | 
Value
Returns an updated distance matrix
See Also
Examples
dd = mtcars[1:4, 2:3]
cc = calipers(distmat=smahal(dd), variable=dd$cyl, tolerance=0.2)
cc
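The Description above does not state how the penalty is applied. A minimal sketch, assuming a pair counts as mismatched when its values of the prioritized variable differ by more than tolerance * sd(variable), and assuming a penalty of 1000 times the largest existing distance. caliper_penalty_sketch() is a hypothetical name and this is not the nearfar source:

```r
# Hypothetical caliper-style penalty; an illustration, not the nearfar code.
# Assumption: a pair is "mismatched" when its values of the prioritized
# variable differ by more than tolerance * sd(variable).
caliper_penalty_sketch <- function(distmat, variable, tolerance = 0.2) {
  distmat <- as.matrix(distmat)
  # Absolute pairwise differences in the prioritized variable
  gap <- abs(outer(variable, variable, "-"))
  # Assumed penalty size: a large multiple of the largest finite distance
  penalty <- 1000 * max(distmat[is.finite(distmat)])
  # Smaller tolerance -> narrower caliper -> more pairs penalized
  distmat + ifelse(gap > tolerance * sd(variable), penalty, 0)
}

dd <- mtcars[1:4, 2:3]
d0 <- as.matrix(dist(scale(dd)))  # stand-in for smahal(dd)
cc <- caliper_penalty_sketch(d0, dd$cyl, tolerance = 0.2)
cc
```

Because the caliper width shrinks with tolerance, values near 0 flag more pairs as mismatched, consistent with the behavior described for the tolerance argument.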
Inference for effect ratio
Description
Conducts inference on the effect ratio as described in Section 3.3 of Baiocchi et al. (2010), yielding an estimate and a permutation-based confidence interval for the effect ratio.
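In Baiocchi et al. (2010) the effect ratio compares the effect of encouragement on the outcome with its effect on the treatment. A minimal sketch of its sample analogue over matched pairs, which is presumably what est.emp reports (an assumption); effect_ratio_sketch() and the toy data are hypothetical:

```r
# Empirical effect ratio for matched pairs (sketch): summed within-pair
# outcome differences divided by summed within-pair treatment differences,
# encouraged minus discouraged.
effect_ratio_sketch <- function(y, d, match) {
  enc <- match[, 1]  # row indices of encouraged individuals
  dis <- match[, 2]  # row indices of discouraged individuals
  sum(y[enc] - y[dis]) / sum(d[enc] - d[dis])
}

# Toy data, made up for illustration: pairs (1,2), (3,4), (5,6), (7,8)
y <- c(5, 3, 6, 2, 7, 4, 4, 4)
d <- c(1, 0, 1, 0, 1, 1, 0, 0)
match <- cbind(c(1, 3, 5, 7), c(2, 4, 6, 8))
effect_ratio_sketch(y, d, match)  # 9 / 2 = 4.5
```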
Usage
eff_ratio(dta, match, outc, trt, alpha)
Arguments
| dta | The name of the data frame object | 
| match | Data frame where the first column contains indices for those individuals encouraged into treatment by the instrumental variable and the second column contains indices for those individuals discouraged from treatment by the instrumental variable; returned by both | 
| outc | The name of the outcome variable in quotes, e.g., “wage” | 
| trt | The name of the treatment variable in quotes, e.g., “educ” | 
| alpha | Significance level used to construct the confidence interval, e.g., 0.05 | 
Value
| est.emp | Empirical estimate of effect ratio | 
| est.HL | Hodges-Lehmann type estimate of effect ratio | 
| lower | Lower limit to 1-alpha/2 confidence interval for effect ratio | 
| upper | Upper limit to 1-alpha/2 confidence interval for effect ratio | 
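One common way to obtain a permutation-based interval is to invert a randomization test over a grid of hypothesized ratios. The sketch below assumes a sharp null under which each individual's adjusted response y - lambda0 * d is unaffected by encouragement, and uses the sum of adjusted pair differences with random sign flips as its test statistic. The statistic, grid, and number of draws used by eff_ratio may differ, and ratio_ci_sketch() is hypothetical:

```r
# Sketch: permutation (sign-flip) confidence interval for the effect ratio by
# test inversion. Assumes the sharp null that each individual's adjusted
# response y - lambda0 * d is unaffected by encouragement, and that which
# pair member is encouraged is as-if random. Hypothetical, not eff_ratio().
ratio_ci_sketch <- function(y, d, match, alpha = 0.05,
                            grid = seq(-10, 10, by = 0.1), n_perm = 2000) {
  dy <- y[match[, 1]] - y[match[, 2]]  # outcome differences (enc - dis)
  dd <- d[match[, 1]] - d[match[, 2]]  # treatment differences (enc - dis)
  # Random sign flips, drawn once and reused for every lambda0
  signs <- matrix(sample(c(-1, 1), n_perm * length(dy), replace = TRUE),
                  nrow = n_perm)
  pvals <- vapply(grid, function(lambda0) {
    z <- dy - lambda0 * dd          # adjusted pair differences
    t_obs <- abs(sum(z))
    t_perm <- abs(signs %*% z)      # randomization distribution of |sum|
    mean(t_perm >= t_obs)
  }, numeric(1))
  kept <- grid[pvals > alpha]       # lambda0 values not rejected at alpha
  if (length(kept) == 0) return(c(lower = NA, upper = NA))
  c(lower = min(kept), upper = max(kept))  # range of the accepted set
}

set.seed(42)
n_pairs <- 50
d <- c(rbinom(n_pairs, 1, 0.7), rbinom(n_pairs, 1, 0.3))  # encouraged first
y <- 2 * d + rnorm(2 * n_pairs)                             # true ratio 2
match <- cbind(seq_len(n_pairs), n_pairs + seq_len(n_pairs))
ratio_ci_sketch(y, d, match)
```

With very few pairs the smallest attainable p-value can exceed alpha, in which case every grid value is retained and the interval is just the range of the grid.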
Author(s)
Joseph Rigdon jrigdon@wakehealth.edu
References
Baiocchi M, Small D, Lorch S, Rosenbaum P (2010). Building a stronger instrument in an observational study of perinatal care for premature infants. Journal of the American Statistical Association, 105(492), 1285-1296.
Examples
k2 = matches(dta=mtcars, covs=c("cyl", "disp"), sinks=0.2, iv="carb",
    cutpoint=2, imp.var=c("cyl"), tol.var=0.03)
eff_ratio(dta=mtcars, match=k2, outc="wt", trt="gear", alpha=0.05)
Function to find pair matches using a distance matrix. Called by opt_nearfar to discover optimal near-far matches.
Description
Given values of percent sinks and cutpoint, this function finds the corresponding near-far match.
Usage
matches(dta, covs, iv = NA, imp.var = NA, tol.var = NA, sinks = 0,
    cutpoint = NA)
Arguments
| dta | The name of the data frame on which to do the matching | 
| covs | A vector of the names of the covariates to make “near”, e.g., covs=c("age", "sex", "race") | 
| iv | The name of the instrumental variable, e.g., iv="QOB" | 
| imp.var | A list of (up to 5) named variables to prioritize in the “near” matching | 
| tol.var | A list of (up to 5) tolerances attached to the prioritized variables where 0 is highest penalty for mismatch | 
| sinks | Proportion of the data (between 0 and 1) to match to sinks, and thus remove, if desired; default is 0 | 
| cutpoint | Value below which individuals are too similar on iv to be paired; increase to make matched individuals farther apart on iv | 
Details
Default settings yield a “near” match on the observed confounders in covs only; supply iv, sinks, and cutpoint to obtain a near-far match.
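One way to read these Details: the “near” part comes from the covariate distance matrix, while iv, cutpoint, and sinks modify that matrix before optimal nonbipartite matching (Lu et al., 2011). A hypothetical sketch, assuming pairs closer than cutpoint on the IV get a large penalty and that sinks are appended as artificial units at distance 0 from every real unit. How matches converts the sinks proportion into a count (for example round(sinks * nrow(dta))) is also an assumption, and nearfar_distance_sketch() is not part of the package:

```r
# Hypothetical sketch of a near-far distance matrix; not the nearfar code.
# Assumptions: pairs whose IV values differ by less than `cutpoint` receive a
# large penalty, and `n_sinks` sink units are appended at distance 0 from
# every real unit and at distance `big` from each other (Lu et al., 2011).
nearfar_distance_sketch <- function(distmat, iv, cutpoint, n_sinks = 0,
                                    big = 1e6) {
  distmat <- unname(as.matrix(distmat))
  n <- nrow(distmat)
  # "Far" step: penalize pairs that are too similar on the IV
  distmat <- distmat + big * (abs(outer(iv, iv, "-")) < cutpoint)
  diag(distmat) <- 0                 # self-pairs are never used
  if (n_sinks == 0) return(distmat)
  total <- n + n_sinks               # should be even for a perfect matching
  out <- matrix(0, total, total)     # real-to-sink distances stay at 0
  out[seq_len(n), seq_len(n)] <- distmat
  sinks <- (n + 1):total
  out[sinks, sinks] <- big           # discourage sink-to-sink pairs
  diag(out) <- 0
  out
}

iv <- mtcars$carb[1:6]
d0 <- as.matrix(dist(scale(mtcars[1:6, c("cyl", "disp")])))
m <- nearfar_distance_sketch(d0, iv, cutpoint = 2, n_sinks = 2)
dim(m)  # 8 x 8: six real units plus two sinks
```

The augmented matrix would then go to an optimal nonbipartite matcher (nearfar lists nbpMatching as a dependency, presumably for this step); any individual paired with a sink is dropped.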
Value
A two-column matrix of row indices of paired matches
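What “optimal” means for such a matrix of pairs can be shown with a brute-force search that tries every way of splitting an even number of units into pairs and keeps the pairing with the smallest total distance. This is exponential in n and only illustrative; best_pairing_sketch() is hypothetical, and unlike matches it does not order each pair into encouraged and discouraged columns:

```r
# Brute-force optimal nonbipartite matching (illustration only): tries every
# way to split an even number of units into pairs and keeps the pairing with
# the smallest total within-pair distance. Exponential time; tiny n only.
best_pairing_sketch <- function(distmat) {
  n <- nrow(distmat)
  stopifnot(n %% 2 == 0)
  best <- list(cost = Inf, pairs = NULL)
  search <- function(left, pairs, cost) {
    # Prune: with nonnegative distances a partial cost this high cannot win
    if (cost >= best$cost) return(invisible(NULL))
    if (length(left) == 0) {                        # complete pairing found
      best <<- list(cost = cost, pairs = pairs)
      return(invisible(NULL))
    }
    i <- left[1]                                    # pair the first free unit
    for (j in left[-1]) {                           # with each other free unit
      search(setdiff(left, c(i, j)), rbind(pairs, c(i, j)),
             cost + distmat[i, j])
    }
  }
  search(seq_len(n), NULL, 0)
  best
}

x <- c(0, 1, 10, 11)
res <- best_pairing_sketch(abs(outer(x, x, "-")))
res$pairs  # rows (1, 2) and (3, 4), total distance 2
```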
Author(s)
Joseph Rigdon jrigdon@wakehealth.edu
References
Lu B, Greevy R, Xu X, Beck C (2011). Optimal nonbipartite matching and its statistical applications. The American Statistician, 65(1), 21-30.
See Also
Examples
k2 = matches(dta=mtcars, covs=c("cyl", "disp"), sinks=0.2, iv="carb",
    cutpoint=2, imp.var=c("cyl"), tol.var=0.03)
k2[1:5, ]
Finds optimal near-far match
Description
Discovers optimal near-far matches using the partial F statistic (for continuous treatments) or partial deviance (for binary treatments).
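A minimal sketch of the two criteria, assuming the IV enters the treatment model linearly and that, for a binary treatment, the criterion is the drop in deviance when the IV is added to a logistic model. Passing covs = character(0) would correspond to adjust.IV = FALSE (an assumption). iv_strength_sketch() is hypothetical; in opt_nearfar the criterion would presumably be evaluated on the matched sample for each candidate combination of sinks and cutpoint:

```r
# Sketch of an IV-strength criterion: partial F for a continuous treatment,
# drop in deviance for a binary one. Assumes the IV enters linearly.
iv_strength_sketch <- function(dta, trt, iv, covs = character(0),
                               trt.type = c("cont", "bin")) {
  trt.type <- match.arg(trt.type)
  rhs0 <- if (length(covs)) paste(covs, collapse = " + ") else "1"
  f0 <- as.formula(paste(trt, "~", rhs0))            # model without the IV
  f1 <- as.formula(paste(trt, "~", rhs0, "+", iv))   # model with the IV
  if (trt.type == "cont") {
    m0 <- lm(f0, data = dta)
    m1 <- lm(f1, data = dta)
    anova(m0, m1)$F[2]                               # partial F for the IV
  } else {
    m0 <- glm(f0, family = binomial, data = dta)
    m1 <- glm(f1, family = binomial, data = dta)
    deviance(m0) - deviance(m1)                      # drop in deviance
  }
}

iv_strength_sketch(mtcars, trt = "drat", iv = "carb", covs = c("cyl", "disp"))
iv_strength_sketch(mtcars, trt = "am", iv = "carb", covs = "cyl",
                   trt.type = "bin")
```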
Usage
opt_nearfar(dta, trt, covs, iv, trt.type = "cont", imp.var = NA,
    tol.var = NA, adjust.IV = TRUE, sink.range = c(0, 0.5), cutp.range = NA,
    max.time.seconds = 300)
Arguments
| dta | The name of the data frame on which to do the matching | 
| trt | The name of the treatment variable, e.g., “educ” | 
| iv | The name of the instrumental variable, e.g., iv="QOB" | 
| covs | A vector of the names of the covariates to make “near”, e.g., covs=c("age", "sex", "race") | 
| trt.type | Treatment variable type: “cont” for continuous, or “bin” for binary | 
| imp.var | A list of (up to 5) named variables to prioritize in the “near” matching | 
| tol.var | A list of (up to 5) tolerances attached to the prioritized variables where 0 is highest penalty for mismatch | 
| adjust.IV | If TRUE, include the measured confounders in the treatment~IV model that is optimized; if FALSE, exclude them | 
| sink.range | A two-element vector of (min, max) for the range of sinks over which to optimize in the near-far match; default (0, 0.5), such that at most 50% of observations can be removed | 
| cutp.range | A two-element vector of (min, max) for the range of cutpoints (how far apart the IV will become) over which to optimize in the near-far match; default is (one SD of IV, range of IV) | 
| max.time.seconds | How long to let the optimization algorithm run; default is 300 seconds = 5 minutes | 
Value
| n.calls | Number of calls made to the objective function | 
| sink.range | The two-element (min, max) range of sinks over which the near-far match was optimized | 
| cutp.range | The two-element (min, max) range of cutpoints over which the near-far match was optimized | 
| pct.sink | Optimal percent sinks | 
| cutp | Optimal cutpoint | 
| maxF | Highest value of partial F-statistic (continuous treatment) or residual deviance (binary treatment) found by simulated annealing optimizer | 
| match | A two column matrix where the first column is the index of an “encouraged” individual and the second column is the index of the corresponding “discouraged” individual from the pair matching | 
| summ | A table of mean variable values for both the “encouraged” and “discouraged” groups across all variables plus absolute standardized differences for each variable | 
Author(s)
Joseph Rigdon jrigdon@wakehealth.edu
References
Lu B, Greevy R, Xu X, Beck C (2011). Optimal nonbipartite matching and its statistical applications. The American Statistician, 65(1), 21-30.
Xiang Y, Gubian S, Suomela B, Hoeng J (2013). Generalized Simulated Annealing for Efficient Global Optimization: the GenSA Package for R. The R Journal, 5(1). URL http://journal.r-project.org/.
Examples
k = opt_nearfar(dta=mtcars, trt="drat", covs=c("cyl", "disp"),
    trt.type="cont", iv="carb", imp.var=NA, tol.var=NA, adjust.IV=TRUE,
    max.time.seconds=2)
summary(k)
Compute rank-based Mahalanobis distance matrix between each pair
Description
This function computes the rank-based Mahalanobis distance matrix
between each pair of observations in the data set. It is called by the
matches function (and ultimately by opt_nearfar) to set up the distance
matrix used to create pair matches.
Usage
smahal(X)
Arguments
| X | A matrix of observed confounders with n rows (observations) and p columns (variables) | 
Value
Returns the rank-based Mahalanobis distance matrix between every pair of observations
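The Description does not give the formula. A minimal sketch of a rank-based Mahalanobis distance in the style of Rosenbaum (2010, Design of Observational Studies): each column is replaced by its ranks, the rank covariance is rescaled so that ties do not inflate a variable's weight, and a generalized inverse from MASS (listed in Imports) handles a singular covariance. Whether smahal matches this exactly is an assumption; smahal_sketch() is hypothetical and returns squared distances, as stats::mahalanobis() does:

```r
# Rank-based Mahalanobis distance (sketch). Returns squared distances.
# Assumes every column varies; a constant column would give Inf weights.
smahal_sketch <- function(X) {
  X <- as.matrix(X)
  n <- nrow(X)
  k <- ncol(X)
  for (j in seq_len(k)) X[, j] <- rank(X[, j])  # average ranks for ties
  cv <- cov(X)
  vuntied <- var(seq_len(n))                    # variance of untied ranks
  rat <- sqrt(vuntied / diag(cv))               # tie adjustment per column
  cv <- diag(rat, nrow = k) %*% cv %*% diag(rat, nrow = k)
  icov <- MASS::ginv(cv)                        # generalized inverse
  out <- matrix(0, n, n, dimnames = list(rownames(X), rownames(X)))
  for (i in seq_len(n)) {
    out[i, ] <- mahalanobis(X, X[i, ], icov, inverted = TRUE)
  }
  out
}

smahal_sketch(mtcars[1:4, 2:3])
```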
Examples
smahal(mtcars[1:4, 2:3])
Computes table of absolute standardized differences
Description
Computes absolute standardized differences for both
continuous and binary variables.  Called by opt_nearfar to
summarize results of near-far match.
Usage
summ_matches(dta, iv, covs, match)
Arguments
| dta | The name of the data frame on which matching was performed | 
| iv | The name of the instrumental variable, e.g., iv="QOB" | 
| covs | A vector of the names of the covariates to make “near”, e.g., covs=c("age", "sex", "race") | 
| match | A two-column matrix of row indices of paired matches | 
Value
A table of mean variable values for both the “encouraged” and “discouraged” groups across all variables plus absolute standardized differences for each variable
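A minimal sketch of an absolute standardized difference between the encouraged (first column of match) and discouraged (second column) groups. The pooled denominator sqrt((s1^2 + s0^2) / 2), with p(1 - p) as the variance for 0/1 variables, is a common convention and an assumption here; std_diff_sketch() is hypothetical:

```r
# Absolute standardized differences between matched groups (sketch).
std_diff_sketch <- function(dta, vars, match) {
  enc <- dta[match[, 1], vars, drop = FALSE]  # encouraged members
  dis <- dta[match[, 2], vars, drop = FALSE]  # discouraged members
  t(sapply(vars, function(v) {
    x1 <- enc[[v]]
    x0 <- dis[[v]]
    is_bin <- all(c(x1, x0) %in% c(0, 1))     # treat 0/1 data as binary
    v1 <- if (is_bin) mean(x1) * (1 - mean(x1)) else var(x1)
    v0 <- if (is_bin) mean(x0) * (1 - mean(x0)) else var(x0)
    c(encouraged = mean(x1), discouraged = mean(x0),
      abs_std_diff = abs(mean(x1) - mean(x0)) / sqrt((v1 + v0) / 2))
  }))
}

std_diff_sketch(mtcars, c("cyl", "disp", "am"), cbind(c(1, 3, 5), c(2, 4, 6)))
```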
Author(s)
Joseph Rigdon jrigdon@wakehealth.edu
See Also
Examples
k2 = matches(dta=mtcars, covs=c("cyl", "disp"), sinks=0.2, iv="carb",
     cutpoint=2, imp.var=c("cyl"), tol.var=0.03)
summ_matches(dta=mtcars, iv="carb", covs=c("cyl", "disp"), match=k2)
Summary method for object of class “nf”
Description
Displays key information, e.g., the number of matches tried
and post-match balance, for an object returned by opt_nearfar.
Usage
## S3 method for class 'nf'
summary(object, ...)
Arguments
| object | Object of class “nf” returned by  | 
| ... | additional arguments affecting the summary produced | 
Value
Returns a summary of results from opt_nearfar function
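summary(k) reaches this method through S3 dispatch on the class “nf”. The hypothetical sketch below shows that mechanism with a hand-built object of class “nf_demo”; its field names borrow from the opt_nearfar Value list, but the values are made up and the printed layout is not that of the real method:

```r
# Hypothetical S3 method: summary() dispatches here for class "nf_demo".
summary.nf_demo <- function(object, ...) {
  cat("Calls to objective function:", object$n.calls, "\n")
  cat("Optimal percent sinks:", object$pct.sink, "\n")
  cat("Optimal cutpoint:", object$cutp, "\n")
  invisible(object$summ)  # return the balance table without printing it
}

# Hand-built object with made-up values, not output of opt_nearfar
fake <- structure(list(n.calls = 10, pct.sink = 0.1, cutp = 2,
                       summ = data.frame(var = "cyl", std.diff = 0.05)),
                  class = "nf_demo")
summary(fake)
```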
Author(s)
Joseph Rigdon jrigdon@wakehealth.edu
See Also
Examples
k = opt_nearfar(dta=mtcars, trt="drat", covs=c("cyl", "disp"),
    trt.type="cont", iv="carb", imp.var=NA, tol.var=NA, adjust.IV=TRUE,
    max.time.seconds=1)
summary(k)