Perform balanced risk set matching as described in Li et al. (2001), "Balanced Risk Set Matching". Given a longitudinal data frame with covariate information and treatment times, build a mixed-integer programming (MIP) problem that matches treated individuals to individuals who have not yet been treated (or are never treated) by minimizing the Mahalanobis distance between their covariates. If balancing is requested, the model also tries to minimize the imbalance in the specified balancing covariates across the final set of pairs. Each treated individual is matched to one other individual.
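The idea of a risk set and a covariate distance can be sketched in base R. This is a minimal illustration on made-up data, not the package's implementation: the toy data frame, the column names x1 and x2, and the choice to estimate the covariance from all rows are assumptions made for the example. Note that mahalanobis() returns squared distances.

```r
# Toy long-format data (hypothetical values), one row per id and time.
toy <- data.frame(
  id       = rep(c("a", "b", "c", "d"), each = 2),
  time     = rep(1:2, times = 4),
  trt_time = rep(c(2, NA, 3, 1), each = 2),  # NA taken to mean "never treated"
  x1       = c(1.0, 1.2, 0.9, 1.1, 3.0, 3.1, 1.5, 1.4),
  x2       = c(5.0, 5.5, 6.0, 5.4, 4.0, 4.2, 6.0, 6.1)
)

# Subject "a" is treated at time 2; look at everyone's covariates at that time.
t_star  <- 2
at_t    <- toy[toy$time == t_star, ]
treated <- at_t[at_t$id == "a", ]

# Risk set: subjects not yet treated at t_star (later trt_time, or never treated).
risk_set <- at_t[is.na(at_t$trt_time) | at_t$trt_time > t_star, ]

# Squared Mahalanobis distance from the treated subject to each candidate,
# using the covariance of the covariates over all toy rows (an assumption).
covs <- c("x1", "x2")
S    <- cov(toy[, covs])
d2   <- mahalanobis(
  as.matrix(risk_set[, covs]),
  center = unlist(treated[, covs]),
  cov    = S
)
names(d2) <- as.character(risk_set$id)
d2
```

Here subjects "b" and "c" form the risk set for "a", and "b" is the closer candidate. The matching itself solves this for all treated subjects at once as an optimization problem rather than picking the nearest candidate one at a time.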
Arguments
- n_pairs
The number of pairs desired from matching.
- data
A data.frame or similar containing columns matching the id, time, trt_time arguments, and covariates. This data frame is expected to be in tidy, long format, so that id, trt_time, and other variables may be repeated for different values of time. The data.frame should be unique at id and time.
- id
A character specifying the id column name (default 'id').
- time
A character specifying the time column name (default 'time').
- trt_time
A character specifying the treatment time column name (default 'trt_time').
- covariates
A character vector specifying the covariates to use for matching (default NULL). If NULL, this will default to all columns except those named by the id, time, and trt_time arguments.
- balance
A logical value indicating whether to include balancing constraints in the matching process.
- balance_covariates
A character vector specifying the covariates to use for balancing (default NULL). If NULL, this will default to all columns except those named by the id, time, and trt_time arguments.
- exact_match
A vector of optional covariates to perform exact matching on. If NULL, no exact matching is done.
- options
A list of additional parameters with the following components:
time_lag
A logical value indicating whether the matches should be made on the time period preceding treatment. This can help avoid confounding if treatment happens between two periods.
verbose
A logical value indicating whether to print information to the console during a potentially long matching process.
optimizer
The optimizer to use (default 'glpk'). The option 'gurobi' requires an external license and package, but offers speed improvements.
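The shape expected of data can be checked before matching. The sketch below uses a made-up data frame with the default column names. Treating NA in trt_time as "never treated" and expecting trt_time to be constant within an id are assumptions inferred from the description above, not documented guarantees.

```r
# Hypothetical long-format input: one row per subject and time point.
long <- data.frame(
  id       = c("s1", "s1", "s2", "s2", "s3", "s3"),
  time     = c(1, 2, 1, 2, 1, 2),
  trt_time = c(2, 2, NA, NA, NA, NA),  # assumption: NA marks never-treated
  age      = c(70, 71, 68, 69, 75, 76)
)

# The data should be unique at id and time.
is_unique <- anyDuplicated(long[, c("id", "time")]) == 0

# trt_time is repeated on every row of a subject, so it should take
# a single value (possibly NA) within each id.
n_trt_values      <- tapply(long$trt_time, long$id, function(x) length(unique(x)))
trt_time_constant <- all(n_trt_values == 1)

# With covariates = NULL, every column other than id, time, and trt_time
# is used; this reproduces that default by hand.
default_covariates <- setdiff(names(long), c("id", "time", "trt_time"))
```

Running these checks up front tends to be cheaper than diagnosing a failed or surprising match afterwards, particularly when the default covariate set would pick up columns that were never meant to be matched on.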
Value
A data frame containing the pair information. The data frame has columns id, pair_id, and type. id matches the input parameter and will contain all ids from the input data frame. pair_id refers to the id of the computed pairs; NA values indicate unmatched individuals. type indicates whether the individual in the pair is considered as treatment ("trt") or control ("all") in that pair.
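A common next step is to put each pair on a single row. The sketch below works on a mock data frame shaped like the documented return value. The values are made up, the id column is assumed to be named id, and the NA in type for the unmatched subject is an assumption.

```r
# Mock output shaped like the documented return value (values are made up).
pairs <- data.frame(
  id      = c("s1", "s2", "s3", "s4", "s5"),
  pair_id = c(1, 1, NA, 2, 2),
  type    = c("trt", "all", NA, "all", "trt")  # NA for unmatched is assumed
)

# Keep matched individuals only.
matched <- pairs[!is.na(pairs$pair_id), ]

# One row per pair: treated id next to its control id.
wide <- merge(
  matched[matched$type == "trt", c("pair_id", "id")],
  matched[matched$type == "all", c("pair_id", "id")],
  by       = "pair_id",
  suffixes = c("_trt", "_control")
)
wide
```

The resulting columns are pair_id, id_trt, and id_control, which is convenient for joining covariates back on for each side of the pair.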
Details
Note that when using exact matching, the n_pairs are split roughly in proportion to the number of treated subjects in each exact matching group. If you would like to control n_pairs exactly, we suggest manually performing exact matching, for example with split(), and selecting n_pairs for each group interactively.
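The manual approach can be sketched as follows. The data frame, the sex grouping column, and the chosen pair counts are hypothetical, and NA in trt_time is again assumed to mean never treated. The call to the matching function is left as a comment so the sketch runs without an optimizer installed.

```r
# Hypothetical long-format data with a grouping column to match exactly on.
long <- data.frame(
  id       = rep(paste0("s", 1:6), each = 2),
  time     = rep(1:2, times = 6),
  trt_time = rep(c(2, NA, NA, 2, 2, NA), each = 2),
  sex      = rep(c("F", "F", "F", "M", "M", "M"), each = 2),
  age      = c(70, 71, 68, 69, 75, 76, 80, 81, 66, 67, 72, 73)
)

# Manual exact matching: one data frame per level of the grouping column.
groups <- split(long, long$sex)

# Number of treated subjects per group, as a guide when choosing n_pairs.
n_treated <- sapply(
  groups,
  function(g) length(unique(g$id[!is.na(g$trt_time)]))
)

# Pair counts chosen by hand for each group (illustrative values).
n_pairs_by_group <- c(F = 1, M = 2)

# Each element of `groups` could then be passed to the matching function on
# its own, with n_pairs taken from n_pairs_by_group, and the results combined.
```

Since each pair contains one treated individual, a group's n_pairs presumably should not exceed its number of treated subjects. Pair ids from separate runs will also overlap, so they would need a group prefix or offset before the results are combined.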
References
Li, Yunfei Paul, Kathleen J Propert, and Paul R Rosenbaum. 2001. "Balanced Risk Set Matching." Journal of the American Statistical Association 96 (455): 870-82. doi:10.1198/016214501753208573
Examples
if (requireNamespace("Rglpk", quietly = TRUE)) {
  library(dplyr, quietly = TRUE)
  pairs <- brsmatch(
    n_pairs = 13,
    data = oasis,
    id = "subject_id",
    time = "visit",
    trt_time = "time_of_ad",
    balance = FALSE
  )

  na.omit(pairs)

  # evaluate the first match
  first_match <- pairs$subject_id[which(pairs$pair_id == 1)]
  oasis %>% dplyr::filter(subject_id %in% first_match)
}
#>
#> Attaching package: ‘dplyr’
#> The following objects are masked from ‘package:stats’:
#>
#>     filter, lag
#> The following objects are masked from ‘package:base’:
#>
#>     intersect, setdiff, setequal, union
#> # A tibble: 6 × 11
#>   subject_id visit time_of_ad m_f    educ ses     age mr_delay e_tiv n_wbv   asf
#>   <chr>      <int>      <dbl> <chr> <int> <fct> <int>    <int> <int> <dbl> <dbl>
#> 1 OAS2_0007      1          3 M        16 -1       71        0  1357 0.748  1.29
#> 2 OAS2_0007      3          3 M        16 -1       73      518  1365 0.727  1.29
#> 3 OAS2_0007      4          3 M        16 -1       75     1281  1372 0.71   1.28
#> 4 OAS2_0058      1         NA M        14 3        78        0  1315 0.707  1.34
#> 5 OAS2_0058      2         NA M        14 3        79      212  1308 0.706  1.34
#> 6 OAS2_0058      3         NA M        14 3        80      764  1324 0.695  1.33