
Perform balanced risk set matching as described in Li et al. (2001) "Balanced Risk Set Matching". Given a longitudinal data frame with covariate information and treatment times, build a mixed-integer program (MIP) that matches each treated individual to an individual who has not yet been treated (or is never treated) by minimizing the Mahalanobis distance between their covariates. If balancing is requested, the model also tries to minimize the imbalance of the specified balancing covariates across the final set of pairs. Each treated individual is matched to exactly one other individual.
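To make the distance criterion concrete, here is a minimal base R sketch of a Mahalanobis distance between one treated individual and a small risk set of not-yet-treated individuals. The toy data, the column names, and the use of the plain sample covariance are assumptions for illustration; they are not taken from the package internals, which may compute the covariance differently.

```r
# Hypothetical covariates observed at one time point.
covs <- data.frame(
  id   = c("a", "b", "c", "d"),
  age  = c(71, 78, 69, 80),
  educ = c(16, 14, 12, 18)
)
X <- as.matrix(covs[, c("age", "educ")])
rownames(X) <- covs$id

# Assume "a" is treated at this time and the others are still untreated.
treated  <- X["a", ]
risk_set <- X[c("b", "c", "d"), , drop = FALSE]

# stats::mahalanobis() returns *squared* distances from each row to `center`.
S  <- cov(X)  # assumption: sample covariance of all rows
d2 <- mahalanobis(risk_set, center = treated, cov = S)
d  <- sqrt(d2)

# The closest candidate control for "a" under this distance.
closest <- names(which.min(d))
print(d)
print(closest)
```

A full match solves for all pairs jointly rather than picking the nearest candidate one treated individual at a time, so this only illustrates the cost that the optimization works with.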

Usage

brsmatch(
  n_pairs,
  data,
  id = "id",
  time = "time",
  trt_time = "trt_time",
  covariates = NULL,
  balance = TRUE,
  balance_covariates = NULL,
  exact_match = NULL,
  options = list(time_lag = FALSE, verbose = FALSE, optimizer = c("glpk", "gurobi"))
)

Arguments

n_pairs

The number of pairs desired from matching.

data

A data.frame or similar object containing the columns named by the id, time, and trt_time arguments, plus the covariates. The data frame is expected to be in tidy, long format, so that id, trt_time, and other variables may be repeated for different values of time. Each combination of id and time should appear in at most one row.
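The following base R sketch shows what such a long-format input might look like and how the uniqueness requirement could be checked before matching. The data values and the covariate name x1 are hypothetical, and coding never-treated individuals with NA in trt_time is an assumption based on the oasis output shown in the Examples.

```r
# Hypothetical long-format data: one row per (id, time).
long <- data.frame(
  id       = c(1, 1, 1, 2, 2, 3, 3, 3),
  time     = c(1, 2, 3, 1, 2, 1, 2, 3),
  trt_time = c(2, 2, 2, NA, NA, 3, 3, 3),  # NA: never treated (assumption)
  x1       = c(0.5, 0.7, 0.9, 1.2, 1.1, 0.3, 0.4, 0.8)
)

# Check that no (id, time) combination is repeated.
is_unique <- anyDuplicated(long[, c("id", "time")]) == 0

# Check that trt_time takes a single value within each id.
trt_constant <- all(tapply(
  long$trt_time, long$id,
  function(v) length(unique(v)) == 1
))

stopifnot(is_unique, trt_constant)
```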

id

A character specifying the id column name (default 'id').

time

A character specifying the time column name (default 'time').

trt_time

A character specifying the treatment time column name (default 'trt_time').

covariates

A character vector specifying the covariates to use for matching (default NULL). If NULL, this will default to all columns except those named by the id, time, and trt_time arguments.

balance

A logical value indicating whether to include balancing constraints in the matching process (default TRUE).

balance_covariates

A character vector specifying the covariates to use for balancing (default NULL). If NULL, this will default to all columns except those named by the id, time, and trt_time arguments.

exact_match

An optional vector of covariates on which to perform exact matching (default NULL). If NULL, no exact matching is done.

options

A list of additional parameters with the following components:

  • time_lag: A logical value indicating whether the matches should be made on the time period preceding treatment. This can help avoid confounding if treatment happens between two periods.

  • verbose: A logical value indicating whether to print information to the console during a potentially long matching process.

  • optimizer: The optimizer to use (default 'glpk'). The option 'gurobi' requires an external license and package, but offers speed improvements.
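As a sketch of how the options list might be passed, the block below builds the list explicitly and reuses the call from the Examples with verbose output turned on. It assumes that brsmatch() and the oasis data are available from the rsmatch package, which is not stated on this page, and it only runs the match when that package and Rglpk are installed.

```r
# Spell out every component so nothing depends on how defaults are filled in.
opts <- list(
  time_lag  = FALSE,   # match on the treatment period itself
  verbose   = TRUE,    # print progress while the problem is built and solved
  optimizer = "glpk"   # open-source solver; "gurobi" needs a license
)

# Assumption: brsmatch() and oasis live in the rsmatch package.
if (requireNamespace("rsmatch", quietly = TRUE) &&
    requireNamespace("Rglpk", quietly = TRUE)) {
  pairs <- rsmatch::brsmatch(
    n_pairs  = 13,
    data     = rsmatch::oasis,
    id       = "subject_id",
    time     = "visit",
    trt_time = "time_of_ad",
    balance  = FALSE,
    options  = opts
  )
}
```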

Value

A data frame containing the pair information. The data frame has columns id, pair_id, and type. id matches the input parameter and will contain all ids from the input data frame. pair_id refers to the id of the computed pairs; NA values indicate unmatched individuals. type indicates whether the individual in the pair is considered as treatment ("trt") or control ("all") in that pair.
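To illustrate how a result of this shape could be post-processed, the base R sketch below drops unmatched individuals and reshapes the pairs to one row per pair. The mock data frame is hand-built rather than real brsmatch() output, and giving unmatched rows an NA type is an assumption.

```r
# Mock result with the documented columns (values are hypothetical).
pairs <- data.frame(
  id      = c("s1", "s2", "s3", "s4", "s5"),
  pair_id = c(1, NA, 1, 2, 2),
  type    = c("trt", NA, "all", "all", "trt"),
  stringsAsFactors = FALSE
)

# Keep only matched individuals (pair_id is NA for unmatched ones).
matched <- pairs[!is.na(pairs$pair_id), ]

# One row per pair: treated id next to its control id.
trt  <- matched[matched$type == "trt", c("pair_id", "id")]
ctl  <- matched[matched$type == "all", c("pair_id", "id")]
wide <- merge(trt, ctl, by = "pair_id", suffixes = c("_trt", "_control"))
print(wide)
```

The wide layout makes it straightforward to join covariates for each side of a pair when checking match quality.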

Details

Note that when using exact matching, n_pairs is split roughly in proportion to the number of treated subjects in each exact matching group. If you would like to control the number of pairs in each group exactly, we suggest performing the exact matching manually, for example with split(), and choosing n_pairs for each group yourself.
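A rough sketch of the manual approach: split the data on the exact-matching variable, count treated subjects per group, choose a pair count for each group, and run a matcher group by group. The data, the sex column, the helper run_by_group(), and the stand-in matcher are all hypothetical; in practice the matcher would be a small wrapper around brsmatch() with your own id, time, and trt_time arguments.

```r
# Hypothetical long-format data with a grouping variable for exact matching.
dat <- data.frame(
  id       = rep(1:6, each = 2),
  time     = rep(1:2, times = 6),
  trt_time = rep(c(2, NA, 2, NA, NA, 2), each = 2),
  sex      = rep(c("F", "F", "F", "M", "M", "M"), each = 2),
  x        = c(1.0, 1.1, 0.8, 0.9, 1.3, 1.2, 0.7, 0.6, 1.5, 1.4, 0.9, 1.0)
)

# One data frame per exact-matching group.
groups <- split(dat, dat$sex)

# Number of treated subjects in each group (upper bound on pairs).
n_treated <- vapply(
  groups,
  function(d) length(unique(d$id[!is.na(d$trt_time)])),
  integer(1)
)

# Pair counts chosen by hand for each group.
n_pairs_by_group <- c(F = 1L, M = 1L)
stopifnot(all(n_pairs_by_group <= n_treated[names(n_pairs_by_group)]))

# Hypothetical helper: apply `matcher` to each group with its own n_pairs.
run_by_group <- function(groups, n_pairs_by_group, matcher) {
  Map(
    function(d, n) matcher(n_pairs = n, data = d),
    groups[names(n_pairs_by_group)],
    n_pairs_by_group
  )
}

# Stand-in matcher so the sketch runs without the package installed.
describe_only <- function(n_pairs, data) {
  sprintf("would request %d pair(s) from %d rows", n_pairs, nrow(data))
}

plan <- run_by_group(groups, n_pairs_by_group, describe_only)
print(plan)
```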

References

Li, Yunfei Paul, Kathleen J Propert, and Paul R Rosenbaum. 2001. "Balanced Risk Set Matching." Journal of the American Statistical Association 96 (455): 870-82. doi:10.1198/016214501753208573

Author

Sean Kent

Examples

if (requireNamespace("Rglpk", quietly = TRUE)) {
  library(dplyr, quietly = TRUE)
  pairs <- brsmatch(
    n_pairs = 13,
    data = oasis,
    id = "subject_id",
    time = "visit",
    trt_time = "time_of_ad",
    balance = FALSE
  )

  na.omit(pairs)

  # evaluate the first match
  first_match <- pairs$subject_id[which(pairs$pair_id == 1)]
  oasis %>% dplyr::filter(subject_id %in% first_match)
}
#> 
#> Attaching package: ‘dplyr’
#> The following objects are masked from ‘package:stats’:
#> 
#>     filter, lag
#> The following objects are masked from ‘package:base’:
#> 
#>     intersect, setdiff, setequal, union
#> # A tibble: 6 × 11
#>   subject_id visit time_of_ad m_f    educ ses     age mr_delay e_tiv n_wbv   asf
#>   <chr>      <int>      <dbl> <chr> <int> <fct> <int>    <int> <int> <dbl> <dbl>
#> 1 OAS2_0007      1          3 M        16 -1       71        0  1357 0.748  1.29
#> 2 OAS2_0007      3          3 M        16 -1       73      518  1365 0.727  1.29
#> 3 OAS2_0007      4          3 M        16 -1       75     1281  1372 0.71   1.28
#> 4 OAS2_0058      1         NA M        14 3        78        0  1315 0.707  1.34
#> 5 OAS2_0058      2         NA M        14 3        79      212  1308 0.706  1.34
#> 6 OAS2_0058      3         NA M        14 3        80      764  1324 0.695  1.33