Matching with numerical tolerance

When using GenMatch, there is a set of variables that I would like to match on exactly, some with and some without a tolerance. It works for the variables where matching should be done with no tolerance (b_pat9903 and bvd_sector_num).
For the variable last_avail_year, I would like to allow a difference of ±1 between treated units and potential matches. I try to do this in the last line of the code, but get the error:

Error in exact[grep("last_avail_year", colnames(Xs))] <- abs(Xs$last_avail_year -  :
  replacement has length zero

zs <- (AT$ETS)
Y <- (AT$employees0104)
Xs <- data.frame(AT$bvd_sector_num, AT$last_avail_year, AT$incorporation_year, AT$revenue_last, AT$green_pat9903, AT$b_green_pat9903, AT$pat9903, AT$b_pat9903)
balance_matrix <- Xs
gen <- GenMatch(Tr=zs, X=Xs, BalanceMatrix=balance_matrix, pop.size=500, fit.func="pvals")
exact <- rep(FALSE, length(colnames(Xs)))
exact[grep("b_pat9903", colnames(Xs))] <- TRUE
exact[grep("bvd_sector_num", colnames(Xs))] <- TRUE
exact[grep("last_avail_year", colnames(Xs))] <- abs(Xs$last_avail_year - AT$last_avail_year) <= 1

Would appreciate any help! :slight_smile:

It is hard to help without a complete reprex (see the FAQ), including representative data.

Alternatively, please explain what you want to do differently from the help("GenMatch") example:

library(Matching)
#> Loading required package: MASS
#> ## 
#> ##  Matching (Version 4.10-8, Build Date: 2022-11-03)
#> ##  See for additional documentation.
#> ##  Please cite software as:
#> ##   Jasjeet S. Sekhon. 2011. ``Multivariate and Propensity Score Matching
#> ##   Software with Automated Balance Optimization: The Matching package for R.''
#> ##   Journal of Statistical Software, 42(7): 1-52. 
#> ##

data(lalonde)
attach(lalonde)

#The covariates we want to match on
X = cbind(age, educ, black, hisp, married, nodegr, u74, u75, re75, re74)

#The covariates we want to obtain balance on
BalanceMat <- cbind(age, educ, black, hisp, married, nodegr, u74, u75, re75, re74,
                    I(re74*re75))

#Let's call GenMatch() to find the optimal weight to give each
#covariate in 'X' so as we have achieved balance on the covariates in
#'BalanceMat'. This is only an example so we want GenMatch to be quick
#so the population size has been set to be only 16 via the 'pop.size'
#option. This is *WAY* too small for actual problems.
#For details see
genout <- GenMatch(Tr=treat, X=X, BalanceMatrix=BalanceMat, estimand="ATE", M=1,
                   pop.size=16, max.generations=10, wait.generations=1)
#> Loading required namespace: rgenoud
#> Thu Jun  1 02:50:24 2023
#> Domains:
#>  0.000000e+00   <=  X1   <=    1.000000e+03 
#>  0.000000e+00   <=  X2   <=    1.000000e+03 
#>  0.000000e+00   <=  X3   <=    1.000000e+03 
#>  0.000000e+00   <=  X4   <=    1.000000e+03 
#>  0.000000e+00   <=  X5   <=    1.000000e+03 
#>  0.000000e+00   <=  X6   <=    1.000000e+03 
#>  0.000000e+00   <=  X7   <=    1.000000e+03 
#>  0.000000e+00   <=  X8   <=    1.000000e+03 
#>  0.000000e+00   <=  X9   <=    1.000000e+03 
#>  0.000000e+00   <=  X10  <=    1.000000e+03 
#> Data Type: Floating Point
#> Operators (code number, name, population) 
#>  (1) Cloning...........................  1
#>  (2) Uniform Mutation..................  2
#>  (3) Boundary Mutation.................  2
#>  (4) Non-Uniform Mutation..............  2
#>  (5) Polytope Crossover................  2
#>  (6) Simple Crossover..................  2
#>  (7) Whole Non-Uniform Mutation........  2
#>  (8) Heuristic Crossover...............  2
#>  (9) Local-Minimum Crossover...........  0
#> SOFT Maximum Number of Generations: 10
#> Maximum Nonchanging Generations: 1
#> Population size       : 16
#> Convergence Tolerance: 1.000000e-03
#> Not Using the BFGS Derivative Based Optimizer on the Best Individual Each Generation.
#> Not Checking Gradients before Stopping.
#> Using Out of Bounds Individuals.
#> Maximization Problem.
#> GENERATION: 0 (initializing the population)
#> Lexical Fit..... 1.965387e-01  1.965387e-01  2.567638e-01  2.567638e-01  3.173114e-01  3.173114e-01  3.173114e-01  3.173114e-01  4.790353e-01  5.357585e-01  5.638495e-01  5.638495e-01  7.928051e-01  8.747544e-01  9.005085e-01  9.726726e-01  9.757135e-01  9.972415e-01  9.972415e-01  9.999977e-01  1.000000e+00  1.000000e+00  
#> #unique......... 16, #Total UniqueCount: 16
#> var 1:
#> best............ 2.831305e+02
#> mean............ 4.339086e+02
#> variance........ 9.205897e+04
#> var 2:
#> best............ 4.582351e+01
#> mean............ 4.808526e+02
#> variance........ 9.951977e+04
#> var 3:
#> best............ 6.079865e+01
#> mean............ 4.581846e+02
#> variance........ 7.144076e+04
#> var 4:
#> best............ 1.419532e+02
#> mean............ 4.329569e+02
#> variance........ 6.749356e+04
#> var 5:
#> best............ 6.269561e+01
#> mean............ 5.433120e+02
#> variance........ 1.007233e+05
#> var 6:
#> best............ 4.022851e+02
#> mean............ 3.998682e+02
#> variance........ 6.291170e+04
#> var 7:
#> best............ 3.013591e+01
#> mean............ 4.630557e+02
#> variance........ 9.051718e+04
#> var 8:
#> best............ 3.071364e+02
#> mean............ 4.540889e+02
#> variance........ 8.608832e+04
#> var 9:
#> best............ 1.045635e+02
#> mean............ 2.749121e+02
#> variance........ 6.837157e+04
#> var 10:
#> best............ 9.450274e+01
#> mean............ 6.127432e+02
#> variance........ 1.050091e+05
#> Lexical Fit..... 3.173114e-01  3.173114e-01  3.173114e-01  3.173114e-01  3.711480e-01  3.711480e-01  4.054626e-01  4.054626e-01  4.598117e-01  4.796243e-01  4.796243e-01  5.426673e-01  6.956137e-01  7.353599e-01  8.254712e-01  9.304397e-01  9.758377e-01  9.972415e-01  9.991359e-01  9.999999e-01  1.000000e+00  1.000000e+00  
#> #unique......... 12, #Total UniqueCount: 28
#> var 1:
#> best............ 3.299557e+02
#> mean............ 2.640421e+02
#> variance........ 3.959094e+04
#> var 2:
#> best............ 4.294913e+01
#> mean............ 2.157304e+02
#> variance........ 9.514945e+04
#> var 3:
#> best............ 6.079865e+01
#> mean............ 2.973923e+02
#> variance........ 6.926883e+04
#> var 4:
#> best............ 1.419532e+02
#> mean............ 3.992844e+02
#> variance........ 6.263337e+04
#> var 5:
#> best............ 6.269561e+01
#> mean............ 3.973911e+02
#> variance........ 1.003613e+05
#> var 6:
#> best............ 5.576743e+02
#> mean............ 3.694883e+02
#> variance........ 2.014507e+04
#> var 7:
#> best............ 3.013591e+01
#> mean............ 3.757987e+02
#> variance........ 8.645335e+04
#> var 8:
#> best............ 4.150622e+02
#> mean............ 3.565758e+02
#> variance........ 2.384791e+04
#> var 9:
#> best............ 1.045635e+02
#> mean............ 2.224873e+02
#> variance........ 3.323147e+04
#> var 10:
#> best............ 3.818596e+02
#> mean............ 6.225901e+02
#> variance........ 1.460706e+05
#> Lexical Fit..... 3.173114e-01  3.173114e-01  3.173114e-01  3.173114e-01  3.711480e-01  3.711480e-01  4.054626e-01  4.054626e-01  4.598117e-01  4.796243e-01  4.796243e-01  5.426673e-01  6.956137e-01  7.353599e-01  8.254712e-01  9.304397e-01  9.758377e-01  9.972415e-01  9.991359e-01  9.999999e-01  1.000000e+00  1.000000e+00  
#> #unique......... 12, #Total UniqueCount: 40
#> var 1:
#> best............ 3.299557e+02
#> mean............ 3.560685e+02
#> variance........ 1.452756e+04
#> var 2:
#> best............ 4.294913e+01
#> mean............ 4.720064e+01
#> variance........ 9.665896e+02
#> var 3:
#> best............ 6.079865e+01
#> mean............ 1.694049e+02
#> variance........ 5.988433e+04
#> var 4:
#> best............ 1.419532e+02
#> mean............ 2.082209e+02
#> variance........ 2.321037e+04
#> var 5:
#> best............ 6.269561e+01
#> mean............ 2.029439e+02
#> variance........ 7.731470e+04
#> var 6:
#> best............ 5.576743e+02
#> mean............ 4.354230e+02
#> variance........ 7.085862e+03
#> var 7:
#> best............ 3.013591e+01
#> mean............ 1.607625e+02
#> variance........ 6.746991e+04
#> var 8:
#> best............ 4.150622e+02
#> mean............ 3.494544e+02
#> variance........ 2.621848e+03
#> var 9:
#> best............ 1.045635e+02
#> mean............ 1.705721e+02
#> variance........ 2.688426e+04
#> var 10:
#> best............ 3.818596e+02
#> mean............ 2.915842e+02
#> variance........ 7.914410e+04
#> Lexical Fit..... 3.173114e-01  3.173114e-01  3.173114e-01  3.173114e-01  3.711480e-01  3.711480e-01  4.054626e-01  4.054626e-01  4.598117e-01  4.796243e-01  4.796243e-01  5.426673e-01  6.956137e-01  7.353599e-01  8.254712e-01  9.304397e-01  9.758377e-01  9.972415e-01  9.991359e-01  9.999999e-01  1.000000e+00  1.000000e+00  
#> #unique......... 12, #Total UniqueCount: 52
#> var 1:
#> best............ 3.299557e+02
#> mean............ 3.156883e+02
#> variance........ 6.347119e+02
#> var 2:
#> best............ 4.294913e+01
#> mean............ 5.417629e+01
#> variance........ 1.679504e+03
#> var 3:
#> best............ 6.079865e+01
#> mean............ 6.985327e+01
#> variance........ 4.963423e+02
#> var 4:
#> best............ 1.419532e+02
#> mean............ 1.435664e+02
#> variance........ 6.771293e+02
#> var 5:
#> best............ 6.269561e+01
#> mean............ 6.184391e+01
#> variance........ 2.193731e+00
#> var 6:
#> best............ 5.576743e+02
#> mean............ 4.942367e+02
#> variance........ 7.572460e+03
#> var 7:
#> best............ 3.013591e+01
#> mean............ 5.851087e+01
#> variance........ 8.565154e+03
#> var 8:
#> best............ 4.150622e+02
#> mean............ 3.686187e+02
#> variance........ 3.207996e+03
#> var 9:
#> best............ 1.045635e+02
#> mean............ 1.037300e+02
#> variance........ 3.372995e+00
#> var 10:
#> best............ 3.818596e+02
#> mean............ 2.483674e+02
#> variance........ 2.473883e+04
#> 'wait.generations' limit reached.
#> No significant improvement in 1 generations.
#> Solution Lexical Fitness Value:
#> 3.173114e-01  3.173114e-01  3.173114e-01  3.173114e-01  3.711480e-01  3.711480e-01  4.054626e-01  4.054626e-01  4.598117e-01  4.796243e-01  4.796243e-01  5.426673e-01  6.956137e-01  7.353599e-01  8.254712e-01  9.304397e-01  9.758377e-01  9.972415e-01  9.991359e-01  9.999999e-01  1.000000e+00  1.000000e+00  
#> Parameters at the Solution:
#>  X[ 1] : 3.299557e+02
#>  X[ 2] : 4.294913e+01
#>  X[ 3] : 6.079865e+01
#>  X[ 4] : 1.419532e+02
#>  X[ 5] : 6.269561e+01
#>  X[ 6] : 5.576743e+02
#>  X[ 7] : 3.013591e+01
#>  X[ 8] : 4.150622e+02
#>  X[ 9] : 1.045635e+02
#>  X[10] : 3.818596e+02
#> Solution Found Generation 1
#> Number of Generations Run 3
#> Thu Jun  1 02:50:24 2023
#> Total run time : 0 hours 0 minutes and 0 seconds

#The outcome variable
Y=re78/1000

# Now that GenMatch() has found the optimal weights, let's estimate
# our causal effect of interest using those weights
mout <- Match(Y=Y, Tr=treat, X=X, estimand="ATE", Weight.matrix=genout)
#> Estimate...  1.7968 
#> AI SE......  0.72812 
#> T-stat.....  2.4678 
#> p.val......  0.013596 
#> Original number of observations..............  445 
#> Original number of treated obs...............  185 
#> Matched number of observations...............  445 
#> Matched number of observations  (unweighted).  614

#Let's determine if balance has actually been obtained on the variables of interest
mb <- MatchBalance(treat~age +educ+black+ hisp+ married+ nodegr+ u74+ u75+
                     re75+ re74+ I(re74*re75),
                   match.out=mout, nboots=500)
#> ***** (V1) age *****
#>                        Before Matching        After Matching
#> mean treatment........     25.816             25.184 
#> mean control..........     25.054             25.229 
#> std mean diff.........     10.655           -0.67659 
#> mean raw eQQ diff.....    0.94054            0.35342 
#> med  raw eQQ diff.....          1                  0 
#> max  raw eQQ diff.....          7                  8 
#> mean eCDF diff........   0.025364          0.0096762 
#> med  eCDF diff........   0.022193          0.0065147 
#> max  eCDF diff........   0.065177           0.030945 
#> var ratio (Tr/Co).....     1.0278            0.90561 
#> T-test p-value........    0.26594            0.73536 
#> KS Bootstrap p-value..      0.508              0.758 
#> KS Naive p-value......     0.7481            0.93044 
#> KS Statistic..........   0.065177           0.030945 
#> ***** (V2) educ *****
#>                        Before Matching        After Matching
#> mean treatment........     10.346             10.227 
#> mean control..........     10.088             10.228 
#> std mean diff.........     12.806          -0.079008 
#> mean raw eQQ diff.....    0.40541            0.10912 
#> med  raw eQQ diff.....          0                  0 
#> max  raw eQQ diff.....          2                  2 
#> mean eCDF diff........   0.028698          0.0077943 
#> med  eCDF diff........   0.012682          0.0040717 
#> max  eCDF diff........    0.12651           0.035831 
#> var ratio (Tr/Co).....     1.5513             1.0364 
#> T-test p-value........    0.15017            0.97584 
#> KS Bootstrap p-value..      0.014               0.38 
#> KS Naive p-value......   0.062873            0.82547 
#> KS Statistic..........    0.12651           0.035831 
#> ***** (V3) black *****
#>                        Before Matching        After Matching
#> mean treatment........    0.84324             0.8382 
#> mean control..........    0.82692             0.8427 
#> std mean diff.........     4.4767             -1.219 
#> mean raw eQQ diff.....   0.016216          0.0032573 
#> med  raw eQQ diff.....          0                  0 
#> max  raw eQQ diff.....          1                  1 
#> mean eCDF diff........  0.0081601          0.0016287 
#> med  eCDF diff........  0.0081601          0.0016287 
#> max  eCDF diff........    0.01632          0.0032573 
#> var ratio (Tr/Co).....    0.92503             1.0231 
#> T-test p-value........    0.64736            0.47962 
#> ***** (V4) hisp *****
#>                        Before Matching        After Matching
#> mean treatment........   0.059459           0.085393 
#> mean control..........    0.10769            0.08764 
#> std mean diff.........    -20.341            -0.8032 
#> mean raw eQQ diff.....   0.048649          0.0016287 
#> med  raw eQQ diff.....          0                  0 
#> max  raw eQQ diff.....          1                  1 
#> mean eCDF diff........   0.024116         0.00081433 
#> med  eCDF diff........   0.024116         0.00081433 
#> max  eCDF diff........   0.048233          0.0016287 
#> var ratio (Tr/Co).....    0.58288            0.97676 
#> T-test p-value........   0.064043            0.31731 
#> ***** (V5) married *****
#>                        Before Matching        After Matching
#> mean treatment........    0.18919            0.16404 
#> mean control..........    0.15385            0.15506 
#> std mean diff.........     8.9995             2.4246 
#> mean raw eQQ diff.....   0.037838          0.0065147 
#> med  raw eQQ diff.....          0                  0 
#> max  raw eQQ diff.....          1                  1 
#> mean eCDF diff........   0.017672          0.0032573 
#> med  eCDF diff........   0.017672          0.0032573 
#> max  eCDF diff........   0.035343          0.0065147 
#> var ratio (Tr/Co).....     1.1802             1.0467 
#> T-test p-value........    0.33425            0.37115 
#> ***** (V6) nodegr *****
#>                        Before Matching        After Matching
#> mean treatment........    0.70811            0.78202 
#> mean control..........    0.83462            0.78202 
#> std mean diff.........    -27.751                  0 
#> mean raw eQQ diff.....    0.12432                  0 
#> med  raw eQQ diff.....          0                  0 
#> max  raw eQQ diff.....          1                  0 
#> mean eCDF diff........   0.063254                  0 
#> med  eCDF diff........   0.063254                  0 
#> max  eCDF diff........    0.12651                  0 
#> var ratio (Tr/Co).....     1.4998                  1 
#> T-test p-value........  0.0020368                  1 
#> ***** (V7) u74 *****
#>                        Before Matching        After Matching
#> mean treatment........    0.70811            0.74157 
#> mean control..........       0.75            0.73483 
#> std mean diff.........    -9.1895             1.5382 
#> mean raw eQQ diff.....   0.037838          0.0016287 
#> med  raw eQQ diff.....          0                  0 
#> max  raw eQQ diff.....          1                  1 
#> mean eCDF diff........   0.020946         0.00081433 
#> med  eCDF diff........   0.020946         0.00081433 
#> max  eCDF diff........   0.041892          0.0016287 
#> var ratio (Tr/Co).....     1.1041            0.98352 
#> T-test p-value........    0.33033            0.40546 
#> ***** (V8) u75 *****
#>                        Before Matching        After Matching
#> mean treatment........        0.6            0.64719 
#> mean control..........    0.68462            0.64944 
#> std mean diff.........    -17.225           -0.46975 
#> mean raw eQQ diff.....   0.081081          0.0016287 
#> med  raw eQQ diff.....          0                  0 
#> max  raw eQQ diff.....          1                  1 
#> mean eCDF diff........   0.042308         0.00081433 
#> med  eCDF diff........   0.042308         0.00081433 
#> max  eCDF diff........   0.084615          0.0016287 
#> var ratio (Tr/Co).....     1.1133             1.0029 
#> T-test p-value........   0.068031            0.31731 
#> ***** (V9) re75 *****
#>                        Before Matching        After Matching
#> mean treatment........     1532.1             1264.4 
#> mean control..........     1266.9             1323.9 
#> std mean diff.........     8.2363            -2.0509 
#> mean raw eQQ diff.....     367.61             114.78 
#> med  raw eQQ diff.....          0                  0 
#> max  raw eQQ diff.....     2110.2             8195.6 
#> mean eCDF diff........   0.050834          0.0067367 
#> med  eCDF diff........   0.061954          0.0065147 
#> max  eCDF diff........    0.10748           0.022801 
#> var ratio (Tr/Co).....     1.0763             1.0118 
#> T-test p-value........    0.38527            0.45981 
#> KS Bootstrap p-value..      0.046              0.782 
#> KS Naive p-value......    0.16449            0.99724 
#> KS Statistic..........    0.10748           0.022801 
#> ***** (V10) re74 *****
#>                        Before Matching        After Matching
#> mean treatment........     2095.6             1961.2 
#> mean control..........       2107             1991.4 
#> std mean diff.........   -0.23437           -0.62292 
#> mean raw eQQ diff.....     487.98             172.06 
#> med  raw eQQ diff.....          0                  0 
#> max  raw eQQ diff.....       8413             7870.3 
#> mean eCDF diff........   0.019223          0.0057796 
#> med  eCDF diff........     0.0158           0.004886 
#> max  eCDF diff........   0.047089           0.021173 
#> var ratio (Tr/Co).....     0.7381            0.89669 
#> T-test p-value........    0.98186            0.69561 
#> KS Bootstrap p-value..      0.604              0.706 
#> KS Naive p-value......    0.97023            0.99914 
#> KS Statistic..........   0.047089           0.021173 
#> ***** (V11) I(re74 * re75) *****
#>                        Before Matching        After Matching
#> mean treatment........   13118591           11633548 
#> mean control..........   14530303           12543909 
#> std mean diff.........    -2.7799            -2.0317 
#> mean raw eQQ diff.....    3278733            1703799 
#> med  raw eQQ diff.....          0                  0 
#> max  raw eQQ diff.....  188160151          188160151 
#> mean eCDF diff........   0.022723           0.004617 
#> med  eCDF diff........   0.014449          0.0032573 
#> max  eCDF diff........   0.061019           0.014658 
#> var ratio (Tr/Co).....    0.69439             0.7597 
#> T-test p-value........    0.79058            0.54267 
#> KS Bootstrap p-value..      0.322              0.928 
#> KS Naive p-value......    0.81575                  1 
#> KS Statistic..........   0.061019           0.014658 
#> Before Matching Minimum p.value: 0.0020368 
#> Variable Name(s): nodegr  Number(s): 6 
#> After Matching Minimum p.value: 0.31731 
#> Variable Name(s): hisp  Number(s): 4

Created on 2023-06-01 with reprex v2.0.2

Hi @technocrat ,
Thank you for your response.

Yes, I apologize for the lack of a data example. I'm rather new here!

Treated firms have ETS == 1 and potential controls have ETS == 0. In the listed example, row 8 is a treated unit.

With GenMatch I find potential matches by giving weights to covariates over the full sample. Within these potential matches, I want to match exactly on b_pat9903, bvd_sector_num and last_avail_year, except that for last_avail_year a potential control unit may differ from the treated unit by ±1. That is, potential control units that are not similar to any treated unit on these three variables should be dropped, and treated units for which no control is similar on the three variables should be dropped as well.

In the listed example, obs. 3 and 8 would be matched, since they are equal on b_pat9903 and bvd_sector_num and within ±1 on last_avail_year. Every other observation should be dropped.
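
The rule described above can be checked by hand on the ten listed rows before GenMatch is involved at all. The following is a minimal base-R sketch, assuming the values shown in the printed table of the example; only the four relevant columns are retyped, and `pairs`, `ok` and `eligible` are illustrative names, not part of the Matching package.

```r
# Four columns retyped from the printed table of the listed example (rows 1-10).
AT <- data.frame(
  bvd_sector_num  = c(13, 28, 21, 15, 4, 15, 1, 21, 28, 25),
  ETS             = c(0, 0, 0, 0, 0, 0, 0, 1, 0, 0),
  last_avail_year = c(2021, 2021, 2021, 2021, 2021, 2021, 2023, 2022, 2001, 2020),
  b_pat9903       = c(0, 0, 0, 1, 0, 1, 0, 0, 0, 0)
)

treated  <- which(AT$ETS == 1)
controls <- which(AT$ETS == 0)

# One row per treated/control combination.
pairs <- expand.grid(treated = treated, control = controls)

# Exact agreement on sector and patent dummy, at most one year apart.
ok <- AT$bvd_sector_num[pairs$treated] == AT$bvd_sector_num[pairs$control] &
  AT$b_pat9903[pairs$treated] == AT$b_pat9903[pairs$control] &
  abs(AT$last_avail_year[pairs$treated] - AT$last_avail_year[pairs$control]) <= 1

eligible <- pairs[ok, ]
print(eligible)
```

On these ten rows the only combination that survives is treated observation 8 with control observation 3, which agrees with the description above.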

I hope this properly explains my problem, and that the reprex provided is sufficient - if not please let me know.

Once again, thank you for your time :slight_smile:

## Dataex
head(AT, 10)[, c('bvd_sector_num', 'incorporation_year', 'revenue_last', 'ETS', 'last_avail_year', 'green_pat9903', 'b_green_pat9903', 'pat9903', 'b_pat9903')]
datapasta::df_paste(head(AT, 10)[, c('bvd_sector_num', 'incorporation_year', 'revenue_last', 'ETS', 'last_avail_year', 'green_pat9903', 'b_green_pat9903', 'pat9903', 'b_pat9903')]) 
      bvd_sector_num = c(13, 28, 21, 15, 4, 15, 1, 21, 28, 25),
  incorporation_year = c(1939,1989,1988,1973,
        revenue_last = c(160247.04721,7480.82644,
                 ETS = c(0, 0, 0, 0, 0, 0, 0, 1, 0, 0),
     last_avail_year = c(2021,2021,2021,2021,
       green_pat9903 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
     b_green_pat9903 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
             pat9903 = c(0, 0, 0, 35, 0, 28, 0, 0, 0, 0),
           b_pat9903 = c(0, 0, 0, 1, 0, 1, 0, 0, 0, 0)
#>    bvd_sector_num incorporation_year revenue_last ETS last_avail_year
#> 1              13               1939   160247.047   0            2021
#> 2              28               1989     7480.826   0            2021
#> 3              21               1988     1219.000   0            2021
#> 4              15               1973  2114709.638   0            2021
#> 5               4               1970  2479725.694   0            2021
#> 6              15               1979   271142.605   0            2021
#> 7               1               1996      457.000   0            2023
#> 8              21               1985      496.500   1            2022
#> 9              28               1989     2543.550   0            2001
#> 10             25               1992     2890.000   0            2020
#>    green_pat9903 b_green_pat9903 pat9903 b_pat9903
#> 1              0               0       0         0
#> 2              0               0       0         0
#> 3              0               0       0         0
#> 4              0               0      35         1
#> 5              0               0       0         0
#> 6              0               0      28         1
#> 7              0               0       0         0
#> 8              0               0       0         0
#> 9              0               0       0         0
#> 10             0               0       0         0

# Code 
zs <- (AT$ETS)
Y <- (AT$employees0104)
Xs <- data.frame(AT$bvd_sector_num, AT$last_avail_year, AT$incorporation_year, AT$revenue_last, AT$green_pat9903, AT$b_green_pat9903, AT$pat9903, AT$b_pat9903)
balance_matrix <- Xs 
gen <- GenMatch(Tr=zs, X=Xs, BalanceMatrix=balance_matrix, pop.size=500, fit.func="pvals")
exact <- rep(FALSE, length(colnames(Xs)))
exact[grep("b_pat9903", colnames(Xs))] <- TRUE 
exact[grep("bvd_sector_num", colnames(Xs))] <- TRUE
exact[grep("last_avail_year", colnames(Xs))] <- abs(Xs$last_avail_year - AT$last_avail_year) <= 1
m <- Match(Y=Y, Tr=zs, X=Xs, Weight.matrix=gen, exact = exact)

Updated reprex to include Y in data.frame

## Dataex
      random_numbers = c(0.409526376752183,
      bvd_sector_num = c("13","28","21","15","4",
  incorporation_year = c(1939,1989,1988,1973,
        revenue_last = c(160247.04721,7480.82644,
                 ETS = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
     last_avail_year = c(2021,2021,2021,2021,
       green_pat9903 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
     b_green_pat9903 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
             pat9903 = c(0, 0, 0, 35, 0, 28, 0, 0, 0, 0),
           b_pat9903 = c(0, 0, 0, 1, 0, 1, 0, 0, 0, 0)
#>    random_numbers bvd_sector_num incorporation_year revenue_last ETS
#> 1       0.4095264             13               1939   160247.047   0
#> 2       0.8239268             28               1989     7480.826   0
#> 3       0.2447119             21               1988     1219.000   0
#> 4       0.7820430             15               1973  2114709.638   0
#> 5       0.6020031              4               1970  2479725.694   0
#> 6       0.7957060             15               1979   271142.605   0
#> 7       0.9766439              1               1996      457.000   0
#> 8       0.7779523             21               1985      496.500   0
#> 9       0.1227380             28               1989     2543.550   0
#> 10      0.9598633             25               1992     2890.000   0
#>    last_avail_year green_pat9903 b_green_pat9903 pat9903 b_pat9903
#> 1             2021             0               0       0         0
#> 2             2021             0               0       0         0
#> 3             2013             0               0       0         0
#> 4             2021             0               0      35         1
#> 5             2021             0               0       0         0
#> 6             2021             0               0      28         1
#> 7             2023             0               0       0         0
#> 8             2022             0               0       0         0
#> 9             2001             0               0       0         0
#> 10            2020             0               0       0         0

# Code 
zs <- (AT$ETS)
Y <- (AT$random_numbers)
Xs <- data.frame(AT$bvd_sector_num, AT$last_avail_year, AT$incorporation_year, AT$revenue_last, AT$green_pat9903, AT$pat9903, AT$b_pat9903)
balance_matrix <- as.matrix(cbind(Xs, I(AT$revenue_last^2), I(AT$green_pat9903*AT$incorporation_year), AT$b_green_pat9903))
gen <- GenMatch(Tr=zs, X=Xs, BalanceMatrix=balance_matrix, pop.size=500, fit.func="pvals", caliper=c(100,100,0.1,100,100,0.1,0.1))
exact <- rep(FALSE, length(colnames(Xs)))
exact[grep("b_pat9903", colnames(Xs))] <- TRUE 
exact[grep("bvd_sector_num", colnames(Xs))] <- TRUE
exact[grep("last_avail_year", colnames(Xs))] <- abs(Xs$last_avail_year - AT$last_avail_year) <= 1
m <- Match(Y=Y, Tr=zs, X=Xs, Weight.matrix=gen, exact = exact)

Are you getting the following error?

> gen <- GenMatch(Tr=zs, X=Xs, BalanceMatrix=balance_matrix, pop.size=500, fit.func="pvals", caliper=c(100,100,0.1,100,100,0.1,0.1))
Loading required namespace: rgenoud
Error in GenMatch(Tr = zs, X = Xs, BalanceMatrix = balance_matrix, pop.size = 500,  : 
  Treatment indicator ('Tr') must contain both treatment and control observations

Use the reprex() function (most conveniently through the RStudio addin menu after installing the package). It avoids problems that arise from running a code example in a "used" environment with objects in the namespace, such as


which would only be there if the unnamed data frame had been attach()ed, and it would catch the unexpected symbol (the space).

What I could see is that bvd_sector_num is typeof() character, and that coerces

balance_matrix <- as.matrix(cbind(Xs, I(AT$revenue_last^2), I(AT$green_pat9903 * AT$incorporation_year), AT$b_green_pat9903))

to character also, which ruins it as the BalanceMatrix argument to GenMatch.
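
The coercion is easy to reproduce in isolation. A small base-R sketch with made-up two-column data frames (`num_only`, `with_char` and `fixed` are illustrative names): a single character column is enough to turn the whole matrix into character.

```r
# Two toy data frames: one all numeric, one with the sector stored as text.
num_only  <- data.frame(sector = c(13, 28, 21), year = c(2021, 2021, 2013))
with_char <- data.frame(sector = c("13", "28", "21"), year = c(2021, 2021, 2013),
                        stringsAsFactors = FALSE)

typeof(as.matrix(num_only))   # "double"
typeof(as.matrix(with_char))  # "character"

# Converting the column to numeric before building the matrix keeps it numeric.
with_char$sector <- as.numeric(with_char$sector)
fixed <- as.matrix(with_char)
typeof(fixed)                 # "double"
```

Checking `sapply(AT, typeof)` on the real data before calling as.matrix() is a quick way to spot such a column.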

In my actual data typeof(bvd_sector_num) is double, so I think that's just because I messed up the reprex again.

I should clarify:
My GenMatch call works properly (although not on the reprex provided) once I remove this line:
exact[grep("last_avail_year", colnames(Xs))] <- abs(Xs$last_avail_year - AT$last_avail_year) <= 1

So the line listed here is my only problem. I can't figure out how to allow potential controls to differ by ±1 from the treated unit on last_avail_year.

(exact <- c(FALSE,FALSE))

exact[1] <- TRUE

abs(c(1,1) -c(2,2)) < 3 

exact[2] <- abs(c(1,1) -c(2,2)) < 3 

You have the following problem:
you wish to set one entry of exact to either TRUE or FALSE, but
you calculate more than one TRUE/FALSE value and try to jam them all into that single entry.

Things to consider:

Either exact needs to be longer, to accommodate the larger number of TRUE/FALSE conditions you want to store there, or you need to summarise the conditions so that they evaluate to a single TRUE or FALSE.
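
One further detail may be worth checking against the error text. In base R, assigning too many values to a single element gives a warning rather than an error, whereas "replacement has length zero" means the right-hand side was empty. In the code as posted, Xs is built with data.frame(AT$...), which names the columns with an "AT." prefix, so Xs$last_avail_year appears to be NULL. A base-R sketch with a made-up three-row AT (`rhs`, `err_msg`, `too_long` and `warn_msg` are illustrative names):

```r
AT <- data.frame(last_avail_year = c(2021, 2022, 2001))

# Passing AT$... to data.frame() yields a column called "AT.last_avail_year".
Xs <- data.frame(AT$last_avail_year)
colnames(Xs)                 # "AT.last_avail_year"

# Xs$last_avail_year is therefore NULL, and the comparison has length zero.
rhs <- abs(Xs$last_avail_year - AT$last_avail_year) <= 1
length(rhs)                  # 0

exact <- c(FALSE, FALSE)

# An empty right-hand side reproduces the error from the question.
err_msg <- tryCatch({ exact[2] <- rhs; "no error" },
                    error = function(e) conditionMessage(e))
err_msg

# A right-hand side that is too long triggers a warning instead.
too_long <- abs(c(1, 1) - c(2, 2)) < 3
warn_msg <- tryCatch({ exact[2] <- too_long; "no warning" },
                     warning = function(w) conditionMessage(w))
warn_msg
```

Building Xs from named columns, for example `Xs <- AT[, c("bvd_sector_num", "last_avail_year")]`, keeps the original names so that both grep() and `$` find them.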


This makes a lot of sense.

I rewrote it to:
exact[grep("last_avail_year", colnames(Xs1))] <- any(abs(Xs1$last_avail_year - testx$last_avail_year) <= 1)

That solved my problem. Thank you for the help :slight_smile:
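
A caveat for later readers: any() returns a single TRUE or FALSE, so the entry of exact for last_avail_year simply becomes TRUE whenever at least one difference is within 1. According to the Match() documentation, TRUE in exact requests exact matching on that covariate, which is stricter than ±1. The caliper argument of Match() is documented as a per-covariate tolerance in standardized units, which seems closer to the stated goal. The sketch below assumes Xs has plain column names and retypes three columns from the first printed table; `year_tol` and `big` are illustrative names, and the Match() call at the end is an untested outline rather than a verified solution.

```r
# Three covariates retyped from the first printed table; plain column names assumed.
Xs <- data.frame(
  bvd_sector_num  = c(13, 28, 21, 15, 4, 15, 1, 21, 28, 25),
  last_avail_year = c(2021, 2021, 2021, 2021, 2021, 2021, 2023, 2022, 2001, 2020),
  b_pat9903       = c(0, 0, 0, 1, 0, 1, 0, 0, 0, 0)
)

# any() collapses the whole comparison to one value.
collapsed <- any(abs(Xs$last_avail_year - 2022) <= 1)
collapsed

# Exact matching only on the two variables that need it.
exact <- colnames(Xs) %in% c("bvd_sector_num", "b_pat9903")

# Caliper expressed in standard deviations of the year column.
# For whole-number years, any tolerance between 1 and 2 keeps pairs that are
# at most one year apart; 1.5 leaves slack for how the package standardizes.
year_tol <- 1.5
big <- 1e6  # assumed to act as "no caliper" for the other columns
caliper <- rep(big, ncol(Xs))
caliper[colnames(Xs) == "last_avail_year"] <- year_tol / sd(Xs$last_avail_year)

# Untested outline of the call on the full data (Y, zs and gen as in the thread):
# m <- Matching::Match(Y = Y, Tr = zs, X = Xs, Weight.matrix = gen,
#                      exact = exact, caliper = caliper)
```

Whether exact and caliper interact as intended for the same call is worth confirming in ?Match, and the matched pairs can be inspected afterwards to verify that no pair differs by more than one year.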
