Matching with numerical tolerance

lorenzo_il_magnifico · June 1, 2023, 9:14am

Hi,
When using GenMatch, there's a set of variables that I would like to match exactly on, some with and some without tolerance. It works for the variables that should be matched with no tolerance (b_pat9903 & bvd_sector_num).
For the variable last_avail_year, I would like to allow for +-1 differences between treated units and potential matches. I try to do this in the last part of the code, but get the error:

Error in exact[grep("last_avail_year", colnames(Xs))] <- abs(Xs$last_avail_year -  :
  replacement has length zero

Code:
zs <- (AT$ETS)
Y <- (AT$employees0104)
Xs <- data.frame(AT$bvd_sector_num, AT$last_avail_year, AT$incorporation_year, AT$revenue_last, AT$green_pat9903, AT$b_green_pat9903, AT$pat9903, AT$b_pat9903)
balance_matrix <- Xs
gen <- GenMatch(Tr=zs, X=Xs, BalanceMatrix=balance_matrix, pop.size=500, fit.func="pvals")
exact <- rep(FALSE, length(colnames(Xs)))
exact[grep("b_pat9903", colnames(Xs))] <- TRUE
exact[grep("bvd_sector_num", colnames(Xs))] <- TRUE
exact[grep("last_avail_year", colnames(Xs))] <- abs(Xs$last_avail_year - AT$last_avail_year) <= 1

Would appreciate any help!

technocrat · June 1, 2023, 9:51am

It's hard to help without a complete reprex (see the FAQ), including representative data.

Alternatively, please explain what you want to do differently from the help("GenMatch") example:

library(Matching)
#> Loading required package: MASS
#> ## 
#> ##  Matching (Version 4.10-8, Build Date: 2022-11-03)
#> ##  See http://sekhon.berkeley.edu/matching for additional documentation.
#> ##  Please cite software as:
#> ##   Jasjeet S. Sekhon. 2011. ``Multivariate and Propensity Score Matching
#> ##   Software with Automated Balance Optimization: The Matching package for R.''
#> ##   Journal of Statistical Software, 42(7): 1-52. 
#> ##
data(lalonde)
attach(lalonde)

#The covariates we want to match on
X = cbind(age, educ, black, hisp, married, nodegr, u74, u75, re75, re74)

#The covariates we want to obtain balance on
BalanceMat <- cbind(age, educ, black, hisp, married, nodegr, u74, u75, re75, re74,
                    I(re74*re75))

#
#Let's call GenMatch() to find the optimal weight to give each
#covariate in 'X' so as we have achieved balance on the covariates in
#'BalanceMat'. This is only an example so we want GenMatch to be quick

#so the population size has been set to be only 16 via the 'pop.size'
#option. This is *WAY* too small for actual problems.
#For details see http://sekhon.berkeley.edu/papers/MatchingJSS.pdf.
#
genout <- GenMatch(Tr=treat, X=X, BalanceMatrix=BalanceMat, estimand="ATE", M=1,
                   pop.size=16, max.generations=10, wait.generations=1)
#> Loading required namespace: rgenoud
#> 
#> 
#> Thu Jun  1 02:50:24 2023
#> Domains:
#>  0.000000e+00   <=  X1   <=    1.000000e+03 
#>  0.000000e+00   <=  X2   <=    1.000000e+03 
#>  0.000000e+00   <=  X3   <=    1.000000e+03 
#>  0.000000e+00   <=  X4   <=    1.000000e+03 
#>  0.000000e+00   <=  X5   <=    1.000000e+03 
#>  0.000000e+00   <=  X6   <=    1.000000e+03 
#>  0.000000e+00   <=  X7   <=    1.000000e+03 
#>  0.000000e+00   <=  X8   <=    1.000000e+03 
#>  0.000000e+00   <=  X9   <=    1.000000e+03 
#>  0.000000e+00   <=  X10  <=    1.000000e+03 
#> 
#> Data Type: Floating Point
#> Operators (code number, name, population) 
#>  (1) Cloning...........................  1
#>  (2) Uniform Mutation..................  2
#>  (3) Boundary Mutation.................  2
#>  (4) Non-Uniform Mutation..............  2
#>  (5) Polytope Crossover................  2
#>  (6) Simple Crossover..................  2
#>  (7) Whole Non-Uniform Mutation........  2
#>  (8) Heuristic Crossover...............  2
#>  (9) Local-Minimum Crossover...........  0
#> 
#> SOFT Maximum Number of Generations: 10
#> Maximum Nonchanging Generations: 1
#> Population size       : 16
#> Convergence Tolerance: 1.000000e-03
#> 
#> Not Using the BFGS Derivative Based Optimizer on the Best Individual Each Generation.
#> Not Checking Gradients before Stopping.
#> Using Out of Bounds Individuals.
#> 
#> Maximization Problem.
#> GENERATION: 0 (initializing the population)
#> Lexical Fit..... 1.965387e-01  1.965387e-01  2.567638e-01  2.567638e-01  3.173114e-01  3.173114e-01  3.173114e-01  3.173114e-01  4.790353e-01  5.357585e-01  5.638495e-01  5.638495e-01  7.928051e-01  8.747544e-01  9.005085e-01  9.726726e-01  9.757135e-01  9.972415e-01  9.972415e-01  9.999977e-01  1.000000e+00  1.000000e+00  
#> #unique......... 16, #Total UniqueCount: 16
#> var 1:
#> best............ 2.831305e+02
#> mean............ 4.339086e+02
#> variance........ 9.205897e+04
#> var 2:
#> best............ 4.582351e+01
#> mean............ 4.808526e+02
#> variance........ 9.951977e+04
#> var 3:
#> best............ 6.079865e+01
#> mean............ 4.581846e+02
#> variance........ 7.144076e+04
#> var 4:
#> best............ 1.419532e+02
#> mean............ 4.329569e+02
#> variance........ 6.749356e+04
#> var 5:
#> best............ 6.269561e+01
#> mean............ 5.433120e+02
#> variance........ 1.007233e+05
#> var 6:
#> best............ 4.022851e+02
#> mean............ 3.998682e+02
#> variance........ 6.291170e+04
#> var 7:
#> best............ 3.013591e+01
#> mean............ 4.630557e+02
#> variance........ 9.051718e+04
#> var 8:
#> best............ 3.071364e+02
#> mean............ 4.540889e+02
#> variance........ 8.608832e+04
#> var 9:
#> best............ 1.045635e+02
#> mean............ 2.749121e+02
#> variance........ 6.837157e+04
#> var 10:
#> best............ 9.450274e+01
#> mean............ 6.127432e+02
#> variance........ 1.050091e+05
#> 
#> GENERATION: 1
#> Lexical Fit..... 3.173114e-01  3.173114e-01  3.173114e-01  3.173114e-01  3.711480e-01  3.711480e-01  4.054626e-01  4.054626e-01  4.598117e-01  4.796243e-01  4.796243e-01  5.426673e-01  6.956137e-01  7.353599e-01  8.254712e-01  9.304397e-01  9.758377e-01  9.972415e-01  9.991359e-01  9.999999e-01  1.000000e+00  1.000000e+00  
#> #unique......... 12, #Total UniqueCount: 28
#> var 1:
#> best............ 3.299557e+02
#> mean............ 2.640421e+02
#> variance........ 3.959094e+04
#> var 2:
#> best............ 4.294913e+01
#> mean............ 2.157304e+02
#> variance........ 9.514945e+04
#> var 3:
#> best............ 6.079865e+01
#> mean............ 2.973923e+02
#> variance........ 6.926883e+04
#> var 4:
#> best............ 1.419532e+02
#> mean............ 3.992844e+02
#> variance........ 6.263337e+04
#> var 5:
#> best............ 6.269561e+01
#> mean............ 3.973911e+02
#> variance........ 1.003613e+05
#> var 6:
#> best............ 5.576743e+02
#> mean............ 3.694883e+02
#> variance........ 2.014507e+04
#> var 7:
#> best............ 3.013591e+01
#> mean............ 3.757987e+02
#> variance........ 8.645335e+04
#> var 8:
#> best............ 4.150622e+02
#> mean............ 3.565758e+02
#> variance........ 2.384791e+04
#> var 9:
#> best............ 1.045635e+02
#> mean............ 2.224873e+02
#> variance........ 3.323147e+04
#> var 10:
#> best............ 3.818596e+02
#> mean............ 6.225901e+02
#> variance........ 1.460706e+05
#> 
#> GENERATION: 2
#> Lexical Fit..... 3.173114e-01  3.173114e-01  3.173114e-01  3.173114e-01  3.711480e-01  3.711480e-01  4.054626e-01  4.054626e-01  4.598117e-01  4.796243e-01  4.796243e-01  5.426673e-01  6.956137e-01  7.353599e-01  8.254712e-01  9.304397e-01  9.758377e-01  9.972415e-01  9.991359e-01  9.999999e-01  1.000000e+00  1.000000e+00  
#> #unique......... 12, #Total UniqueCount: 40
#> var 1:
#> best............ 3.299557e+02
#> mean............ 3.560685e+02
#> variance........ 1.452756e+04
#> var 2:
#> best............ 4.294913e+01
#> mean............ 4.720064e+01
#> variance........ 9.665896e+02
#> var 3:
#> best............ 6.079865e+01
#> mean............ 1.694049e+02
#> variance........ 5.988433e+04
#> var 4:
#> best............ 1.419532e+02
#> mean............ 2.082209e+02
#> variance........ 2.321037e+04
#> var 5:
#> best............ 6.269561e+01
#> mean............ 2.029439e+02
#> variance........ 7.731470e+04
#> var 6:
#> best............ 5.576743e+02
#> mean............ 4.354230e+02
#> variance........ 7.085862e+03
#> var 7:
#> best............ 3.013591e+01
#> mean............ 1.607625e+02
#> variance........ 6.746991e+04
#> var 8:
#> best............ 4.150622e+02
#> mean............ 3.494544e+02
#> variance........ 2.621848e+03
#> var 9:
#> best............ 1.045635e+02
#> mean............ 1.705721e+02
#> variance........ 2.688426e+04
#> var 10:
#> best............ 3.818596e+02
#> mean............ 2.915842e+02
#> variance........ 7.914410e+04
#> 
#> GENERATION: 3
#> Lexical Fit..... 3.173114e-01  3.173114e-01  3.173114e-01  3.173114e-01  3.711480e-01  3.711480e-01  4.054626e-01  4.054626e-01  4.598117e-01  4.796243e-01  4.796243e-01  5.426673e-01  6.956137e-01  7.353599e-01  8.254712e-01  9.304397e-01  9.758377e-01  9.972415e-01  9.991359e-01  9.999999e-01  1.000000e+00  1.000000e+00  
#> #unique......... 12, #Total UniqueCount: 52
#> var 1:
#> best............ 3.299557e+02
#> mean............ 3.156883e+02
#> variance........ 6.347119e+02
#> var 2:
#> best............ 4.294913e+01
#> mean............ 5.417629e+01
#> variance........ 1.679504e+03
#> var 3:
#> best............ 6.079865e+01
#> mean............ 6.985327e+01
#> variance........ 4.963423e+02
#> var 4:
#> best............ 1.419532e+02
#> mean............ 1.435664e+02
#> variance........ 6.771293e+02
#> var 5:
#> best............ 6.269561e+01
#> mean............ 6.184391e+01
#> variance........ 2.193731e+00
#> var 6:
#> best............ 5.576743e+02
#> mean............ 4.942367e+02
#> variance........ 7.572460e+03
#> var 7:
#> best............ 3.013591e+01
#> mean............ 5.851087e+01
#> variance........ 8.565154e+03
#> var 8:
#> best............ 4.150622e+02
#> mean............ 3.686187e+02
#> variance........ 3.207996e+03
#> var 9:
#> best............ 1.045635e+02
#> mean............ 1.037300e+02
#> variance........ 3.372995e+00
#> var 10:
#> best............ 3.818596e+02
#> mean............ 2.483674e+02
#> variance........ 2.473883e+04
#> 
#> 'wait.generations' limit reached.
#> No significant improvement in 1 generations.
#> 
#> Solution Lexical Fitness Value:
#> 3.173114e-01  3.173114e-01  3.173114e-01  3.173114e-01  3.711480e-01  3.711480e-01  4.054626e-01  4.054626e-01  4.598117e-01  4.796243e-01  4.796243e-01  5.426673e-01  6.956137e-01  7.353599e-01  8.254712e-01  9.304397e-01  9.758377e-01  9.972415e-01  9.991359e-01  9.999999e-01  1.000000e+00  1.000000e+00  
#> 
#> Parameters at the Solution:
#> 
#>  X[ 1] : 3.299557e+02
#>  X[ 2] : 4.294913e+01
#>  X[ 3] : 6.079865e+01
#>  X[ 4] : 1.419532e+02
#>  X[ 5] : 6.269561e+01
#>  X[ 6] : 5.576743e+02
#>  X[ 7] : 3.013591e+01
#>  X[ 8] : 4.150622e+02
#>  X[ 9] : 1.045635e+02
#>  X[10] : 3.818596e+02
#> 
#> Solution Found Generation 1
#> Number of Generations Run 3
#> 
#> Thu Jun  1 02:50:24 2023
#> Total run time : 0 hours 0 minutes and 0 seconds

#The outcome variable
Y=re78/1000

#
# Now that GenMatch() has found the optimal weights, let's estimate
# our causal effect of interest using those weights
#
mout <- Match(Y=Y, Tr=treat, X=X, estimand="ATE", Weight.matrix=genout)
summary(mout)
#> 
#> Estimate...  1.7968 
#> AI SE......  0.72812 
#> T-stat.....  2.4678 
#> p.val......  0.013596 
#> 
#> Original number of observations..............  445 
#> Original number of treated obs...............  185 
#> Matched number of observations...............  445 
#> Matched number of observations  (unweighted).  614

#                        
#Let's determine if balance has actually been obtained on the variables of interest
#                        
mb <- MatchBalance(treat~age +educ+black+ hisp+ married+ nodegr+ u74+ u75+
                     re75+ re74+ I(re74*re75),
                   match.out=mout, nboots=500)
#> 
#> ***** (V1) age *****
#>                        Before Matching        After Matching
#> mean treatment........     25.816             25.184 
#> mean control..........     25.054             25.229 
#> std mean diff.........     10.655           -0.67659 
#> 
#> mean raw eQQ diff.....    0.94054            0.35342 
#> med  raw eQQ diff.....          1                  0 
#> max  raw eQQ diff.....          7                  8 
#> 
#> mean eCDF diff........   0.025364          0.0096762 
#> med  eCDF diff........   0.022193          0.0065147 
#> max  eCDF diff........   0.065177           0.030945 
#> 
#> var ratio (Tr/Co).....     1.0278            0.90561 
#> T-test p-value........    0.26594            0.73536 
#> KS Bootstrap p-value..      0.508              0.758 
#> KS Naive p-value......     0.7481            0.93044 
#> KS Statistic..........   0.065177           0.030945 
#> 
#> 
#> ***** (V2) educ *****
#>                        Before Matching        After Matching
#> mean treatment........     10.346             10.227 
#> mean control..........     10.088             10.228 
#> std mean diff.........     12.806          -0.079008 
#> 
#> mean raw eQQ diff.....    0.40541            0.10912 
#> med  raw eQQ diff.....          0                  0 
#> max  raw eQQ diff.....          2                  2 
#> 
#> mean eCDF diff........   0.028698          0.0077943 
#> med  eCDF diff........   0.012682          0.0040717 
#> max  eCDF diff........    0.12651           0.035831 
#> 
#> var ratio (Tr/Co).....     1.5513             1.0364 
#> T-test p-value........    0.15017            0.97584 
#> KS Bootstrap p-value..      0.014               0.38 
#> KS Naive p-value......   0.062873            0.82547 
#> KS Statistic..........    0.12651           0.035831 
#> 
#> 
#> ***** (V3) black *****
#>                        Before Matching        After Matching
#> mean treatment........    0.84324             0.8382 
#> mean control..........    0.82692             0.8427 
#> std mean diff.........     4.4767             -1.219 
#> 
#> mean raw eQQ diff.....   0.016216          0.0032573 
#> med  raw eQQ diff.....          0                  0 
#> max  raw eQQ diff.....          1                  1 
#> 
#> mean eCDF diff........  0.0081601          0.0016287 
#> med  eCDF diff........  0.0081601          0.0016287 
#> max  eCDF diff........    0.01632          0.0032573 
#> 
#> var ratio (Tr/Co).....    0.92503             1.0231 
#> T-test p-value........    0.64736            0.47962 
#> 
#> 
#> ***** (V4) hisp *****
#>                        Before Matching        After Matching
#> mean treatment........   0.059459           0.085393 
#> mean control..........    0.10769            0.08764 
#> std mean diff.........    -20.341            -0.8032 
#> 
#> mean raw eQQ diff.....   0.048649          0.0016287 
#> med  raw eQQ diff.....          0                  0 
#> max  raw eQQ diff.....          1                  1 
#> 
#> mean eCDF diff........   0.024116         0.00081433 
#> med  eCDF diff........   0.024116         0.00081433 
#> max  eCDF diff........   0.048233          0.0016287 
#> 
#> var ratio (Tr/Co).....    0.58288            0.97676 
#> T-test p-value........   0.064043            0.31731 
#> 
#> 
#> ***** (V5) married *****
#>                        Before Matching        After Matching
#> mean treatment........    0.18919            0.16404 
#> mean control..........    0.15385            0.15506 
#> std mean diff.........     8.9995             2.4246 
#> 
#> mean raw eQQ diff.....   0.037838          0.0065147 
#> med  raw eQQ diff.....          0                  0 
#> max  raw eQQ diff.....          1                  1 
#> 
#> mean eCDF diff........   0.017672          0.0032573 
#> med  eCDF diff........   0.017672          0.0032573 
#> max  eCDF diff........   0.035343          0.0065147 
#> 
#> var ratio (Tr/Co).....     1.1802             1.0467 
#> T-test p-value........    0.33425            0.37115 
#> 
#> 
#> ***** (V6) nodegr *****
#>                        Before Matching        After Matching
#> mean treatment........    0.70811            0.78202 
#> mean control..........    0.83462            0.78202 
#> std mean diff.........    -27.751                  0 
#> 
#> mean raw eQQ diff.....    0.12432                  0 
#> med  raw eQQ diff.....          0                  0 
#> max  raw eQQ diff.....          1                  0 
#> 
#> mean eCDF diff........   0.063254                  0 
#> med  eCDF diff........   0.063254                  0 
#> max  eCDF diff........    0.12651                  0 
#> 
#> var ratio (Tr/Co).....     1.4998                  1 
#> T-test p-value........  0.0020368                  1 
#> 
#> 
#> ***** (V7) u74 *****
#>                        Before Matching        After Matching
#> mean treatment........    0.70811            0.74157 
#> mean control..........       0.75            0.73483 
#> std mean diff.........    -9.1895             1.5382 
#> 
#> mean raw eQQ diff.....   0.037838          0.0016287 
#> med  raw eQQ diff.....          0                  0 
#> max  raw eQQ diff.....          1                  1 
#> 
#> mean eCDF diff........   0.020946         0.00081433 
#> med  eCDF diff........   0.020946         0.00081433 
#> max  eCDF diff........   0.041892          0.0016287 
#> 
#> var ratio (Tr/Co).....     1.1041            0.98352 
#> T-test p-value........    0.33033            0.40546 
#> 
#> 
#> ***** (V8) u75 *****
#>                        Before Matching        After Matching
#> mean treatment........        0.6            0.64719 
#> mean control..........    0.68462            0.64944 
#> std mean diff.........    -17.225           -0.46975 
#> 
#> mean raw eQQ diff.....   0.081081          0.0016287 
#> med  raw eQQ diff.....          0                  0 
#> max  raw eQQ diff.....          1                  1 
#> 
#> mean eCDF diff........   0.042308         0.00081433 
#> med  eCDF diff........   0.042308         0.00081433 
#> max  eCDF diff........   0.084615          0.0016287 
#> 
#> var ratio (Tr/Co).....     1.1133             1.0029 
#> T-test p-value........   0.068031            0.31731 
#> 
#> 
#> ***** (V9) re75 *****
#>                        Before Matching        After Matching
#> mean treatment........     1532.1             1264.4 
#> mean control..........     1266.9             1323.9 
#> std mean diff.........     8.2363            -2.0509 
#> 
#> mean raw eQQ diff.....     367.61             114.78 
#> med  raw eQQ diff.....          0                  0 
#> max  raw eQQ diff.....     2110.2             8195.6 
#> 
#> mean eCDF diff........   0.050834          0.0067367 
#> med  eCDF diff........   0.061954          0.0065147 
#> max  eCDF diff........    0.10748           0.022801 
#> 
#> var ratio (Tr/Co).....     1.0763             1.0118 
#> T-test p-value........    0.38527            0.45981 
#> KS Bootstrap p-value..      0.046              0.782 
#> KS Naive p-value......    0.16449            0.99724 
#> KS Statistic..........    0.10748           0.022801 
#> 
#> 
#> ***** (V10) re74 *****
#>                        Before Matching        After Matching
#> mean treatment........     2095.6             1961.2 
#> mean control..........       2107             1991.4 
#> std mean diff.........   -0.23437           -0.62292 
#> 
#> mean raw eQQ diff.....     487.98             172.06 
#> med  raw eQQ diff.....          0                  0 
#> max  raw eQQ diff.....       8413             7870.3 
#> 
#> mean eCDF diff........   0.019223          0.0057796 
#> med  eCDF diff........     0.0158           0.004886 
#> max  eCDF diff........   0.047089           0.021173 
#> 
#> var ratio (Tr/Co).....     0.7381            0.89669 
#> T-test p-value........    0.98186            0.69561 
#> KS Bootstrap p-value..      0.604              0.706 
#> KS Naive p-value......    0.97023            0.99914 
#> KS Statistic..........   0.047089           0.021173 
#> 
#> 
#> ***** (V11) I(re74 * re75) *****
#>                        Before Matching        After Matching
#> mean treatment........   13118591           11633548 
#> mean control..........   14530303           12543909 
#> std mean diff.........    -2.7799            -2.0317 
#> 
#> mean raw eQQ diff.....    3278733            1703799 
#> med  raw eQQ diff.....          0                  0 
#> max  raw eQQ diff.....  188160151          188160151 
#> 
#> mean eCDF diff........   0.022723           0.004617 
#> med  eCDF diff........   0.014449          0.0032573 
#> max  eCDF diff........   0.061019           0.014658 
#> 
#> var ratio (Tr/Co).....    0.69439             0.7597 
#> T-test p-value........    0.79058            0.54267 
#> KS Bootstrap p-value..      0.322              0.928 
#> KS Naive p-value......    0.81575                  1 
#> KS Statistic..........   0.061019           0.014658 
#> 
#> 
#> Before Matching Minimum p.value: 0.0020368 
#> Variable Name(s): nodegr  Number(s): 6 
#> 
#> After Matching Minimum p.value: 0.31731 
#> Variable Name(s): hisp  Number(s): 4

Created on 2023-06-01 with reprex v2.0.2

lorenzo_il_magnifico · June 1, 2023, 11:49am

Hi @technocrat ,
Thank you for your response.

Yes, I apologize for the lack of a data example; I'm rather new here!

Treated firms have ETS == 1 and potential controls have ETS == 0. In the listed example, row 8 is a treated unit.

With GenMatch I find potential matches by giving weights to covariates over the full sample. Within these potential matches, I want to match exactly on b_pat9903 and bvd_sector_num, and within +-1 on last_avail_year: a potential control unit may differ from the treated unit by up to one year. I.e., potential control units that aren't similar to any treated unit on these 3 variables should be dropped, and treated units for which no control is similar on the 3 variables should be dropped.

In the listed example, obs. 3 and 8 would be matched since they are equal on b_pat9903 and bvd_sector_num, and within +-1 on last_avail_year. Every other obs. should be dropped.

I hope this properly explains my problem, and that the reprex provided is sufficient - if not please let me know.

Once again, thank you for your time

## Dataex
head(AT, 10)[, c('bvd_sector_num', 'incorporation_year', 'revenue_last', 'ETS', 'last_avail_year', 'green_pat9903', 'b_green_pat9903', 'pat9903', 'b_pat9903')]
datapasta::df_paste(head(AT, 10)[, c('bvd_sector_num', 'incorporation_year', 'revenue_last', 'ETS', 'last_avail_year', 'green_pat9903', 'b_green_pat9903', 'pat9903', 'b_pat9903')]) 
data.frame(
      bvd_sector_num = c(13, 28, 21, 15, 4, 15, 1, 21, 28, 25),
  incorporation_year = c(1939,1989,1988,1973,
                         1970,1979,1996,1985,1989,1992),
        revenue_last = c(160247.04721,7480.82644,
                         1219,2114709.63818,2479725.69383,271142.60536,457,
                         496.5,2543.55,2890),
                 ETS = c(0, 0, 0, 0, 0, 0, 0, 1, 0, 0),
     last_avail_year = c(2021,2021,2021,2021,
                         2021,2021,2023,2022,2001,2020),
       green_pat9903 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
     b_green_pat9903 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
             pat9903 = c(0, 0, 0, 35, 0, 28, 0, 0, 0, 0),
           b_pat9903 = c(0, 0, 0, 1, 0, 1, 0, 0, 0, 0)
)
#>    bvd_sector_num incorporation_year revenue_last ETS last_avail_year
#> 1              13               1939   160247.047   0            2021
#> 2              28               1989     7480.826   0            2021
#> 3              21               1988     1219.000   0            2021
#> 4              15               1973  2114709.638   0            2021
#> 5               4               1970  2479725.694   0            2021
#> 6              15               1979   271142.605   0            2021
#> 7               1               1996      457.000   0            2023
#> 8              21               1985      496.500   1            2022
#> 9              28               1989     2543.550   0            2001
#> 10             25               1992     2890.000   0            2020
#>    green_pat9903 b_green_pat9903 pat9903 b_pat9903
#> 1              0               0       0         0
#> 2              0               0       0         0
#> 3              0               0       0         0
#> 4              0               0      35         1
#> 5              0               0       0         0
#> 6              0               0      28         1
#> 7              0               0       0         0
#> 8              0               0       0         0
#> 9              0               0       0         0
#> 10             0               0       0         0

# Code 
zs <- (AT$ETS)
Y <- (AT$employees0104)
Xs <- data.frame(AT$bvd_sector_num, AT$last_avail_year, AT$incorporation_year, AT$revenue_last, AT$green_pat9903, AT$b_green_pat9903, AT$pat9903, AT$b_pat9903)
balance_matrix <- Xs 
gen <- GenMatch(Tr=zs, X=Xs, BalanceMatrix=balance_matrix, pop.size=500, fit.func="pvals")
exact <- rep(FALSE, length(colnames(Xs)))
exact[grep("b_pat9903", colnames(Xs))] <- TRUE 
exact[grep("bvd_sector_num", colnames(Xs))] <- TRUE
exact[grep("last_avail_year", colnames(Xs))] <- abs(Xs$last_avail_year - AT$last_avail_year) <= 1
m <- Match(Y=Y, Tr=zs, X=Xs, Weight.matrix=gen, exact = exact)
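For reference, the selection rule described above (same bvd_sector_num and b_pat9903, last_avail_year within +-1) can be checked directly on the ten example rows with a few lines of base R. This is only a sketch: `eligible_pairs()` is a hypothetical helper, not part of the Matching package, and it merely lists which treated/control pairs the rule allows; it does no GenMatch weighting.

```r
# The ten example rows from the post (only the columns the rule uses)
AT <- data.frame(
  bvd_sector_num  = c(13, 28, 21, 15, 4, 15, 1, 21, 28, 25),
  ETS             = c(0, 0, 0, 0, 0, 0, 0, 1, 0, 0),
  last_avail_year = c(2021, 2021, 2021, 2021, 2021, 2021, 2023, 2022, 2001, 2020),
  b_pat9903       = c(0, 0, 0, 1, 0, 1, 0, 0, 0, 0)
)

# Hypothetical helper: every treated/control pair that agrees exactly on
# bvd_sector_num and b_pat9903 and differs by at most `tol` on last_avail_year
eligible_pairs <- function(df, tol = 1) {
  pairs <- expand.grid(treated = which(df$ETS == 1),
                       control = which(df$ETS == 0))
  tr <- pairs$treated
  co <- pairs$control
  ok <- df$bvd_sector_num[tr] == df$bvd_sector_num[co] &
        df$b_pat9903[tr] == df$b_pat9903[co] &
        abs(df$last_avail_year[tr] - df$last_avail_year[co]) <= tol
  pairs[ok, , drop = FALSE]
}

pairs <- eligible_pairs(AT)
pairs

# Units that appear in at least one eligible pair; the rule drops all others
keep <- sort(unique(c(pairs$treated, pairs$control)))
keep
```

On these rows it returns the single pair treated 8 / control 3 and keeps rows 3 and 8, which is the outcome the post describes.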

lorenzo_il_magnifico · June 1, 2023, 3:29pm

Updated reprex to include Y in data.frame

## Dataex
data.frame(
      random_numbers = c(0.409526376752183,
                         0.823926829034463,0.244711946463212,0.782042996725067,
                         0.602003115927801,0.795706039993092,0.976643902249634,
                         0.777952267322689,0.122737989528105,0.959863299271092),
      bvd_sector_num = c("13","28","21","15","4",
                         "15","1","21","28","25"),
  incorporation_year = c(1939,1989,1988,1973,
                         1970,1979,1996,1985,1989,1992),
        revenue_last = c(160247.04721,7480.82644,
                         1219,2114709.63818,2479725.69383,271142.60536,457,
                         496.5,2543.55,2890),
                 ETS = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
     last_avail_year = c(2021,2021,2021,2021,
                         2021,2021,2023,2022,2001,2020),
       green_pat9903 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
     b_green_pat9903 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
             pat9903 = c(0, 0, 0, 35, 0, 28, 0, 0, 0, 0),
           b_pat9903 = c(0, 0, 0, 1, 0, 1, 0, 0, 0, 0)
)
#>    random_numbers bvd_sector_num incorporation_year revenue_last ETS
#> 1       0.4095264             13               1939   160247.047   0
#> 2       0.8239268             28               1989     7480.826   0
#> 3       0.2447119             21               1988     1219.000   0
#> 4       0.7820430             15               1973  2114709.638   0
#> 5       0.6020031              4               1970  2479725.694   0
#> 6       0.7957060             15               1979   271142.605   0
#> 7       0.9766439              1               1996      457.000   0
#> 8       0.7779523             21               1985      496.500   0
#> 9       0.1227380             28               1989     2543.550   0
#> 10      0.9598633             25               1992     2890.000   0
#>    last_avail_year green_pat9903 b_green_pat9903 pat9903 b_pat9903
#> 1             2021             0               0       0         0
#> 2             2021             0               0       0         0
#> 3             2013             0               0       0         0
#> 4             2021             0               0      35         1
#> 5             2021             0               0       0         0
#> 6             2021             0               0      28         1
#> 7             2023             0               0       0         0
#> 8             2022             0               0       0         0
#> 9             2001             0               0       0         0
#> 10            2020             0               0       0         0

# Code 
zs <- (AT$ETS)
Y <- (AT$random_numbers)
Xs <- data.frame(AT$bvd_sector_num, AT$last_avail_year, AT$incorporation_year, AT$revenue_last, AT$green_pat9903, AT$pat9903, AT$b_pat9903)
balance_matrix <- as.matrix(cbind(Xs, I(AT$revenue_last^2), I(AT$green_pat9903*AT$incorporation_year), AT$b_green_pat9903))
gen <- GenMatch(Tr=zs, X=Xs, BalanceMatrix=balance_matrix, pop.size=500, fit.func="pvals", caliper=c(100,100,0.1,100,100,0.1,0.1))
exact <- rep(FALSE, length(colnames(Xs)))
exact[grep("b_pat9903", colnames(Xs))] <- TRUE 
exact[grep("bvd_sector_num", colnames(Xs))] <- TRUE
exact[grep("last_avail_year", colnames(Xs))] <- abs(Xs$last_avail_year - AT$last_avail_year) <= 1
m <- Match(Y=Y, Tr=zs, X=Xs, Weight.matrix=gen, exact = exact)

technocrat · June 2, 2023, 2:20am

Are you getting the following error?

> gen <- GenMatch(Tr=zs, X=Xs, BalanceMatrix=balance_matrix, pop.size=500, fit.func="pvals", caliper=c(100,100,0.1,100,100,0.1,0.1))
Loading required namespace: rgenoud
Error in GenMatch(Tr = zs, X = Xs, BalanceMatrix = balance_matrix, pop.size = 500,  : 
  Treatment indicator ('Tr') must contain both treatment and control observations
>
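The cause is visible in the updated reprex: its ETS column is all zeros, so there is no treated unit for GenMatch() to work with. A quick check like the hypothetical `has_both_groups()` below, run before the call, makes that explicit; it only mirrors the condition in the error message and is not the package's own validation code.

```r
# ETS columns from the two reprexes above
ets_first   <- c(0, 0, 0, 0, 0, 0, 0, 1, 0, 0)
ets_updated <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0)

# Hypothetical pre-check: the treatment vector must contain both 0 and 1
has_both_groups <- function(tr) all(c(0, 1) %in% tr)

has_both_groups(ets_first)
has_both_groups(ets_updated)
table(ets_updated)
```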

technocrat · June 2, 2023, 7:52am

Use the reprex() function (most conveniently through the RStudio addin menu, after installing the package). It will avoid problems arising from running a code example in a "used" environment with objects in the namespace, such as

b_pat9903

which would only be there if the unnamed data frame had been attach()ed. It would also catch the unexpected symbol (the space).

What I could see is that typeof(bvd_sector_num) is "character", and that coerces

balance_matrix <- as.matrix(cbind(Xs, I(AT$revenue_last^2), I(AT$green_pat9903 * AT$incorporation_year), AT$b_green_pat9903))

to character as well, which ruins it as the BalanceMatrix argument to GenMatch.
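A minimal illustration of that coercion: as.matrix() on a data frame with any character column returns a character matrix, so every covariate ends up stored as strings. The toy columns below are made up for the example; converting the column with as.numeric() first keeps the matrix numeric.

```r
# Toy covariates: sector stored as character, as in the updated reprex
Xs <- data.frame(
  bvd_sector_num = c("13", "28", "21"),
  revenue_last   = c(160247.05, 7480.83, 1219)
)

bad <- as.matrix(Xs)
typeof(bad)   # every cell is now a string

# Convert the character column to numbers before building the matrix
Xs$bvd_sector_num <- as.numeric(Xs$bvd_sector_num)
good <- as.matrix(Xs)
typeof(good)
```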

lorenzo_il_magnifico · June 2, 2023, 3:37pm

typeof(bvd_sector_num) is "double" in my data, so I think that's just because I messed up the reprex again.

I should clarify:
My GenMatch code works properly (although not on the reprex provided) when I remove:
exact[grep("last_avail_year", colnames(Xs))] <- abs(Xs$last_avail_year - AT$last_avail_year) <= 1

So that line is my only problem: I can't figure out how to allow potential controls to differ by +-1 from the treated unit on last_avail_year.

nirgrahamuk · June 2, 2023, 4:04pm

(exact <- c(FALSE,FALSE))

exact[1] <- TRUE
exact

abs(c(1,1) -c(2,2)) < 3 

exact[2] <- abs(c(1,1) -c(2,2)) < 3

You have the following problem:
you wish to set one entry of exact to either TRUE or FALSE,
but you calculate more than one TRUE/FALSE value and try to jam them all into that one slot.

Things to consider:

Either you need exact to be larger, to accommodate the greater number of TRUE/FALSE conditions you want to store there, or you need to summarise those conditions so they evaluate to a single TRUE or FALSE.
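The sketch below reproduces the problem on three made-up rows. It also shows a second issue in the original line: data.frame(AT$x, ...) names its columns "AT.x", so Xs$last_avail_year is NULL and the difference has length zero, which would explain the exact "replacement has length zero" message in the question. With properly named columns, the right-hand side instead has one value per row, which is the length mismatch described above.

```r
AT <- data.frame(
  bvd_sector_num  = c(13, 21, 21),
  last_avail_year = c(2021, 2021, 2022)
)

# data.frame(AT$x, ...) names its columns "AT.x", not "x"
Xs <- data.frame(AT$bvd_sector_num, AT$last_avail_year)
colnames(Xs)
is.null(Xs$last_avail_year)   # no column has exactly that name

# grep() still finds the column, because it searches for a substring
idx <- grep("last_avail_year", colnames(Xs))

exact <- rep(FALSE, ncol(Xs))
# NULL minus a numeric vector is numeric(0), so the right-hand side is empty
err <- tryCatch({
  exact[idx] <- abs(Xs$last_avail_year - AT$last_avail_year) <= 1
  NULL
}, error = function(e) conditionMessage(e))
err

# With matching names there is one logical per row, but the slot holds only
# one value: R warns (and would keep just the first element)
Xs2 <- data.frame(bvd_sector_num  = AT$bvd_sector_num,
                  last_avail_year = AT$last_avail_year)
wrn <- tryCatch({
  exact[idx] <- abs(Xs2$last_avail_year - AT$last_avail_year) <= 1
  NULL
}, warning = function(w) conditionMessage(w))
wrn
```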

lorenzo_il_magnifico · June 3, 2023, 7:11am

This makes a lot of sense.

I rewrote it to:
exact[grep("last_avail_year", colnames(Xs1))] <- any(abs(Xs1$last_avail_year - testx$last_avail_year) <= 1)

Which solved my problems. Thank you for the help
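One caveat, based on how the Matching documentation describes `exact` (one logical per column of X): any() collapses the comparison to a single TRUE or FALSE, so when it is TRUE it requests exact matching on last_avail_year rather than a +-1 window. A sketch of one alternative, assuming (as I read the documentation) that the per-covariate `caliper` argument of Match()/GenMatch() is in standard-deviation units and that `exact` takes precedence over `caliper`: set exact = FALSE for last_avail_year and give that column a caliper of 1.5 years divided by its SD. The toy data, the 1.5-year half-width and the value 100 for the other columns (borrowed from the caliper vector used earlier in the thread) are assumptions.

```r
# Toy covariates taken from the example rows; columns named explicitly
Xs <- data.frame(
  bvd_sector_num  = c(13, 28, 21, 15, 4, 15, 1, 21, 28, 25),
  last_avail_year = c(2021, 2021, 2021, 2021, 2021, 2021, 2023, 2022, 2001, 2020),
  b_pat9903       = c(0, 0, 0, 1, 0, 1, 0, 0, 0, 0)
)

# One logical per column: exact matching on the two discrete variables only
exact <- colnames(Xs) %in% c("bvd_sector_num", "b_pat9903")

# Caliper in SD units (assumption: Match() multiplies it by each column's SD).
# 1.5 years admits +-1 but not +-2 for integer years; 100 SDs is meant as
# "effectively no caliper" for the other columns
year_sd <- sd(Xs$last_avail_year)
caliper <- ifelse(colnames(Xs) == "last_avail_year", 1.5 / year_sd, 100)

exact
caliper

# Not run here; assumes the Matching package and the zs, Y, gen objects above.
# Passing the same vectors to GenMatch() would keep the weight search consistent.
# m <- Match(Y = Y, Tr = zs, X = Xs, Weight.matrix = gen,
#            exact = exact, caliper = caliper)
```

With integer years, any half-width from 1 up to (but not including) 2 admits a one-year difference and rejects two; 1.5 leaves room for floating-point rounding when the caliper is rescaled by the SD.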

system · June 10, 2023, 7:12am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.