Zelig - error in ATT calculation

I'm trying to calculate the Average Treatment Effect on the Treated (ATT) using the Zelig package, following the paper "MatchIt: Nonparametric Preprocessing for Parametric Causal Inference", available at https://imai.fas.harvard.edu/research/files/matchit.pdf. However, in the last step (the sim call that creates s.out), when I try to calculate the ATT, I get the following error: Error in eigen(Sigma, symmetric = TRUE) : infinite or missing values in 'x'. How can I solve it?

Here is my code:

library(Zelig)

data(lalonde)

m.out <- matchit(treat ~ educ + age + black + hisp + married + nodegr + re74 + re75, data = lalonde, method = "nearest", ratio = 1)

z.out <- zelig (re78 ~ treat + age + educ + black + nodegr + hisp + married + re74 + re75,
                data = match.data(m.out, "control")  , model = "ls") 

x.out <- setx(z.out, data = match.data(m.out, "treat"), cond = TRUE) 

s.out <- sim(z.out, x = x.out)

Thanks so much.

Júlio

Welcome to the forum Júlio. The short answer is, remove treat from the zelig model formula. You're fitting the model only for the matched control observations, that is, those observations that, in this case, all have treat=0. As a result, treat shouldn't be an independent variable in the model. See below for more details.


I describe below how to make the ATT calculation work. But first, to make your example run, your code needs a few changes. We need both of the following packages:

library(Zelig)
library(MatchIt)

We also need the correct variable names for the model specification. hisp should be hispan and nodegr should be nodegree.

Now, to address the error you're getting: In z.out you have (after making the corrections described above):

z.out <- zelig (re78 ~ treat + age + educ + black + nodegree + hispan + married + re74 + re75,
                data = match.data(m.out, "control")  , model = "ls") 

This results in the model being run only with matched data rows that have treat=0. Yet treat is also included in the model formula. Since treat has only one value, no coefficient for treat is estimated in the model. That's what is ultimately causing the error you're getting.

Here's what I get for z.out when I run your code (with the changes described at the beginning):

Coefficients: (1 not defined because of singularities)
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.714e+03  4.246e+03  -0.875   0.3830
treat               NA         NA      NA       NA
age         -1.791e+01  4.624e+01  -0.387   0.6991
educ         6.774e+02  2.634e+02   2.572   0.0109
black        1.444e+02  1.118e+03   0.129   0.8974
nodegree     2.435e+03  1.395e+03   1.746   0.0826
hispan       1.529e+03  1.294e+03   1.182   0.2390
married     -1.197e+03  1.203e+03  -0.995   0.3210
re74         1.574e-02  1.348e-01   0.117   0.9072
re75         4.253e-01  2.079e-01   2.046   0.0423
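
To see the mechanism in isolation, here is a minimal sketch in base R. The data are synthetic (not lalonde), and the variable names are only borrowed for readability. The second half assumes, as the error text suggests, that the simulation step hands a covariance matrix to eigen(); the small matrix with NA entries is a hand-made stand-in, not Zelig's actual internal object.

```r
# Synthetic stand-in for the matched control rows: treat is constant (0).
set.seed(42)
d <- data.frame(treat = 0, age = rnorm(20, mean = 30, sd = 5))
d$re78 <- 100 * d$age + rnorm(20)

# A constant regressor is collinear with the intercept,
# so lm() cannot estimate it and reports NA for its coefficient.
fit <- lm(re78 ~ treat + age, data = d)
coef(fit)

# Stand-in covariance matrix containing missing values (an assumption
# about what the simulation step ends up with, not Zelig's real object).
Sigma <- matrix(c(1, NA, NA, 1), nrow = 2)

# eigen() refuses matrices with NA/Inf entries; capture the message.
msg <- tryCatch(
  eigen(Sigma, symmetric = TRUE),
  error = function(e) conditionMessage(e)
)
msg
```

The captured message matches the text of the error in the question, which is consistent with the NA coefficient for treat being the root cause.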

As described on pages 12-13 of the MatchIt vignette, to calculate the Average Treatment Effect on the Treated (ATT), we fit the model only to the matched control group observations, which is what you've done. However, we need to exclude treat from the z.out model formula, since treat has only one value in that subset. If the matching procedure has controlled for selection bias, then this model gives us the counterfactual (what re78 would have been for the treated group if it had not been treated). We then apply the coefficients from this model to the matched treated observations (the matched observations for which treat=1) to get the ATT.

library(Zelig)
#> Loading required package: survival
library(MatchIt)

m.out <- matchit(treat ~ educ + age + black + hispan + married + nodegree + re74 + re75, 
                 data = lalonde, method = "nearest", ratio = 1)

z.out <- zelig(re78 ~ age + educ + black + nodegree + hispan + married + re74 + re75,
               data = match.data(m.out, "control"), model = "ls") 

x.out <- setx(z.out, data = match.data(m.out, "treat"), cond = TRUE) 

s.out <- sim(z.out, x = x.out)

m.out
#> 
#> Call: 
#> matchit(formula = treat ~ educ + age + black + hispan + married + 
#>     nodegree + re74 + re75, data = lalonde, method = "nearest", 
#>     ratio = 1)
#> 
#> Sample sizes:
#>           Control Treated
#> All           429     185
#> Matched       185     185
#> Unmatched     244       0
#> Discarded       0       0

z.out
#> Model: 
#> 
#> Call:
#> z5$zelig(formula = re78 ~ age + educ + black + nodegree + hispan + 
#>     married + re74 + re75, data = match.data(m.out, "control"))
#> 
#> Residuals:
#>    Min     1Q Median     3Q    Max 
#>  -9411  -4362  -1854   2639  17392 
#> 
#> Coefficients:
#>               Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -3.714e+03  4.246e+03  -0.875   0.3830
#> age         -1.791e+01  4.624e+01  -0.387   0.6991
#> educ         6.774e+02  2.634e+02   2.572   0.0109
#> black        1.444e+02  1.118e+03   0.129   0.8974
#> nodegree     2.435e+03  1.395e+03   1.746   0.0826
#> hispan       1.529e+03  1.294e+03   1.182   0.2390
#> married     -1.197e+03  1.203e+03  -0.995   0.3210
#> re74         1.574e-02  1.348e-01   0.117   0.9072
#> re75         4.253e-01  2.079e-01   2.046   0.0423
#> 
#> Residual standard error: 5910 on 176 degrees of freedom
#> Multiple R-squared:  0.09413,    Adjusted R-squared:  0.05296 
#> F-statistic: 2.286 on 8 and 176 DF,  p-value: 0.02365
#> 
#> Next step: Use 'setx' method

x.out
#> setx:
#>   (Intercept)  age educ black nodegree hispan married re74 re75
#> 1           1 25.3 10.6  0.47    0.638  0.216   0.211 2342 1615
#> 
#> Next step: Use 'sim' method

s.out
#> 
#>  sim x :
#>  -----
#> ev
#>       mean       sd      50%     2.5%    97.5%
#> 1 5437.778 425.0799 5432.914 4611.991 6227.689
#> pv
#>         mean       sd      50%      2.5%    97.5%
#> [1,] 5285.28 5799.457 5368.345 -6447.408 15900.29

Created on 2019-07-13 by the reprex package (v0.3.0)
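
The procedure described above (fit on controls, predict for the treated, compare with what was observed) can also be written out by hand. This is a hedged sketch in base R on synthetic data, not Zelig's own computation and not the lalonde results: the data frame dat, the single covariate age, and the built-in effect of 1500 are all made up for illustration. With real matched data, the two subsets would presumably come from match.data(m.out, "control") and match.data(m.out, "treat") instead.

```r
# Synthetic data (hypothetical): 100 controls, 100 treated, one covariate.
set.seed(1)
n <- 200
dat <- data.frame(
  treat = rep(c(0, 1), each = n / 2),
  age   = round(runif(n, min = 18, max = 50))
)
# Outcome with a treatment effect of 1500 built in by construction.
dat$re78 <- 1000 + 50 * dat$age + 1500 * dat$treat + rnorm(n, sd = 500)

controls <- dat[dat$treat == 0, ]
treated  <- dat[dat$treat == 1, ]

# Step 1: fit the outcome model on controls only (no treat term).
fit0 <- lm(re78 ~ age, data = controls)

# Step 2: predict the counterfactual outcome for each treated unit.
counterfactual <- predict(fit0, newdata = treated)

# Step 3: ATT estimate = mean of (observed - predicted counterfactual).
att_hat <- mean(treated$re78 - counterfactual)
att_hat
```

This gives a point estimate only. It does not reproduce the simulation-based uncertainty intervals that sim reports, so treat it as a cross-check on the logic rather than a replacement for the Zelig workflow.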

Dear Joels,

Thank you so much for your help. You were very thorough! The problem was simple. I had looked for the error before, but I couldn't find it... :frowning_face: Now I see it! :wink: However, I still have a question: the sim command gives the values of ev and pv. I guess the ATT is the first difference (fd) between their means, but the command doesn't report it. Do you know how to get this value? Thanks again!