 # Counterintuitive marginal effects plots (logit)

Hi all,

I have a general question I hope someone can help me out with. I thought I understood what marginal effects plots display, but I am puzzled by the results I am getting.

I am comparing two nested binomial logit models, where one includes a squared term of one of the variables (polity20). Comparing model1 (no squared term) and model2 (with squared term), the effects of several variables appear slightly bigger and more significant in model2 (matches, business_case1 and viol. detention1 - see outputs below).

However, when plotting marginal effects, from what I understand, the effects of these variables appear smaller and less significant in model 2: predprob_allsqu.pdf (7.4 KB)
than in model 1: predprob_all.pdf (7.4 KB).

I can't figure out why that would be the case, so there's something I've clearly not understood about this. I'd appreciate any help!

Thank you,
Janix

Below are a) the model outputs and b) the function I am using to plot the marginal effects:

Model 1:

Call:
glm(formula = posdev ~ polity20 + SFI + upr + aidperc + govreply +
matches + business_case + viol.detention + viol.torture +
viol.convic + viol.disciplin + viol.adminhar + issue.minor +
gender.fti, family = binomial(link = "logit"), data = FINAL.reg_all)

Deviance Residuals:
Min        1Q    Median        3Q       Max
-1.84455  -0.70277  -0.48751  -0.04711   2.60870

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept)     -1.299822   0.548505  -2.370 0.017800 *
polity20        -0.005792   0.023024  -0.252 0.801385
SFI              0.016469   0.032405   0.508 0.611286
upr1             0.118892   0.337219   0.353 0.724414
aidperc          0.002852   0.036759   0.078 0.938168
govreply1       -0.197003   0.259985  -0.758 0.448600
matches         -0.616109   0.136937  -4.499 6.82e-06 ***
business_case1  -1.404013   0.575701  -2.439 0.014737 *
viol.detention1  1.023834   0.305487   3.351 0.000804 ***
viol.torture1   -1.195165   0.372627  -3.207 0.001339 **
viol.convic1     0.969627   0.432383   2.243 0.024928 *
viol.disciplin1  2.702470   0.854401   3.163 0.001562 **
viol.adminhar1  -0.086265   0.703186  -0.123 0.902363
issue.minor1    -0.350260   0.311041  -1.126 0.260128
gender.fti1      0.275293   0.276196   0.997 0.318896
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 504.28  on 460  degrees of freedom
Residual deviance: 422.70  on 446  degrees of freedom
AIC: 452.7

Number of Fisher Scoring iterations: 6


Model 2:


Call:
glm(formula = posdev ~ polity20 + I(polity20^2) + SFI + upr +
aidperc + govreply + matches + business_case + viol.detention +
viol.torture + viol.convic + viol.disciplin + viol.adminhar +
issue.minor + gender.fti, family = binomial(link = "logit"),
data = FINAL.reg_all)

Deviance Residuals:
Min       1Q   Median       3Q      Max
-1.9606  -0.7041  -0.4755  -0.0351   2.7222

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept)     -0.52176    0.62268  -0.838 0.402070
polity20        -0.33215    0.12554  -2.646 0.008148 **
I(polity20^2)    0.01609    0.00607   2.650 0.008048 **
SFI              0.02726    0.03290   0.829 0.407360
upr1             0.06896    0.33958   0.203 0.839067
aidperc          0.03174    0.03830   0.829 0.407191
govreply1       -0.07600    0.26554  -0.286 0.774703
matches         -0.65265    0.13909  -4.692  2.7e-06 ***
business_case1  -1.68219    0.61241  -2.747 0.006017 **
viol.detention1  1.08664    0.31213   3.481 0.000499 ***
viol.torture1   -1.17706    0.37542  -3.135 0.001717 **
viol.convic1     0.90024    0.44022   2.045 0.040858 *
viol.disciplin1  2.50711    0.87444   2.867 0.004142 **
viol.adminhar1   0.01808    0.70736   0.026 0.979610
issue.minor1    -0.54827    0.32487  -1.688 0.091484 .
gender.fti1      0.32704    0.28008   1.168 0.242933
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 504.28  on 460  degrees of freedom
Residual deviance: 415.48  on 445  degrees of freedom
AIC: 447.48

Number of Fisher Scoring iterations: 6

predprobbinom <- function(model,xvar,xlab){
plot_model(model, terms = list(paste0(xvar)), type = "eff", title="") +
scale_y_continuous(labels = scales::label_percent(accuracy = 1), limits=c(0,1)) +
labs(x = paste0(xlab),
y = "Predicted probability\nof positive development") +
theme_bw()
}


This is kinda hard to debone in the abstract, without the FINAL.reg_all object or a representative subset added to make a complete reprex. As it is, all I can ask why are so many covariates retained at a p-value > 0.05 (if you are using the conventional laugh test \alpha)? Have you tested GOF?

Hi @technocrat,

Thanks for your reply. Coming from the social sciences, variables that are theoretically relevant are typically retained irrespective of their significance. GOF for the two models are e.g. Cox-Snell Pseudo-R² at 0.16 and 0.18 respectively, which from what I read seems pretty good for this type of study.

I understand that a complete reprex is generally desirable, but I was kind of hoping that this question could be answered in the abstract, as I'm interested in the general logic behind it:

• I expect marginal effect plots to use the model parameters and calculate predicted values for Y when holding all variables constant except the X of interest. So from the model outputs above, I would for example expect that the binary Xs whose effect is larger in model 2 would also show a larger difference between Y(X=0) and Y(X=1) in the margin plots.
• The same goes for significance: I would expect that if the significance level is higher in model 2, the confidence intervals in the margin plots would not (or less) overlap between Y(X=0) and Y(X=1) compared to model 1.

However, both these expectations are not met, which for me is a sign that I don't really understand what the marginal effect plots are displaying - rather than this being an issue with my particular dataset. If you disagree, I am happy to try and supply a reprex to explore this further.

Best,
Janix

I would consider removing the raw variable if you are adding its square?

Hi @janix, I was wondering if you could say a little more about what it means when you say the marginal effects are smaller and less significant in model2. It seems the word 'significant' may mean different things in the two contexts.

1 Like

Sure, gladly. Let's take the variable business_case1 (it's a binary independent variable). In model 1, its estimate is -1.404, and its p-value at 0.015. In model 2, the estimate is -1.682, it's p-value is 0.006. Thus both effect size and significance level for this variable are higher in model 2.

This is what I see in its marginal effect plot for model 1 and model 2:  The plots display predicted probabilities, so they are of course not directly comparable to the model parameters. But the fact that the difference between predicted prob for X=0 and X=1 is larger in model 1 and that their confidence intervals are barely overlapping - whereas they clearly overlap in plot for model 2 - seems to suggest to me that the variable's effect and significance should be larger in model 1. What is the error in my reasoning?

You might find the margins package and its vignette helpful

library("margins")
x <- lm(mpg ~ cyl * hp + wt, data = mtcars)
(m <- margins(x))
#> Average marginal effects
#> lm(formula = mpg ~ cyl * hp + wt, data = mtcars)
#>      cyl       hp    wt
#>  0.03814 -0.04632 -3.12
summary(m)
#>  factor     AME     SE       z      p   lower   upper
#>     cyl  0.0381 0.5999  0.0636 0.9493 -1.1376  1.2139
#>      hp -0.0463 0.0145 -3.1909 0.0014 -0.0748 -0.0179
#>      wt -3.1198 0.6613 -4.7176 0.0000 -4.4160 -1.8236
plot(m) Created on 2020-03-23 by the reprex package (v0.3.0)

Hi @janix: I'm not sure sure if this is helpful, and I make no claim to being correct, but here are my thoughts, and hopefully they make sense.

These estimates are for the coefficient of business_case1 in a linear model that predicts the log odds of the value of posdev. In the case of a continuous variable, this coefficient represents how much the log odds will increase when the variable increases by one unit (all others held constant); in the case of a discrete variable, this applies to the dummy variables that correspond to the discrete variables. In either case, this means that e^coefficient represents the factor by which the odds will increase when the variable increases by one unit, and this increase in the odds can be interpreted as a measure of effect size in the context of logistic regression. This would mean that the effect size corresponding to business_case1 in model 1 is larger than in the model 2 since e^-1.404 is larger than e^-1.682.

In terms of significance, the p-values are an indication of how unlikely it is that the coefficients should be zero, assuming the data was generated by a process described by the model. In the absence of further explanation, this kind of significance doesn't seem to me like it should have any relationship to the size of the errors bars in the marginal effects' plots. The former is a measure related to trying to fit a logit function to the data as a whole, whereas the second is a measure how well the function fits at a specific value. Here's an example of what I mean:

library(tidyverse)

# A single p-value measuring the contribution of 'displ' to line of best fit
mpg %>%
lm(hwy ~ displ, data = .) %>%
summary()
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)
#> (Intercept)  35.6977     0.7204   49.55   <2e-16 ***
#> displ        -3.5306     0.1945  -18.15   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# One s.e. per value, measuring confidence of prediction at that value
mpg %>%
ggplot(aes(displ, hwy)) +
geom_smooth(method = lm) Created on 2020-03-23 by the reprex package (v0.3.0)

They display the difference in probabilities associated with the two values of business_case1 -- assuming that all the variables take on the average of their values (including the dummy variables associated with the discrete variables) -- and the error bars are built from the associated (multivariate) standard errors. This contrasts with errors bars (box plots) calculated from actual univariate data from two populations, where overlap may result in a smaller effect size.

Does this make sense?

2 Likes

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.