standard errors using prediction() in R vs margins in Stata

Hi,

When using the prediction function to calculate predicted means (or "margins") I get the same predicted values in both R and Stata, but different standard errors (and different confidence intervals as a result). What is the reason for the difference? Am I calculating the standard errors incorrectly below with my call to summarize (se = mean(se.fitted)). Or, possibly, is Stata calculating the errors incorrectly?

Ideally, I would be able to reproduce the same results in any software package. I would greatly appreciate any advice for how to replicate the results Stata gives in R.

suppressWarnings(suppressPackageStartupMessages({
  library(tidyverse)
  library(prediction)
}))
  
# model
mod <- lm(mpg ~ cyl + vs + hp, data = mtcars)

# predicted values
pred <- prediction(mod, at = list(cyl = c(4, 6, 8)))

pred %>% 
  group_by(cyl) %>% 
  summarise(pred_mean = mean(fitted) ,
            se = mean(se.fitted),
            ci_low = pred_mean - (1.96 * se),
            ci_high = pred_mean + (1.96 * se),
            total_n = n()) %>% 
  as.data.frame() # to carry out decimals further for printing
#>   cyl pred_mean       se   ci_low  ci_high total_n
#> 1   4  25.60488 1.907449 21.86628 29.34348      32
#> 2   6  20.56328 1.415614 17.78867 23.33788      32
#> 3   8  15.52167 1.771571 12.04939 18.99395      32


# stata output, and code to replicate, below:

# library(haven)

# mtcars %>%
#   write_dta("mtcars.dta"))
# 
# use mtcars, clear
# reg mpg vs cyl hp
# margins, at(cyl = (4, 6, 8))
# 
# Predictive margins                              Number of obs     =         32
# Model VCE    : OLS
# 
# Expression   : Linear prediction, predict()
# 
# 1._at        : cyl             =           4
# 
# 2._at        : cyl             =           6
# 
# 3._at        : cyl             =           8
# 
# |            Delta-method
# |     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]
#_at |
# 1  |   25.60488   1.619802    15.81   0.000     22.28687     28.9229
# 2  |   20.56328   .5809866    35.39   0.000     19.37318    21.75337
# 3  |   15.52167   1.379056    11.26   0.000      12.6968    18.34654

# for comparison, ggpredict gives yet a different (though closer) confidence interval

suppressWarnings(suppressPackageStartupMessages({
  library(ggeffects)
}))

ggpredict(mod, terms =  c("cyl [4, 6, 8]"))
#> # Predicted values of mpg
#> 
#> cyl | Predicted |         95% CI
#> --------------------------------
#>   4 |     25.60 | [22.43, 28.78]
#>   6 |     20.56 | [19.42, 21.70]
#>   8 |     15.52 | [12.82, 18.22]
#> 
#> Adjusted for:
#> * vs =   0.44
#> * hp = 146.69

Created on 2022-03-14 by the reprex package (v2.0.0)

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.