broom::tidy() is very slow when estimating confidence intervals

Hi all.

I am finding the tidy function from broom to be very slow when conf.int=TRUE compared to "manually" calculating the CIs (as the estimate +/- 1.96* SE).

Looking on the GitHub it appears that it is actually the use of the confint function with default method that is slow (compared to simply using the SE as I mention above). When including many exposure (dozens or hundreds) it means tidy is very slow but produces almost exactly the same CIs as the simple manual method. Is there a more elegant/quick/tidy way to do this?

Examples below for a glm with single exposure, and another with 10.

Thanks,
Luke

library(tidyverse)
library(broom)
library(rbenchmark)

set.seed(123)

## create continuous exposure and outcome
x = runif(10000, 1, 5)
y = x + rnorm(10000)

## create "quantiles" version of `x`
x_qt10 = as.factor(findInterval(x, quantile(x, probs = seq(0.1, 1, 0.1), na.rm=TRUE), rightmost.closed=TRUE) + 1)

## combine
df = tibble(x,y,x_qt10)

## model of continuous x on y
fit_lm1 = glm(y ~ x, data = df)
benchmark("tidy CIs"    = { fit_lm1_tidy1 = tidy(fit_lm1, conf.int=TRUE) },
          "manual CIs"  = { fit_lm1_tidy2 = tidy(fit_lm1) %>% mutate(lci=estimate-(1.96*std.error), uci=estimate+(1.96*std.error)) })
#>         test replications elapsed relative user.self sys.self user.child sys.child
#> 2 manual CIs          100    0.39    1.000      0.39     0.00         NA        NA
#> 1   tidy CIs          100   10.44   26.769     10.41     0.03         NA        NA

lm1_cis = data.frame(tidy_lci=fit_lm1_tidy1$conf.low, tidy_uci=fit_lm1_tidy1$conf.high,
                     manual_lci=fit_lm1_tidy2$lci, manual_uci=fit_lm1_tidy2$uci)
lm1_cis
#>      tidy_lci   tidy_uci  manual_lci manual_uci
#> 1 -0.02244986 0.08722226 -0.02245087 0.08722327
#> 2  0.97271488 1.00696027  0.97271456 1.00696058

## model of 10 quantiles on y
fit_lm2 = glm(y ~ x_qt10, data = df)
benchmark("tidy CIs"    = { fit_lm2_tidy1 = tidy(fit_lm2, conf.int=TRUE) },
          "manual CIs"  = { fit_lm2_tidy2 = tidy(fit_lm2) %>% mutate(lci=estimate-(1.96*std.error), uci=estimate+(1.96*std.error)) })
#>         test replications elapsed relative user.self sys.self user.child sys.child
#> 2 manual CIs          100    0.41    1.000      0.41     0.00         NA        NA
#> 1   tidy CIs          100  106.22  259.073    105.71     0.25         NA        NA

lm2_cis = data.frame(tidy_lci=fit_lm2_tidy1$conf.low, tidy_uci=fit_lm2_tidy1$conf.high,
                     manual_lci=fit_lm2_tidy2$lci, manual_uci=fit_lm2_tidy2$uci)
lm2_cis
#>     tidy_lci tidy_uci manual_lci manual_uci
#> 1  1.1795716 1.304530  1.1795704  1.3045313
#> 2  0.2768399 0.453558  0.2768383  0.4535597
#> 3  0.6776729 0.854391  0.6776713  0.8543926
#> 4  1.0788242 1.255542  1.0788225  1.2555439
#> 5  1.4379012 1.614619  1.4378996  1.6146209
#> 6  1.8634412 2.040159  1.8634396  2.0401610
#> 7  2.2922514 2.468969  2.2922497  2.4689711
#> 8  2.5767003 2.753418  2.5766987  2.7534200
#> 9  3.0297996 3.206518  3.0297980  3.2065193
#> 10 3.4727892 3.649507  3.4727876  3.6495090

My results on an M1 are similarly ugly

       test replications elapsed relative user.self
2 manual CIs          100   0.396    1.000     0.394
1   tidy CIs          100  24.716   62.414    24.637
  sys.self user.child sys.child
2    0.001          0         0
1    0.083          0         0



        test replications elapsed relative user.self
2 manual CIs          100   0.418    1.000      0.41
1   tidy CIs          100 163.035  390.036    162.38
  sys.self user.child sys.child
2    0.007          0         0
1    0.701          0         0

For a way to do this in tidy, I'd ask on the github issues page. If I were doing something similar, however, I'd roll my own display from {base} using {gt} or similar.

Thanks @technocrat. Will post as a "Feature Request" as it is less an issue, than a desire to have the CIs estimated differently (options for the user to choose which they need would be ideal)

For the record here, I did post as an issue and received a useful reply from Simon Couch: Option to calculate confidence intervals as 1.96*SE · Issue #1125 · tidymodels/broom · GitHub

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.