I am using rsample to run some bootstraps and am trying to figure out whether it can calculate multiple statistics at once. The following works great if I have one statistic:
suppressMessages(library(tidyverse))
library(rsample)
# Bootstrapping with one statistic
set.seed(2)
bootstraps(mtcars, times = 2000, apparent = TRUE) %>%
  mutate(ratio = map(splits, ~ {
    df <- analysis(.x)
    tibble(
      term = "ratio",
      estimate = mean((df$carb < 3)),
      std.error = NA_real_
    )
  }
  )) %>%
  int_pctl(ratio)
#> # A tibble: 1 x 6
#> term .lower .estimate .upper .alpha .method
#> <chr> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 ratio 0.375 0.534 0.719 0.05 percentile
Created on 2020-03-18 by the reprex package (v0.2.1)
If I need two different statistics, I can do it the following way, although that means having to merge in the actual estimates afterward (not done here):
suppressMessages(library(tidyverse))
library(rsample)
set.seed(2)
bootstraps(mtcars, times = 2000) %>%
  mutate(ratio = map(splits, ~ {
    df <- analysis(.x)
    tibble(
      estimate_1 = mean((df$carb < 3)),
      estimate_2 = mean(df[df$carb > 3, ]$mpg)
    )
  }
  )) %>%
  unnest(cols = ratio) %>%
  summarise(
    est_1_lower = quantile(estimate_1, 0.025),
    est_1_upper = quantile(estimate_1, 0.975),
    est_2_lower = quantile(estimate_2, 0.025),
    est_2_upper = quantile(estimate_2, 0.975)
  )
#> # A tibble: 1 x 4
#> est_1_lower est_1_upper est_2_lower est_2_upper
#> <dbl> <dbl> <dbl> <dbl>
#> 1 0.375 0.719 13.9 18.1
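For reference, the merge step mentioned above (attaching the actual estimates to the percentile limits) could be sketched roughly as follows. This is only an illustration: `calc_stats` is a hypothetical helper, not part of rsample, and the long-format layout with `term`, `estimate`, `lower`, and `upper` columns is my own choice.

```r
suppressMessages(library(tidyverse))
library(rsample)

# Hypothetical helper (not part of rsample): both statistics as a one-row tibble
calc_stats <- function(df) {
  tibble(
    estimate_1 = mean(df$carb < 3),
    estimate_2 = mean(df$mpg[df$carb > 3])
  )
}

set.seed(2)
boots <- bootstraps(mtcars, times = 2000)

# One row of statistics per bootstrap resample
boot_stats <- map_dfr(boots$splits, ~ calc_stats(analysis(.x)))

# Percentile limits for each statistic, in long format
boot_summary <- boot_stats %>%
  pivot_longer(everything(), names_to = "term", values_to = "value") %>%
  group_by(term) %>%
  summarise(
    lower = quantile(value, 0.025),
    upper = quantile(value, 0.975)
  )

# Point estimates computed on the original (non-resampled) data
point_estimates <- calc_stats(mtcars) %>%
  pivot_longer(everything(), names_to = "term", values_to = "estimate")

# Merge the actual estimates with the bootstrap limits
result <- left_join(point_estimates, boot_summary, by = "term")
result
```

This works, but it repeats by hand what int_pctl already does for a single statistic, which is why I would prefer a built-in route.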
The following is what I would like to be able to do, but I cannot figure out how to get int_pctl to accept something other than estimate as the variable name.
# This is what I would like
set.seed(2)
bootstraps(mtcars, times = 2000, apparent = TRUE) %>%
  mutate(ratio = map(splits, ~ {
    df <- analysis(.x)
    tibble(
      term_1 = "ratio",
      estimate_1 = mean((df$carb < 3)),
      std.error_1 = NA_real_,
      term_2 = "mean",
      estimate_2 = mean(df[df$carb > 3, ]$mpg),
      std.error_2 = NA_real_
    )
  }
  )) %>%
  int_pctl(ratio, mean)
Is it possible to get int_pctl to handle multiple names?
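One direction that may work, assuming int_pctl computes its intervals separately for each distinct value of term, is to keep the column names term and estimate and return one row per statistic instead of one set of columns per statistic. A minimal sketch under that assumption (the list-column name `stats` and the object name `boot_ints` are just illustrative):

```r
suppressMessages(library(tidyverse))
library(rsample)

set.seed(2)
boot_ints <- bootstraps(mtcars, times = 2000, apparent = TRUE) %>%
  mutate(stats = map(splits, ~ {
    df <- analysis(.x)
    # One row per statistic, distinguished by `term`
    # (assumption: int_pctl() groups the estimates by `term`)
    tibble(
      term = c("ratio", "mean"),
      estimate = c(
        mean(df$carb < 3),
        mean(df$mpg[df$carb > 3])
      ),
      std.error = NA_real_
    )
  })) %>%
  int_pctl(stats)

boot_ints
```

If the assumption holds, the result should contain one row per term, each with its own .lower, .estimate, and .upper, so no manual merge would be needed.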