 # Multiple estimates/statistics from bootstrapping with rsample?

I am using `rsample` to run some bootstraps and am trying to figure out whether there is a way to have it calculate multiple statistics. The following works great if I have one statistic:

``````
suppressMessages(library(tidyverse))
library(rsample)

# Bootstrapping with one statistic
set.seed(2)
bootstraps(mtcars, times = 2000, apparent = TRUE) %>%
  mutate(ratio = map(splits, ~ {
    df <- analysis(.x)
    tibble(
      term = "ratio",
      estimate = mean(df$carb < 3),
      std.error = NA_real_
    )
  })) %>%
  int_pctl(ratio)
#> # A tibble: 1 x 6
#>   term  .lower .estimate .upper .alpha .method
#>   <chr>  <dbl>     <dbl>  <dbl>  <dbl> <chr>
#> 1 ratio  0.375     0.534  0.719   0.05 percentile
``````

Created on 2020-03-18 by the reprex package (v0.2.1)

If I need two different statistics, I can do it the following way, although that means having to merge in the actual estimates afterward (not done here):

``````
suppressMessages(library(tidyverse))
library(rsample)

set.seed(2)
bootstraps(mtcars, times = 2000) %>%
  mutate(ratio = map(splits, ~ {
    df <- analysis(.x)
    tibble(
      estimate_1 = mean(df$carb < 3),
      estimate_2 = mean(df[df$carb > 3, ]$mpg)
    )
  })) %>%
  unnest(cols = ratio) %>%
  summarise(
    est_1_lower = quantile(estimate_1, 0.025),
    est_1_upper = quantile(estimate_1, 0.975),
    est_2_lower = quantile(estimate_2, 0.025),
    est_2_upper = quantile(estimate_2, 0.975)
  )
#> # A tibble: 1 x 4
#>   est_1_lower est_1_upper est_2_lower est_2_upper
#>         <dbl>       <dbl>       <dbl>       <dbl>
#> 1       0.375       0.719        13.9        18.1
``````

Created on 2020-03-18 by the reprex package (v0.2.1)
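
For completeness, the merge step that is skipped above could look roughly like this. It is only a sketch: `two_stats()` is a hypothetical helper introduced here so the same two statistics can be computed on both the resamples and the original data, and the column names `term`, `boot`, `lower`, and `upper` are arbitrary choices.

```r
suppressMessages(library(tidyverse))
library(rsample)

# Hypothetical helper: the same two statistics as above, for any data frame
two_stats <- function(df) {
  tibble(
    estimate_1 = mean(df$carb < 3),
    estimate_2 = mean(df[df$carb > 3, ]$mpg)
  )
}

set.seed(2)
boots <- bootstraps(mtcars, times = 2000)

# One row per resample, one column per statistic
boot_est <- bind_rows(map(boots$splits, ~ two_stats(analysis(.x))))

# Percentile limits per statistic, in long format
intervals <- boot_est %>%
  pivot_longer(everything(), names_to = "term", values_to = "boot") %>%
  group_by(term) %>%
  summarise(lower = quantile(boot, 0.025), upper = quantile(boot, 0.975))

# Point estimates from the original data, merged in by term
actual <- two_stats(mtcars) %>%
  pivot_longer(everything(), names_to = "term", values_to = "estimate")

result <- left_join(actual, intervals, by = "term")
result
```

This works, but every new statistic needs its own bookkeeping, which is what I would like to avoid.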

The following is what I would like to be able to do, but I cannot figure out how to get `int_pctl` to accept something other than `estimate` as the variable name.

``````
# This is what I would like
set.seed(2)
bootstraps(mtcars, times = 2000, apparent = TRUE) %>%
  mutate(ratio = map(splits, ~ {
    df <- analysis(.x)
    tibble(
      term_1 = "ratio",
      estimate_1 = mean(df$carb < 3),
      std.error_1 = NA_real_,
      term_2 = "mean",
      estimate_2 = mean(df[df$carb > 3, ]$mpg),
      std.error_2 = NA_real_
    )
  })) %>%
  int_pctl(ratio, mean)
``````

Is it possible to get `int_pctl` to handle multiple names?

This is kind of hard to follow without a full `reprex` with all the functions defined, such as `int_pctl`.

Conceptually, however, what you should be thinking of is a function `f` that takes some object as its argument, such as a data frame, and returns a result, which may be an object with multiple variables. Then you `bootstrap` that object.
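
To make that idea concrete, here is a minimal base R sketch that does not use `rsample` at all. The function name `f`, the statistic names, and the use of `replicate()` are illustrative assumptions, not anything taken from the package.

```r
# Hypothetical statistic function: takes a data frame and returns
# several named statistics at once
f <- function(df) {
  c(
    ratio = mean(df$carb < 3),
    mean_mpg = mean(df$mpg[df$carb > 3])
  )
}

set.seed(2)
# Resample rows with replacement and apply f to each resample
boot_stats <- replicate(2000, {
  idx <- sample(nrow(mtcars), replace = TRUE)
  f(mtcars[idx, , drop = FALSE])
})

# boot_stats is a matrix with one row per statistic and one column per
# resample; take percentile limits row by row
ci <- apply(boot_stats, 1, quantile, probs = c(0.025, 0.975), na.rm = TRUE)
ci
```

The point is only that `f` can return as many statistics as you like; the resampling step does not care.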

The help page for `int_pctl()` says:

> An unquoted column name or `dplyr` selector that identifies a single column in the data set that contains the individual bootstrap estimates. This can be a list column of tidy tibbles (that contains columns `term` and `estimate`) or a simple numeric column. For t-intervals, a standard tidy column (usually called `std.err`) is required. See the examples below.

So return one tibble per resample with one row per statistic, labelled by `term`, and `int_pctl()` computes an interval for each term:

``````
suppressMessages(library(tidyverse))
library(rsample)

# Return one row per statistic, using the term/estimate layout
compute <- function(split) {
  df <- analysis(split)
  tibble(
    term = c("low carb", "high carb"),
    estimate = c(mean(df$carb < 3), mean(df[df$carb > 3, ]$mpg))
  )
}

set.seed(2)
bt <-
  bootstraps(mtcars, times = 2000, apparent = TRUE) %>%
  mutate(ratio = map(splits, ~ compute(.x)))

int_pctl(bt, ratio)
#> # A tibble: 2 x 6
#>   term      .lower .estimate .upper .alpha .method
#>   <chr>      <dbl>     <dbl>  <dbl>  <dbl> <chr>
#> 1 high carb 13.9      16.0   18.1     0.05 percentile
#> 2 low carb   0.375     0.534  0.719   0.05 percentile
``````

Created on 2020-03-18 by the reprex package (v0.3.0)

@Max Thank you so very much! I have read that sentence way too many times, but I always focused on the word "column", and the example shows only one statistic.

I do think the help could be a little clearer. If I write up an example based on what you did and submit a pull request, would that be okay, or would it be easier if you added it directly? I could, for example, do one for the `iris` data using the mean and median (silly and simple, but it would serve to illustrate).
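
Something along these lines is what I have in mind. It is just a rough sketch: the helper name `mean_median()`, the `stats` column name, and the choice of `Sepal.Length` are placeholders I picked for illustration.

```r
suppressMessages(library(tidyverse))
library(rsample)

# Hypothetical helper: one row per statistic, in the term/estimate layout
mean_median <- function(split) {
  df <- analysis(split)
  tibble(
    term = c("mean", "median"),
    estimate = c(mean(df$Sepal.Length), median(df$Sepal.Length))
  )
}

set.seed(2)
bt_iris <-
  bootstraps(iris, times = 2000, apparent = TRUE) %>%
  mutate(stats = map(splits, mean_median))

# One percentile interval per term
res <- int_pctl(bt_iris, stats)
res
```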

Claus

Please submit a PR. We can always make the documentation better.
