 # Passing many estimates/statistics to rsample

I am working on a bootstrapping set-up with many different estimates in `rsample`, based on the solution @Max provided to one of my prior questions. My basic problem is that I have so many different estimates that I cannot explicitly list them all. The `term` part was relatively easy, but I cannot figure out what to do with the `estimate =` part. The following simple illustration works, but it requires explicitly listing the columns that hold the estimates.

``````r
suppressMessages(library(tidyverse))
library(rsample)

# Mean mpg per transmission type, returned as one term/estimate row per group
compute <- function(split, ...) {
  df <- analysis(split) %>%
    group_by(am) %>%
    summarise(mean = mean(mpg)) %>%
    pivot_wider(
      names_from = am,
      values_from = mean,
      names_prefix = "am_"
    )
  # Every estimate column has to be listed by hand here
  tibble(term = names(df),
         estimate = c(df$am_0, df$am_1))
}

set.seed(2)
bt <-
  bootstraps(mtcars, times = 200, apparent = TRUE) %>%
  mutate(ratio = map(splits, ~ compute(.x)))

int_pctl(bt, ratio)
#> Warning: Recommend at least 1000 non-missing bootstrap resamples for terms:
#> `am_0`, `am_1`.
#> # A tibble: 2 x 6
#>   term  .lower .estimate .upper .alpha .method
#>   <chr>  <dbl>     <dbl>  <dbl>  <dbl> <chr>
#> 1 am_0    15.6      17.1   18.8   0.05 percentile
#> 2 am_1    20.8      24.3   27.4   0.05 percentile
``````

Created on 2020-05-20 by the reprex package (v0.3.0)

Neither `estimate = df` nor `estimate = c(df)` works for providing the results columns in the `compute` function. Any suggestions for how to provide the estimates?

As it turns out, `rsample` really does not like named vectors for the estimates. So, "all" you need to do is `unlist` the one-row data frame of estimates into a single numeric vector, and you are good to go! Using `as.vector(df)` does not work.

``````r
suppressMessages(library(tidyverse))
library(rsample)

compute <- function(split, ...) {
  df <- analysis(split) %>%
    group_by(am) %>%
    summarise(mean = mean(mpg)) %>%
    pivot_wider(
      names_from = am,
      values_from = mean,
      names_prefix = "am_"
    )
  tibble(
    term = names(df),
    # Flatten the one-row data frame so no column is listed by hand
    estimate = unlist(df)
  )
}

set.seed(2)
bt <-
  bootstraps(mtcars, times = 200, apparent = TRUE) %>%
  mutate(ratio = map(splits, ~ compute(.x)))

int_pctl(bt, ratio)
#> Warning: Recommend at least 1000 non-missing bootstrap resamples for terms:
#> `am_0`, `am_1`.
#> # A tibble: 2 x 6
#>   term  .lower .estimate .upper .alpha .method
#>   <chr>  <dbl>     <dbl>  <dbl>  <dbl> <chr>
#> 1 am_0    15.6      17.1   18.8   0.05 percentile
#> 2 am_1    20.8      24.3   27.4   0.05 percentile
``````

Created on 2020-05-20 by the reprex package (v0.3.0)
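
For anyone wondering why `unlist()` succeeds where the other attempts fail, here is a minimal base R sketch. The data frame `df` below is a made-up stand-in for the one-row result of `pivot_wider()`, and its values are illustrative only.

```r
# Hypothetical stand-in for the one-row pivot_wider() result
df <- data.frame(am_0 = 17.1, am_1 = 24.4)

as_list <- c(df)       # still a list, one element per column
flat    <- unlist(df)  # atomic numeric vector, one element per column

is.list(as_list)           # TRUE
is.numeric(as_list)        # FALSE
is.numeric(as.vector(df))  # FALSE
is.numeric(flat)           # TRUE
length(flat)               # 2
```

Only `unlist(df)` yields a numeric vector with one value per term, which is the shape the `estimate` column needs.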

The naming thing is universal in the tidyverse; it is a new `tibble` thing.

I really do need to learn how to read and think before I code!

Since the `tidyverse` verbs automatically return a tibble, there is no need for mucking about with the pivot and everything else I was doing. Simply name the estimate column `estimate`, create an easy-to-read `term` column, and you are done.

``````r
suppressMessages(library(tidyverse))
library(rsample)

compute <- function(split, ...) {
  analysis(split) %>%
    group_by(am) %>%
    # Name the summary column `estimate` directly
    summarise(estimate = mean(mpg)) %>%
    mutate(
      term = paste0("am_", am)
    )
}

set.seed(2)
bt <-
  bootstraps(mtcars, times = 200, apparent = TRUE) %>%
  mutate(ratio = map(splits, ~ compute(.x)))

int_pctl(bt, ratio)
#> Warning: Recommend at least 1000 non-missing bootstrap resamples for terms:
#> `am_0`, `am_1`.
#> # A tibble: 2 x 6
#>   term  .lower .estimate .upper .alpha .method
#>   <chr>  <dbl>     <dbl>  <dbl>  <dbl> <chr>
#> 1 am_0    15.6      17.1   18.8   0.05 percentile
#> 2 am_1    20.8      24.3   27.4   0.05 percentile
``````

Created on 2020-05-21 by the reprex package (v0.3.0)
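
If the goal is many statistics rather than two, the same idea seems to extend without listing any column by hand. The following is only a sketch under assumptions: the helper name `compute_many()`, the choice of `mpg`, `hp`, and `wt`, and the `term` naming scheme are all hypothetical, and `across()` requires dplyr 1.0.0 or later.

```r
suppressMessages(library(dplyr))
library(tidyr)

# Hypothetical helper: group means of several variables in long form,
# one row per statistic with `term` and `estimate` columns.
compute_many <- function(data) {
  data %>%
    group_by(am) %>%
    summarise(across(c(mpg, hp, wt), mean), .groups = "drop") %>%
    pivot_longer(-am, names_to = "stat", values_to = "estimate") %>%
    mutate(term = paste0(stat, "_am_", am)) %>%
    select(term, estimate)
}

res <- compute_many(mtcars)

# In the bootstrap set-up, the wrapper would presumably look like:
# compute <- function(split, ...) compute_many(rsample::analysis(split))
```

Adding another statistic then only means adding a column name inside `across()`; the `term` and `estimate` columns are built the same way regardless of how many there are.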

Well, at least I got to spend some quality time with the new pivot command...
