Passing many estimates/statistics to rsample

clausp · May 20, 2020, 11:20pm

I am setting up a bootstrap in rsample with many different estimates, based on the solution @Max provided to one of my earlier questions. My basic problem is that I have so many estimates that I cannot list them all explicitly. The term part was relatively easy, but I cannot figure out what to do with the estimate = part. The following simple illustration works, but it requires explicitly listing the columns that hold the estimates.

suppressMessages(library(tidyverse))
library(rsample)
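compute <- function(split, ...) {
  # Mean mpg per transmission type, spread into one column per group
  df <- analysis(split) %>%
    group_by(am) %>%
    summarise(mean = mean(mpg)) %>%
    pivot_wider(
      names_from = am,
      values_from = mean,
      names_prefix = "am_"
    )
  # Each estimate column has to be listed by hand here
  tibble(term = names(df),
         estimate = c(df$am_0, df$am_1))
}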

set.seed(2)
bt <-
  bootstraps(mtcars, times = 200, apparent = TRUE) %>%
  mutate(ratio = map(splits, ~ compute(.x)))

int_pctl(bt, ratio)
#> Warning: Recommend at least 1000 non-missing bootstrap resamples for terms:
#> `am_0`, `am_1`.
#> # A tibble: 2 x 6
#>   term  .lower .estimate .upper .alpha .method   
#>   <chr>  <dbl>     <dbl>  <dbl>  <dbl> <chr>     
#> 1 am_0    15.6      17.1   18.8   0.05 percentile
#> 2 am_1    20.8      24.3   27.4   0.05 percentile

Created on 2020-05-20 by the reprex package (v0.3.0)

Neither estimate = df nor estimate = c(df) works as a way to supply the result columns inside the compute function. Any suggestions for how to provide the estimates?

clausp · May 21, 2020, 12:38am

As it turns out, rsample really does not like named vectors for the estimates. So "all" you need to do is unlist the data frame holding the estimates, and you are good to go! Using as.vector(df) does not work.

suppressMessages(library(tidyverse))
library(rsample)
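compute <- function(split, ...) {
  # Mean mpg per transmission type, spread into one column per group
  df <- analysis(split) %>%
    group_by(am) %>%
    summarise(mean = mean(mpg)) %>%
    pivot_wider(
      names_from = am,
      values_from = mean,
      names_prefix = "am_"
    )
  # unlist() picks up every estimate column without naming any of them
  tibble(
    term = names(df),
    estimate = unlist(df)
  )
}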

set.seed(2)
bt <-
  bootstraps(mtcars, times = 200, apparent = TRUE) %>%
  mutate(ratio = map(splits, ~ compute(.x)))

int_pctl(bt, ratio)
#> Warning: Recommend at least 1000 non-missing bootstrap resamples for terms:
#> `am_0`, `am_1`.
#> # A tibble: 2 x 6
#>   term  .lower .estimate .upper .alpha .method   
#>   <chr>  <dbl>     <dbl>  <dbl>  <dbl> <chr>     
#> 1 am_0    15.6      17.1   18.8   0.05 percentile
#> 2 am_1    20.8      24.3   27.4   0.05 percentile

Created on 2020-05-20 by the reprex package (v0.3.0)
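
For anyone wondering why unlist() behaves differently from the other attempts, here is a minimal base-R sketch of what each call returns. It assumes a plain data.frame with made-up values as a stand-in for the one-row tibble that pivot_wider() produces above; a tibble should behave the same way for these calls, but I have only reasoned this through for the base case.

```r
# Base-R stand-in for the one-row result of pivot_wider();
# the values are made up for illustration.
df <- data.frame(am_0 = 17.1, am_1 = 24.4)

# unlist() flattens the columns into a single numeric vector,
# carrying the column names along as element names.
est <- unlist(df)
is.numeric(est)
#> [1] TRUE
names(est)
#> [1] "am_0" "am_1"

# c() on a data frame gives a plain list with one element per column,
# not a numeric vector.
is.list(c(df))
#> [1] TRUE

# as.vector() does not give an atomic vector either, because a data
# frame is a list underneath.
is.atomic(as.vector(df))
#> [1] FALSE
```

So of the three, only unlist() hands tibble() an ordinary numeric vector with one element per term, which is presumably why it is the one that works as the estimate column.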

Max · May 21, 2020, 1:41am

The naming thing is universal in the tidyverse; it is a new tibble thing.

clausp · May 21, 2020, 2:42pm

I really do need to learn how to read and think before I code!

Since the tidyverse automatically converts the result to a tibble, there is no need for mucking about with pivot and everything else I was doing. Simply name the columns correctly, create an easy-to-read/manipulate term column, and you are done.

suppressMessages(library(tidyverse))
library(rsample)
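compute <- function(split, ...) {
  analysis(split) %>%
    group_by(am) %>%
    # Name the summary column "estimate" directly
    summarise(estimate = mean(mpg)) %>%
    # Build a readable term label from the grouping variable
    mutate(
      term = paste0("am_", am)
    )
}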

set.seed(2)
bt <-
  bootstraps(mtcars, times = 200, apparent = TRUE) %>%
  mutate(ratio = map(splits, ~ compute(.x)))

int_pctl(bt, ratio)
#> Warning: Recommend at least 1000 non-missing bootstrap resamples for terms:
#> `am_0`, `am_1`.
#> # A tibble: 2 x 6
#>   term  .lower .estimate .upper .alpha .method   
#>   <chr>  <dbl>     <dbl>  <dbl>  <dbl> <chr>     
#> 1 am_0    15.6      17.1   18.8   0.05 percentile
#> 2 am_1    20.8      24.3   27.4   0.05 percentile

Created on 2020-05-21 by the reprex package (v0.3.0)

Well, at least I got to spend some quality time with the new pivot command...
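
For the "too many estimates to list" case that started this thread, the same idea can probably be extended along these lines. This is only a sketch under some assumptions: it needs a dplyr version that has across(), the name compute_many, the choice of mpg, hp and wt, and the term naming scheme are all made up for illustration, and I have not checked it against a large set of statistics.

```r
suppressMessages(library(tidyverse))
library(rsample)

# Hypothetical helper: several group means at once, reshaped so that
# every statistic becomes one row with a term and an estimate.
compute_many <- function(split, ...) {
  analysis(split) %>%
    group_by(am) %>%
    # Mean of each selected column per transmission type
    summarise(across(c(mpg, hp, wt), mean), .groups = "drop") %>%
    # One row per (group, variable) pair
    pivot_longer(-am, names_to = "stat", values_to = "estimate") %>%
    # Build a unique label such as "mpg_am_0"
    mutate(term = paste0(stat, "_am_", am)) %>%
    select(term, estimate)
}

set.seed(2)
bt <-
  bootstraps(mtcars, times = 200, apparent = TRUE) %>%
  mutate(stats = map(splits, compute_many))

int_pctl(bt, stats)
```

The point of going long with pivot_longer() is that none of the estimate columns has to be named when building the term/estimate pair; adding another variable to the across() selection should be the only change needed.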
