Hoist not working for named-list of data frames

Does anyone know why hoist doesn't seem to be working in the following scenario? It works to pluck out a column from a list-column of tibbles, but doesn't work for a named-list-column of data frames...

library(tidyr)
library(dplyr)
library(purrr)
library(rsample)
  
# Works
iris %>% 
  group_nest(Species) %>% 
  hoist(data, Sepal.Length = 'Sepal.Length')
#> # A tibble: 3 x 3
#>   Species    Sepal.Length data             
#>   <fct>      <list>       <list>           
#> 1 setosa     <dbl [50]>   <tibble [50 × 3]>
#> 2 versicolor <dbl [50]>   <tibble [50 × 3]>
#> 3 virginica  <dbl [50]>   <tibble [50 × 3]>

# Doesn't work
iris %>% 
  vfold_cv() %>% 
  mutate(train = map(splits, training)) %>% 
  hoist(train, Sepal.Length = 'Sepal.Length')
#> Error in if (details$repeats > 1) res <- paste(res, "repeated", details$repeats, : argument is of length zero

If you cross post to GitHub (or anywhere, really), can you please include a link here as well? That way, if it gets resolved there, others can follow along.

It looks like the problem isn't that train is a named list-column, but that the larger tibble containing it has a * marker in its header:

library(tidyverse)
library(rsample)
  
iris %>% 
  vfold_cv() %>% 
  mutate(train = map(splits, training)) 
#> #  10-fold cross-validation 
#> # A tibble: 10 x 3
#>    splits           id     train             
#>  * <named list>     <chr>  <named list>      
#>  1 <split [135/15]> Fold01 <df[,5] [135 × 5]>
#>  2 <split [135/15]> Fold02 <df[,5] [135 × 5]>
#>  3 <split [135/15]> Fold03 <df[,5] [135 × 5]>
#>  4 <split [135/15]> Fold04 <df[,5] [135 × 5]>
#>  5 <split [135/15]> Fold05 <df[,5] [135 × 5]>
#>  6 <split [135/15]> Fold06 <df[,5] [135 × 5]>
#>  7 <split [135/15]> Fold07 <df[,5] [135 × 5]>
#>  8 <split [135/15]> Fold08 <df[,5] [135 × 5]>
#>  9 <split [135/15]> Fold09 <df[,5] [135 × 5]>
#> 10 <split [135/15]> Fold10 <df[,5] [135 × 5]>

iris %>% 
  vfold_cv() %>% 
  mutate(train = map(splits, training)) %>% 
  select(-1)
#> # A tibble: 10 x 2
#>    id     train             
#>    <chr>  <named list>      
#>  1 Fold01 <df[,5] [135 × 5]>
#>  2 Fold02 <df[,5] [135 × 5]>
#>  3 Fold03 <df[,5] [135 × 5]>
#>  4 Fold04 <df[,5] [135 × 5]>
#>  5 Fold05 <df[,5] [135 × 5]>
#>  6 Fold06 <df[,5] [135 × 5]>
#>  7 Fold07 <df[,5] [135 × 5]>
#>  8 Fold08 <df[,5] [135 × 5]>
#>  9 Fold09 <df[,5] [135 × 5]>
#> 10 Fold10 <df[,5] [135 × 5]>

iris %>% 
  vfold_cv() %>% 
  mutate(train = map(splits, training)) %>% 
  select(-1) %>% 
  hoist(train, Sepal.Length = 'Sepal.Length')
#> # A tibble: 10 x 3
#>    id     Sepal.Length train             
#>    <chr>  <named list> <named list>      
#>  1 Fold01 <dbl [135]>  <df[,4] [135 × 4]>
#>  2 Fold02 <dbl [135]>  <df[,4] [135 × 4]>
#>  3 Fold03 <dbl [135]>  <df[,4] [135 × 4]>
#>  4 Fold04 <dbl [135]>  <df[,4] [135 × 4]>
#>  5 Fold05 <dbl [135]>  <df[,4] [135 × 4]>
#>  6 Fold06 <dbl [135]>  <df[,4] [135 × 4]>
#>  7 Fold07 <dbl [135]>  <df[,4] [135 × 4]>
#>  8 Fold08 <dbl [135]>  <df[,4] [135 × 4]>
#>  9 Fold09 <dbl [135]>  <df[,4] [135 × 4]>
#> 10 Fold10 <dbl [135]>  <df[,4] [135 × 4]>

Created on 2020-03-11 by the reprex package (v0.3.0)

Hmm, thanks for finding that quirk. Not sure what to make of your observation. Is there a reason to expect this behaviour? What does the asterisk denote anyway?

Upon further inquiry, the asterisk indicates that the tibble has row names, and removing the row names solves the issue too:

iris %>% 
  vfold_cv() %>% 
  mutate(train = map(splits, training)) %>% 
  rownames_to_column() %>% 
  hoist(train, Sepal.Length = 'Sepal.Length')
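
For anyone who wants to check whether their own data frame carries row names before reaching for hoist, here is a minimal sketch using tibble's row-name helpers. It uses the built-in mtcars and iris data sets instead of the rsample object above (that substitution is my assumption, chosen to keep the example self-contained), so it only illustrates the row-name check and not the hoist error itself.

```r
library(tibble)

# mtcars ships with row names (the car models); iris does not
has_rownames(mtcars)
#> [1] TRUE
has_rownames(iris)
#> [1] FALSE

# remove_rownames() drops them outright
has_rownames(remove_rownames(mtcars))
#> [1] FALSE

# rownames_to_column() moves them into a regular column instead
# ("model" is just an illustrative column name)
has_rownames(rownames_to_column(mtcars, var = "model"))
#> [1] FALSE
```

If the row names carry no information, remove_rownames() may be a tidier alternative to rownames_to_column(), though I have not confirmed it has the same effect on the rsample output above.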