Does anyone know why `hoist()` doesn't work in the following scenario? It plucks a column out of a list-column of tibbles, but fails on a named list-column of data frames.
``` r
library(tidyr)
library(dplyr)
library(purrr)
library(rsample)

# Works
iris %>%
  group_nest(Species) %>%
  hoist(data, Sepal.Length = 'Sepal.Length')
#> # A tibble: 3 x 3
#>   Species    Sepal.Length data
#>   <fct>      <list>       <list>
#> 1 setosa     <dbl [50]>   <tibble [50 × 3]>
#> 2 versicolor <dbl [50]>   <tibble [50 × 3]>
#> 3 virginica  <dbl [50]>   <tibble [50 × 3]>

# Doesn't work
iris %>%
  vfold_cv() %>%
  mutate(train = map(splits, training)) %>%
  hoist(train, Sepal.Length = 'Sepal.Length')
#> Error in if (details$repeats > 1) res <- paste(res, "repeated", details$repeats, : argument is of length zero
```
mara
March 11, 2020, 5:39pm
If you cross post to GitHub (or anywhere, really), can you please include a link here as well? That way, if it gets resolved there, others can follow along.
opened 04:19PM - 11 Mar 20 UTC
closed 05:26PM - 01 Apr 20 UTC
Hi,
I have been staring at this for too long and can't figure out why `hoist()` won't pull out columns from my list-column of data frames, while it works perfectly well on a list-column of tibbles. Am I doing something wrong here?
``` r
library(tidyr)
library(dplyr)
library(purrr)
library(rsample)
# Works
iris %>%
group_nest(Species) %>%
hoist(data, Sepal.Length = 'Sepal.Length')
#> # A tibble: 3 x 3
#> Species Sepal.Length data
#> <fct> <list> <list>
#> 1 setosa <dbl [50]> <tibble [50 × 3]>
#> 2 versicolor <dbl [50]> <tibble [50 × 3]>
#> 3 virginica <dbl [50]> <tibble [50 × 3]>
# Doesn't work
iris %>%
vfold_cv() %>%
mutate(train = map(splits, training)) %>%
hoist(train, Sepal.Length = 'Sepal.Length')
#> Error in if (details$repeats > 1) res <- paste(res, "repeated", details$repeats, : argument is of length zero
```
<sup>Created on 2020-03-11 by the [reprex package](https://reprex.tidyverse.org) (v0.3.0)</sup>
It looks like the problem isn't that `train` is a named list-column, but that the larger tibble containing it has a `*`-marked list-column:
``` r
library(tidyverse)
library(rsample)

iris %>%
  vfold_cv() %>%
  mutate(train = map(splits, training))
#> # 10-fold cross-validation
#> # A tibble: 10 x 3
#>    splits           id     train
#>  * <named list>     <chr>  <named list>
#>  1 <split [135/15]> Fold01 <df[,5] [135 × 5]>
#>  2 <split [135/15]> Fold02 <df[,5] [135 × 5]>
#>  3 <split [135/15]> Fold03 <df[,5] [135 × 5]>
#>  4 <split [135/15]> Fold04 <df[,5] [135 × 5]>
#>  5 <split [135/15]> Fold05 <df[,5] [135 × 5]>
#>  6 <split [135/15]> Fold06 <df[,5] [135 × 5]>
#>  7 <split [135/15]> Fold07 <df[,5] [135 × 5]>
#>  8 <split [135/15]> Fold08 <df[,5] [135 × 5]>
#>  9 <split [135/15]> Fold09 <df[,5] [135 × 5]>
#> 10 <split [135/15]> Fold10 <df[,5] [135 × 5]>

iris %>%
  vfold_cv() %>%
  mutate(train = map(splits, training)) %>%
  select(-1)
#> # A tibble: 10 x 2
#>    id     train
#>    <chr>  <named list>
#>  1 Fold01 <df[,5] [135 × 5]>
#>  2 Fold02 <df[,5] [135 × 5]>
#>  3 Fold03 <df[,5] [135 × 5]>
#>  4 Fold04 <df[,5] [135 × 5]>
#>  5 Fold05 <df[,5] [135 × 5]>
#>  6 Fold06 <df[,5] [135 × 5]>
#>  7 Fold07 <df[,5] [135 × 5]>
#>  8 Fold08 <df[,5] [135 × 5]>
#>  9 Fold09 <df[,5] [135 × 5]>
#> 10 Fold10 <df[,5] [135 × 5]>

iris %>%
  vfold_cv() %>%
  mutate(train = map(splits, training)) %>%
  select(-1) %>%
  hoist(train, Sepal.Length = 'Sepal.Length')
#> # A tibble: 10 x 3
#>    id     Sepal.Length train
#>    <chr>  <named list> <named list>
#>  1 Fold01 <dbl [135]>  <df[,4] [135 × 4]>
#>  2 Fold02 <dbl [135]>  <df[,4] [135 × 4]>
#>  3 Fold03 <dbl [135]>  <df[,4] [135 × 4]>
#>  4 Fold04 <dbl [135]>  <df[,4] [135 × 4]>
#>  5 Fold05 <dbl [135]>  <df[,4] [135 × 4]>
#>  6 Fold06 <dbl [135]>  <df[,4] [135 × 4]>
#>  7 Fold07 <dbl [135]>  <df[,4] [135 × 4]>
#>  8 Fold08 <dbl [135]>  <df[,4] [135 × 4]>
#>  9 Fold09 <dbl [135]>  <df[,4] [135 × 4]>
#> 10 Fold10 <dbl [135]>  <df[,4] [135 × 4]>
```
Created on 2020-03-11 by the reprex package (v0.3.0)
Hmm, thanks for finding that quirk. Not sure what to make of your observation. Is there reason to expect this behaviour? What does the asterisk denote anyway?
Upon further inquiry, the asterisk indicates that the tibble has row names, and removing the row names solves the issue too:
``` r
iris %>%
  vfold_cv() %>%
  mutate(train = map(splits, training)) %>%
  rownames_to_column() %>%
  hoist(train, Sepal.Length = 'Sepal.Length')
```
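For anyone who wants to check the row-names part in isolation, here is a minimal sketch on a small made-up data frame (`df` and its values are purely illustrative, not from the thread). It only shows how tibble's row-name helpers behave and does not reproduce the `hoist()` error itself.

``` r
library(tibble)

# Hypothetical toy data frame with explicit (non-default) row names
df <- data.frame(x = 1:3, row.names = c("a", "b", "c"))

has_rownames(df)
#> [1] TRUE

# rownames_to_column() moves the row names into a regular column ...
moved <- rownames_to_column(df)
names(moved)
#> [1] "rowname" "x"

# ... so the result no longer carries row names
has_rownames(moved)
#> [1] FALSE

# remove_rownames() drops them without adding a column
has_rownames(remove_rownames(df))
#> [1] FALSE
```

If the row names aren't needed as data, `remove_rownames()` avoids the extra `rowname` column, though I have only checked `rownames_to_column()` against the rsample example above.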