I get an error when working with a list column after using rsample. I created a sample of sanitized data as a csv, and it seems to reproduce the issue. The csv file is here.
(Incidentally, is there a better way to share example data on here?)
Here is the script and error message:
    # (so you know what's loaded in my environment)
    library(tufte)
    library(tidyverse)
    library(lubridate)
    library(foreach)
    library(doParallel)
    library(scales)
    library(kableExtra)
    library(rmarkdown)
    library(dbplyr)
    library(DBI)
    library(odbc)
    library(rlang)
    library(rsample)
    library(Metrics)

    example_data <- read_csv("example_data.csv")
    example_split <- initial_split(example_data, 0.9)
    training_data <- training(example_split)
    testing_data <- testing(example_split)

    # 5 fold split stratified on spender
    train_cv <- vfold_cv(training_data, 5, strata = j) %>%
      # create training and validation sets within each fold
      mutate(train = map(splits, ~training(.x)),
             validate = map(splits, ~testing(.x)))

    # everything works up till this point. It's when I try to do anything with train_cv that I encounter issues.
    blah <- train_cv %>% crossing(mtry = c(1,2))
    > Error: `x` must be a vector, not a `rsplit/vfold_split` object

    blah2 <- train_cv %>% crossing(nrounds = c(100, 150, 200))
    > Error: `x` must be a vector, not a `rsplit/vfold_split` object
Here is how train_cv looks before I try to use it with crossing():
    train_cv
    # 5-fold cross-validation using stratification
    # A tibble: 5 x 4
      splits            id    train                  validate
    * <named list>      <chr> <named list>           <named list>
    1 <split [72K/18K]> Fold1 <tibble [72,000 × 10]> <tibble [18,001 × 10]>
    2 <split [72K/18K]> Fold2 <tibble [72,001 × 10]> <tibble [18,000 × 10]>
    3 <split [72K/18K]> Fold3 <tibble [72,001 × 10]> <tibble [18,000 × 10]>
    4 <split [72K/18K]> Fold4 <tibble [72,001 × 10]> <tibble [18,000 × 10]>
    5 <split [72K/18K]> Fold5 <tibble [72,001 × 10]> <tibble [18,000 × 10]>
The desired outcome is a new column 'mtry', with each existing row (fold) appearing twice: once for mtry = 1 and once for mtry = 2.
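To make the target shape concrete, here is a minimal sketch on a toy tibble. The `toy_folds` data and its `n_rows` column are made up for illustration and contain no list columns, so this does not reproduce the error; it only shows the row layout I'm after.

```r
library(tibble)
library(tidyr)

# Hypothetical stand-in for train_cv: one row per fold, no list columns
toy_folds <- tibble(id = c("Fold1", "Fold2"), n_rows = c(72000, 72001))

# Pair every fold row with every mtry value
toy_crossed <- crossing(toy_folds, mtry = c(1, 2))
toy_crossed
```

Each fold should show up once per mtry value, so two folds and two mtry values give four rows.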
How can I continue to work with train_cv after creating folds, and then use crossing to experiment with various hyperparameters in my workflow?
This error is inconsistent. I tried restarting my session and the code ran fine, but I cannot restart my session every time my script reaches this code block. What could be loaded in my environment that would lead to this error? Screenshot of the error:
And then, after clearing the workspace and starting a fresh session, everything works:
In case it's informative, here's what I'm shown when I click 'show traceback' on the error message:
    Error: `x` must be a vector, not a `rsplit/vfold_split` object
    23. stop(fallback)
    22. signal_abort(cnd)
    21. abort(message, .subclass = c(.subclass, "vctrs_error"), ...)
    20. stop_vctrs(msg, "vctrs_error_scalar_type", actual = x)
    19. stop_scalar_type(.Primitive("quote")(structure(list(data = structure(list( s = c("92DF3481-4F83-47E4-AE08-E7AD35EBC2B9", "IDFV-DB587A66-50ED-4468-999D-81CB9D872B81", "EAB6422C-17D7-428C-B25A-9BA9EB5C6FE2", "IDFV-A228265A-CB20-40EE-BEFF-85A532525DA2", "IDFV-109FD148-8287-47BF-A301-130557CC2583", "6B611B4F-6C1B-45D2-99E7-3BA9BF67CC27", ...
    18. vec_unique_loc(x)
    17. vec_slice(x, vec_unique_loc(x))
    16. vec_unique(x)
    15. vec_proxy_compare(x)
    14. is.data.frame(proxy)
    13. order_proxy(vec_proxy_compare(x), direction = direction, na_value = na_value)
    12. vec_order(x, direction = direction, na_value = na_value)
    11. vec_sort(vec_unique(x))
    10. .f(.x[[i]], ...)
    9. map(cols, sorted_unique)
    8. crossing(., mtry = c(1, 2))
    7. function_list[[i]](value)
    6. freduce(value, `_function_list`)
    5. `_fseq`(`_lhs`)
    4. eval(quote(`_fseq`(`_lhs`)), env, env)
    3. eval(quote(`_fseq`(`_lhs`)), env, env)
    2. withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
    1. train_cv %>% crossing(mtry = c(1, 2)) %>% mutate(model_binary = map2(.x = train, .y = mtry, ~ranger::ranger(formula = spender ~ d7_utility_sum + recent_utility_ratio, probability = T, mtry = .y, data = .x %>% filter(spend_7d == 0) %>% mutate(spender = factor(spender)))), ...