I have an "rsplit" object created by "initial_split()" (see the reprex below).
Now I want to create just one validation set based on one column or order. I tried "validation_split()", but it only allows random sampling. I then tried "group_vfold_cv()", which gives the appropriate grouping but, as the name says, performs cross-validation and therefore gives me 2 resamples.
    folds = group_vfold_cv(training(df_split), group = 'column')

    # Group 2-fold cross-validation
    # A tibble: 2 x 2
      splits                 id
      <list>                 <chr>
    1 <rsplit [40912/72608]> Resample1
    2 <rsplit [72608/40912]> Resample2
I would like to make something like this:
folds = group_vfold_cv(training(df_split), group = 'column') %>% filter(id == "Resample2")
But this breaks its class and converts it to a tibble that will not be recognized by the tuning function (tune_grid()).
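To make the class change concrete, here is a minimal sketch that compares the class before and after filtering. The toy data frame and the names "toy" and "filtered" are made up for illustration, and the exact class vectors printed may differ between rsample versions:

```r
library(rsample)
library(dplyr)

# Toy data, assumed for illustration: two groups of 50 rows each
toy <- tibble(
  x = runif(100),
  y = runif(100),
  group_column = rep(c(1, 0), 50)
)

# With two groups, group_vfold_cv() returns two resamples
folds <- group_vfold_cv(toy, group = "group_column")

# Keep a single resample by filtering on the id column
filtered <- folds %>% filter(id == "Resample2")

# Compare the classes of the original and the filtered object
class(folds)
class(filtered)
```

On my side the first object carries the "rset" class, while the filtered one is a plain tibble, which seems to be what tune_grid() rejects.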
Does anyone know a way to accomplish this?
Here is a reprex of what I would like to do:
    library(tidymodels)

    df = tibble(
      x = runif(100, 0, 1),
      y = runif(100, 0, 1),
      group_column = rep(c(1, 0), 50))

    df_split = initial_split(df, prop = 3/4)

    # the filter changes the class that is needed for the tune_grid function
    folds = group_vfold_cv(training(df_split), group = 'group_column') %>%
      filter(id == "Resample2")

    boost_spec <- parsnip::boost_tree(
      trees = tune(),
      tree_depth = tune()) %>%
      set_engine("xgboost") %>%
      set_mode("regression")

    recipe <- recipe(y ~ ., data = head(training(df_split)))

    boost_workflow = workflow() %>%
      add_recipe(recipe) %>%
      add_model(boost_spec)

    set.seed(123)
    boost_grid <- grid_max_entropy(
      trees(),
      tree_depth(),
      size = 2)

    boost_res = boost_workflow %>%
      tune_grid(resamples = folds,
                grid = boost_grid,
                metrics = metric_set(rmse))
Thanks a lot!