Can I use tidymodels last_fit() with a custom split rather than an rsample one?

I'm reading the documentation for last_fit() (from the tune package, part of tidymodels). The provided example is:

library(recipes)
library(rsample)
library(parsnip)
library(tune)  # last_fit() is exported by tune

set.seed(6735)
tr_te_split <- initial_split(mtcars)

spline_rec <- recipe(mpg ~ ., data = mtcars) %>%
  step_ns(disp)

lin_mod <- linear_reg() %>%
  set_engine("lm")

spline_res <- last_fit(lin_mod, spline_rec, split = tr_te_split)
spline_res

In this example, tr_te_split is the split object passed to last_fit().
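For context, an rsplit such as tr_te_split holds the full data plus row indices, and rsample's training() and testing() return the two data frames behind it. last_fit() fits the model on the first and computes metrics on the second. A minimal sketch that recreates the split from the documentation example and pulls both pieces out:

```r
library(rsample)

set.seed(6735)
tr_te_split <- initial_split(mtcars)

# The two data frames that last_fit() works with internally
mtcars_train <- training(tr_te_split)  # the model is fit on these rows
mtcars_test  <- testing(tr_te_split)   # metrics are computed on these rows

nrow(mtcars_train)
nrow(mtcars_test)
```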

In my case, I have split my data manually by time: 11 months for training and the most recent month for testing.

Is there a way to use last_fit() but pass a training data frame and a test data frame instead of an rsample split object?

last_fit() does need an rsplit object, but you can build one from two custom data frames with make_splits(). There is also initial_time_split(), which respects the temporal ordering of the data: instead of sampling rows at random, it puts the first prop of the rows into the training set, so the data must already be sorted by time. Its cutoff is a proportion rather than a date (unlike your exact use case), but it may still be useful to be aware of.

library(tidymodels)

data(Chicago)

# split with temporal ordering
time_split <- initial_time_split(Chicago, prop = 0.75)
time_split
#> <Analysis/Assess/Total>
#> <4273/1425/5698>

chicago_train <- training(time_split)
chicago_test <- testing(time_split)

range(chicago_train$date)
#> [1] "2001-01-22" "2012-10-03"
range(chicago_test$date)
#> [1] "2012-10-04" "2016-08-28"

# use custom data frames to make split object
custom_split <- make_splits(chicago_train, assessment = chicago_test)
custom_split
#> <Analysis/Assess/Total>
#> <4273/1425/5698>

Created on 2022-02-16 by the reprex package (v2.0.1)
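To match the question's setup more closely (a date cutoff, with the most recent month held out), here is a sketch that builds the split with make_splits() and passes it to last_fit(). The cutoff date and the model formula (ridership from the Clark_Lake station and temp) are illustrative choices for the Chicago data, not part of the original answer. Substitute your own cutoff and predictors.

```r
library(tidymodels)
data(Chicago)

# Hold out the most recent month; this cutoff is an assumption for this data
cutoff <- as.Date("2016-08-01")
chicago_history    <- filter(Chicago, date <  cutoff)
chicago_last_month <- filter(Chicago, date >= cutoff)

# Wrap the two custom data frames in an rsplit object
month_split <- make_splits(chicago_history, assessment = chicago_last_month)

# Illustrative model: an lm on two hypothetically chosen predictors
lin_wf <- workflow() %>%
  add_formula(ridership ~ Clark_Lake + temp) %>%
  add_model(linear_reg() %>% set_engine("lm"))

# Fit on the analysis rows, then evaluate on the assessment rows
month_res <- last_fit(lin_wf, split = month_split)

collect_metrics(month_res)      # rmse and rsq on the held-out month
collect_predictions(month_res)  # one row per held-out day
```

The training and test data frames need the same columns, because make_splits() stacks them into one data set and records which rows belong to each part.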


Thanks for this, Hannah, very helpful to know about! I will use this in my workflow.
