I have a general question about Tidymodels that I have not found an explicit answer to in the online texts.
- Recipes can be used to engineer data.
- Models can be used to fit the data.
- Workflows can be used to link Recipes and Models and execute the fit.
- Resampling can be used to rerun and cross-validate models.
My concern is that, when resampling is used to validate models, the data should be engineered using only the training data within each iteration. Engineering the data once, prior to resampling, would give each iteration information (for example, means or variances computed from rows it later treats as test data) that it would not have in a truly predictive scenario. The model may be overfit, and its performance estimates overly optimistic, as a result.
My question is this: when you conduct a k-fold or other resampling exercise in Tidymodels, does it re-engineer the data within each iteration? Or is the training dataset first engineered in its entirety, and only then divided into training and test groups for each iteration?
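
To make the two alternatives concrete, here is a minimal sketch of what I mean, written by hand with rsample and recipes. The built-in `mtcars` data, the choice of `step_normalize()`, and all object names are illustrative assumptions on my part. It is not a claim about what Tidymodels does internally when a workflow is resampled; that is exactly what I am asking.

```r
library(rsample)   # vfold_cv(), analysis(), assessment()
library(recipes)   # recipe(), step_normalize(), prep(), bake()

set.seed(123)
folds <- vfold_cv(mtcars, v = 5)

# An unprepped recipe only describes the steps; nothing is estimated yet.
rec <- recipe(mpg ~ ., data = mtcars) |>
  step_normalize(all_numeric_predictors())

# Alternative A: estimate the preprocessing separately inside each iteration,
# using only that iteration's training (analysis) rows.
per_fold <- lapply(folds$splits, function(split) {
  prepped <- prep(rec, training = analysis(split))
  list(
    train = bake(prepped, new_data = NULL),              # processed training rows
    test  = bake(prepped, new_data = assessment(split))  # held-out rows, training statistics
  )
})

# Alternative B: estimate the preprocessing once on all rows, then resample.
# Every fold's held-out rows have already influenced the means and SDs.
prepped_all     <- prep(rec, training = mtcars)
engineered_once <- bake(prepped_all, new_data = NULL)
folds_after     <- vfold_cv(engineered_once, v = 5)
```

In alternative A the centering and scaling statistics differ from fold to fold and never see the held-out rows; in alternative B they are computed once from every row. I would like to know which of these corresponds to what happens when a recipe is resampled through a workflow.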