First of all, thanks to everyone involved for all the work that has gone into tidymodels.
There's one concept that has been puzzling me for a while - maybe someone can point me in the right direction. A link to a good explanation would suffice.
From my own experiments and from the literature (Applied Predictive Modeling) I have in mind: "Don't rely on a single test set". I found substantial variation in test set error just by changing the seed used to set aside the testing data. So I thought the answer to that shortcoming was cross-validation (or similar resampling techniques): assessing performance across multiple sets of data unseen during model building. I have used caret to do that. So my answer for expected future performance would be cross-validated performance measures.
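To make the variability concrete, here is a minimal base-R sketch (using the built-in mtcars data and a simple lm, purely illustrative) of how much a single holdout RMSE can move just by changing the split seed:

```r
# For each seed: make a ~75% train / 25% test split, fit lm, compute test RMSE.
rmse_for_seed <- function(seed) {
  set.seed(seed)
  idx   <- sample(seq_len(nrow(mtcars)), size = 24)
  train <- mtcars[idx, ]
  test  <- mtcars[-idx, ]
  fit   <- lm(mpg ~ wt + hp, data = train)
  sqrt(mean((predict(fit, newdata = test) - test$mpg)^2))
}

rmses <- vapply(1:20, rmse_for_seed, numeric(1))
range(rmses)  # the spread across seeds is the "single test set" problem
```

Averaging over resamples (as cross-validation does) is exactly what smooths out this seed-to-seed noise.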
In blog posts introducing tidymodels, I have frequently seen this idea of first setting aside a single test set, then performing model building through a number of steps for feature engineering, parameter tuning etc., all based on cross-validation / resampling, and finally estimating future performance on that one single test set. Somehow, this approach does not fully convince me ... Why end up with a single test set again?
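For reference, this is the pattern as I understand it from those posts, as a minimal sketch (mtcars and a plain linear model stand in for a real problem; no tuning shown):

```r
library(tidymodels)

set.seed(123)
split <- initial_split(mtcars, prop = 0.75)  # the single test set, held out once
folds <- vfold_cv(training(split), v = 5)    # resampling happens inside the training set

wf <- workflow() %>%
  add_formula(mpg ~ wt + hp) %>%
  add_model(linear_reg())

cv_res <- fit_resamples(wf, folds)  # resampled estimates guide model development
final  <- last_fit(wf, split)       # one final fit on training, evaluated on the test set

collect_metrics(cv_res)  # cross-validated performance
collect_metrics(final)   # the single test-set estimate I am asking about
```

So my question is essentially: why should `collect_metrics(final)` be preferred over `collect_metrics(cv_res)` as the estimate of future performance?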
I apologize for cross-posting - I got no answer here in a week: https://rviews.rstudio.com/2020/04/21/the-case-for-tidymodels/