Hi. Yes, I did try the diamonds dataset: I created folds using rsample::vfold_cv(), but each time I tried crossing() it worked fine, so I couldn't reproduce the error there.
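For reference, this is roughly the pattern I ran on diamonds (v = 5 and the mtry values here are just placeholders, not my real settings):

```r
library(rsample)
library(tidyr)

# Create 5-fold CV splits on a public dataset.
folds <- vfold_cv(ggplot2::diamonds, v = 5)

# Cross every fold with every candidate parameter value.
# On diamonds this runs without error; on my own data it fails.
grid <- crossing(folds, mtry = c(2, 4, 6))
```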
I cannot share my own rather large dataset since it's company data.
What makes this problem hard is that if I understood why it sometimes works, and why I cannot reproduce the failure on other data, I'd probably already know how to solve it.
pdata, the data used for the splits, is just a regular data frame:
> pdata %>% glimpse()
Observations: 1,000,000
Variables: 11
$ s <chr> "IDFV-FEDC6007-08AC-4810-88A1-F7176467F387", "7081C69E-ECE2-4E39-B7AC-3A58B129E7DE", "8BBD5…
$ IOS <dbl> 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0…
$ is_publisher_organic <dbl> 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1…
$ is_publisher_facebook <dbl> 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0…
$ sessions_d7 <dbl> 1, 1, 12, 1, 1, 1, 2, 1, 1, 2, 8, 12, 3, 1, 3, 14, 2, 4, 1, 1, 2, 1, 1, 2, 14, 11, 1, 4, 2,…
$ sum_session_time_secs_d7 <dbl> 106, 800, 19426, 1431, 1323, 196, 4011, 288, 1152, 4005, 10352, 13402, 4171, 5646, 170, 192…
$ d7_utility_sum <dbl> 1.65927871, 11.00098870, 211.61885361, 19.43254448, 17.89554574, 2.57431089, 49.26038019, 4…
$ recent_utility_ratio <dbl> 1.00, 1.00, 0.86, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 0.55, 0.95, 0.24, 1.00, 0.41, 1…
$ spend_7d <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 192, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ spend_30d <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 192, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ spender <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, …
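Since I can't share the real data, here is a small synthetic frame that mimics the column types and rough distributions shown above (all values are random, and the helper names are mine, not from the real pipeline), in case it helps anyone attempt a reprex:

```r
library(dplyr)

set.seed(42)
n <- 1000  # far smaller than the real 1,000,000 rows, same shape

fake_pdata <- tibble(
  s                        = replicate(n, paste0(sample(c(0:9, LETTERS[1:6]), 16, replace = TRUE), collapse = "")),
  IOS                      = rbinom(n, 1, 0.5),
  is_publisher_organic     = rbinom(n, 1, 0.4),
  is_publisher_facebook    = rbinom(n, 1, 0.2),
  sessions_d7              = rpois(n, 3) + 1,
  sum_session_time_secs_d7 = rpois(n, 2000),
  d7_utility_sum           = rexp(n, 1 / 30),
  recent_utility_ratio     = round(runif(n), 2),
  spend_7d                 = ifelse(runif(n) < 0.02, rpois(n, 100), 0),
  spend_30d                = ifelse(runif(n) < 0.03, rpois(n, 150), 0),
  spender                  = runif(n) < 0.05
)
```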