Tidymodels using sliding_period to classify

Hi,

I have a model that classifies a binary outcome per day.
We use profiling of entities to achieve this. More concretely, as each day progresses, the results from the previous day are incorporated into the profile data to help the model predict. I want to ascertain the accuracy of our model, and I thought the best way to do this would be to use sliding_period in the rsample package.

Below is an example of an attempt at cross-validation. The profile data are basically extra features of the train dataset, alongside its regular features. I have just split them out for clarity.

The second slice updates the profile data based on the actual results of the first assessment set. For example, if the profile of an entity had 20 positive cases in the 90-day profile and the assessment set contains another 10 positive cases, these 10 cases would be added to the 20, giving a 91-day profile for assessing the model on slice two. Below is a rough picture. The assessment set would also have access to the same features.

  Profile (90 days)         train              Assessment (+1 Day)
|------------------|   +  |----------|             |---------|
  Profile (91 days)         train              Assessment (+2 Day)
|-------------------|  +  |----------|             |---------|
  Profile (92 days)         train              Assessment (+3 Day)
|--------------------| +  |----------|             |---------|
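
To make the picture concrete, here is a minimal sketch of how such a growing profile could be recomputed per slice. It assumes that with `lookback = Inf` the analysis set of each slice already contains every earlier day, so a profile summarised from `analysis(split)` automatically absorbs the previous assessment day. The toy data, the column names `entity` and `outcome`, and the helper `profile_for_split()` are hypothetical stand-ins, not my real data or anything provided by rsample.

```r
library(rsample)
library(dplyr)

# Hypothetical toy data: 8 days, 4 rows per day, two entities, already sorted by date
set.seed(42)
toy <- tibble(
  my_date = rep(seq(as.Date("2020-03-01"), by = "day", length.out = 8), each = 4),
  entity  = rep(c("a", "b"), times = 16),
  outcome = rbinom(32, 1, 0.5)
)

# Expanding window: each slice keeps all earlier days and assesses on the next day
resamples <- sliding_period(
  toy,
  index = my_date,
  period = "day",
  lookback = Inf,
  assess_stop = 1,
  skip = 4
)

# Hypothetical helper: summarise the profile from everything seen so far in a slice
profile_for_split <- function(split) {
  analysis(split) %>%
    group_by(entity) %>%
    summarise(
      positive_cases = sum(outcome),
      profile_days   = n_distinct(my_date),
      .groups = "drop"
    )
}

profiles <- lapply(resamples$splits, profile_for_split)

# Attach the slice's profile to its assessment rows, so both sets share the same features
assess_with_profile <- Map(
  function(split, profile) left_join(assessment(split), profile, by = "entity"),
  resamples$splits,
  profiles
)
```

In this sketch the profile window grows by one day from one slice to the next, which is the behaviour drawn above. Whether the real profile can be derived this simply from the analysis rows is an assumption.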

I have two questions

Does rsample have something that would allow me to achieve this?

I thought the function sliding_period would have been perfect for this, but when I run the code below on the dates listed further down, I get the following error:

> resamples <- sliding_period(
+   train,
+   my_date,
+   "day",
+   lookback = Inf,
+   assess_stop = 1,
+   skip = 4,
+   step = 2
+ )
Error: `.i` must be in ascending order.
i It is not ascending at locations: 217732, 217992, 218004, 217....
Run `rlang::last_error()` to see where the error occurred.
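
For reference, sliding_period is built on the slider package, which requires the index to be in ascending order, so this error usually means the rows are not sorted by date. A minimal sketch of sorting first, using hypothetical toy data in place of `train` (the `outcome` column is made up for illustration):

```r
library(rsample)
library(dplyr)

# Hypothetical toy data standing in for `train`: 10 days, 3 rows per day, shuffled
set.seed(1)
toy <- tibble(
  my_date = sample(rep(seq(as.Date("2020-03-01"), by = "day", length.out = 10), each = 3)),
  outcome = sample(c(0, 1), 30, replace = TRUE)
)

# Sort by the date index before resampling; unsorted rows trigger the `.i` error
toy_sorted <- arrange(toy, my_date)

resamples <- sliding_period(
  toy_sorted,
  index = my_date,
  period = "day",
  lookback = Inf,
  assess_stop = 1,
  skip = 4,
  step = 2
)

# Each split trains on all earlier days and assesses on the following day
first_split <- resamples$splits[[1]]
range(analysis(first_split)$my_date)
range(assessment(first_split)$my_date)
```

If the real data behave like this toy example, running `train <- dplyr::arrange(train, my_date)` before the call should be enough, but I have not confirmed that on the full dataset.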

Below is a list of the dates I am using to partition the data

janitor::tabyl(train$my_date)
        train$my_date    n     percent
            2020-03-01  577 0.002646631
            2020-03-02 3039 0.013939536
            2020-03-03 5090 0.023347232
            2020-03-04 3172 0.014549591
            2020-03-05 2999 0.013756060
            2020-03-06 2916 0.013375349
            2020-03-07 1649 0.007563769
            2020-03-08  456 0.002091618
            2020-03-09 2863 0.013132244
            2020-03-10 3162 0.014503722
            2020-03-11 3238 0.014852325
            2020-03-12 3028 0.013889080
            2020-03-13 3206 0.014705545
            2020-03-14 1814 0.008320605
            2020-03-15  535 0.002453982
            2020-03-16 3173 0.014554178
            2020-03-17 3248 0.014898194
            2020-03-18 3129 0.014352355
            2020-03-19 3093 0.014187227
            2020-03-20 3204 0.014696371
            2020-03-21 1643 0.007536248
            2020-03-22  344 0.001577888
            2020-03-23 2904 0.013320307
            2020-03-24 2988 0.013705605
            2020-03-25 2775 0.012728599
            2020-03-26 2634 0.012081848
            2020-03-27 2808 0.012879966
            2020-03-28 1637 0.007508727
            2020-03-29  498 0.002284267
            2020-03-30 2811 0.012893727
            2020-03-31 2819 0.012930422
            2020-04-01 2610 0.011971763
            2020-04-02 2618 0.012008458
            2020-04-03 3287 0.015077083
            2020-04-04  981 0.004499732
            2020-04-05  431 0.001976946
            2020-04-06 1740 0.007981175
            2020-04-07 3350 0.015366056
            2020-04-08 2971 0.013627628
            2020-04-09 2759 0.012655209
            2020-04-10 2512 0.011522249
            2020-04-11 1410 0.006467504
            2020-04-12  420 0.001926491
            2020-04-13 2284 0.010476439
            2020-04-14 2727 0.012508428
            2020-04-15 3041 0.013948709
            2020-04-16 2985 0.013691844
            2020-04-17 3114 0.014283552
            2020-04-18  884 0.004054804
            2020-04-19  396 0.001816405
            2020-04-20 2014 0.009237981
            2020-04-21 2021 0.009270089
            2020-04-22 3235 0.014838565
            2020-04-23 2846 0.013054267
            2020-04-24 2889 0.013251503
            2020-04-25  976 0.004476797
            2020-04-26  743 0.003408054
            2020-04-27 2935 0.013462500
            2020-04-28 3019 0.013847798
            2020-04-29 2966 0.013604693
            2020-04-30 3692 0.016934770
            2020-05-01 1345 0.006169357
            2020-05-02  817 0.003747483
            2020-05-03  417 0.001912730
            2020-05-04 2535 0.011627747
            2020-05-05 2334 0.010705784
            2020-05-06 2973 0.013636801
            2020-05-07 2936 0.013467087
            2020-05-08 3089 0.014168880
            2020-05-09 1572 0.007210579
            2020-05-10  270 0.001238458
            2020-05-11 3213 0.014737653
            2020-05-12 3360 0.015411925
            2020-05-13 3227 0.014801870
            2020-05-14 3241 0.014866086
            2020-05-15 3508 0.016090784
            2020-05-16 1479 0.006783999
            2020-05-17  559 0.002564067
            2020-05-18 3441 0.015783462
            2020-05-19 3657 0.016774229
            2020-05-20 3733 0.017122832
            2020-05-21 3212 0.014733066
            2020-05-22 3322 0.015237623
            2020-05-23 1502 0.006889497
            2020-05-24  303 0.001389825
            2020-05-25 2290 0.010503961
            2020-05-26 3164 0.014512896
            2020-05-27 3093 0.014187227
            2020-05-28 3126 0.014338594
            2020-05-29 3292 0.015100017
            2020-05-30 1109 0.005086853
            2020-05-31  586 0.002687913
