Problem using cross-validation with stretch_tsibble

Hi,

I'm trying to cross-validate my models with stretch_tsibble to find the one with the lowest RMSE. The data are daily.
I'm a little confused about the .init parameter in stretch_tsibble: what is the best value to choose?
Below is my minimal example, in which I get errors when using Fourier terms and NaNs from accuracy(). Any help here will be appreciated.
Regards.

library(tsibble)
#> Warning: package 'tsibble' was built under R version 3.6.2
library(lubridate)
#> Warning: package 'lubridate' was built under R version 3.6.2
#> 
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:tsibble':
#> 
#>     interval
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union
library(dplyr)
#> Warning: package 'dplyr' was built under R version 3.6.2
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(fable)
#> Warning: package 'fable' was built under R version 3.6.2
#> Loading required package: fabletools

iniciativa <- tibble(
    data_planejada = sample(seq(as.Date("2020-01-01"), length=200, by="1 day"), size=200),
    n = sample(seq(200), size=200)
) %>%  as_tsibble()
#> Using `data_planejada` as index variable.


train <- iniciativa %>%
    filter_index("2020-01-01" ~ "2020-05-29")
test <- iniciativa %>% 
    filter_index("2020-05-30" ~ .)


tsibble_cv <- train %>% 
    slice(1:(n() - 140)) %>% 
    stretch_tsibble(.init = 2, 
                    .step = 1)

fc_cv <- tsibble_cv %>% 
    model(
        arima = ARIMA(n ~ trend() + PDQ(0,0,0) + fourier(K = 3)), 
    ) %>%  
    forecast(h = "20 weeks")
#> Warning: Provided exogenous regressors are rank deficient, removing regressors:
#> `fourier(K = 3)S1_7`, `fourier(K = 3)C2_7`, `fourier(K = 3)S2_7`, `fourier(K =
#> 3)C3_7`, `fourier(K = 3)S3_7`
#> Warning: It looks like you're trying to fully specify your ARIMA model but have not said if a constant should be included.
#> You can include a constant using `ARIMA(y~1)` to the formula or exclude it by adding `ARIMA(y~0)`.
#> Warning: Provided exogenous regressors are rank deficient, removing regressors:
#> `fourier(K = 3)C2_7`, `fourier(K = 3)S2_7`, `fourier(K = 3)C3_7`, `fourier(K =
#> 3)S3_7`
#> Warning: Provided exogenous regressors are rank deficient, removing regressors:
#> `fourier(K = 3)S2_7`, `fourier(K = 3)C3_7`, `fourier(K = 3)S3_7`
#> Warning: Provided exogenous regressors are rank deficient, removing regressors:
#> `fourier(K = 3)C3_7`, `fourier(K = 3)S3_7`
#> Warning: Provided exogenous regressors are rank deficient, removing regressors:
#> `fourier(K = 3)S3_7`
#> Warning: 6 errors (1 unique) encountered for arima
#> [6] Could not find an appropriate ARIMA model.
#> This is likely because automatic selection does not select models with characteristic roots that may be numerically unstable.
#> For more details, refer to https://otexts.com/fpp3/arima-r.html#plotting-the-characteristic-roots



fc_cv %>% 
    accuracy(test)
#> Warning: The future dataset is incomplete, incomplete out-of-sample data will be treated as missing. 
#> 148 observations are missing between 2020-01-03 and 2020-05-29
#> # A tibble: 1 x 10
#>   .model .type    ME  RMSE   MAE   MPE  MAPE  MASE RMSSE  ACF1
#>   <chr>  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 arima  Test    NaN   NaN   NaN   NaN   NaN   NaN   NaN    NA

Created on 2020-12-07 by the reprex package (v0.3.0)

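I noticed some issues with your code. First, and just a minor thing, why do you shuffle the order of the dates by sampling? tsibble puts them back in order, but why shuffle them at all? Second, you create a training set of 150 days and then take a slice of only the first ten days. Third, you stretch that small data set into sets of two, three, ..., and ten days and then try to fit ARIMA models with Fourier terms for weekly seasonality (period 7) to these tiny sets. With so few observations the Fourier regressors are dropped as rank deficient, and for six of the nine folds no ARIMA model can be found at all. Fourth, I use cross-validation to measure the forecast accuracy of the method (at the chosen forecast horizon), not to generate a single estimated model that forecasts the dates in the test set. This is also why accuracy(test) returns NaN: every 20-week forecast from these folds ends on or before 2020-05-29, so none of them overlaps the test set, which starts on 2020-05-30.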
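On the `.init` question: `.init` is the number of observations in the first (smallest) training fold, and `.step` is how many observations each later fold adds. A minimal sketch on a hypothetical six-day series (`toy`, `dia` and `y` are made-up names, not from the question) shows the folds this produces:

```r
library(tsibble)
library(dplyr)

# Hypothetical toy series: six consecutive days, already in date order.
toy <- tsibble(
  dia = as.Date("2020-01-01") + 0:5,
  y = 1:6,
  index = dia
)

# First fold has .init = 3 rows; each later fold adds .step = 1 row,
# until the last fold contains the whole series.
folds <- toy %>% stretch_tsibble(.init = 3, .step = 1)

# Count the rows in each fold (.id identifies the fold).
fold_sizes <- folds %>%
  as_tibble() %>%
  count(.id)
fold_sizes
# four folds (.id 1 to 4) with 3, 4, 5 and 6 rows
```

Applied to the question's 10-row slice with `.init = 2`, the same rule gives nine folds of 2 to 10 days. There is no universally best `.init`; a reasonable rule of thumb (not something tsibble prescribes) is to make it at least as large as the smallest training set you would realistically fit, comfortably more than the number of model parameters, e.g. several full weeks when fitting weekly Fourier terms.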

I am busy grading final exams so will not be able to respond again. Good luck!
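A minimal sketch of the cross-validation workflow described in the reply, under several assumptions: the data are a hypothetical, correctly ordered stand-in for `iniciativa` (200 consecutive days with a made-up weekly pattern), the horizon of interest is 14 days (`h_days` is an illustrative name), and `.init = 60` / `.step = 7` are example choices rather than recommendations. Accuracy is computed against the full series rather than a separate test set, because the cross-validation forecasts fall inside the original training period.

```r
library(tsibble)
library(dplyr)
library(fable)   # attaches fabletools, which provides model(), forecast(), accuracy()

set.seed(2020)

# Hypothetical stand-in data: 200 consecutive days, in order, with a weekly
# sine pattern plus noise so the Fourier terms have something to fit.
iniciativa <- tibble(
  data_planejada = seq(as.Date("2020-01-01"), by = "1 day", length.out = 200),
  n = 100 + 10 * sin(2 * pi * seq_len(200) / 7) + rnorm(200, sd = 3)
) %>%
  as_tsibble(index = data_planejada)

h_days <- 14  # forecast horizon being evaluated (an assumption)

# Drop the last h_days rows so every fold's forecasts fall inside the observed data,
# then build folds of 60, 67, ..., 186 days.
cv_folds <- iniciativa %>%
  slice(seq_len(nrow(iniciativa) - h_days)) %>%
  stretch_tsibble(.init = 60, .step = 7)

# Fit the question's model on every fold and forecast h_days ahead from each.
cv_fc <- cv_folds %>%
  model(arima = ARIMA(n ~ trend() + fourier(K = 3) + PDQ(0, 0, 0))) %>%
  forecast(h = h_days)

# Compare the forecasts with the actual values in the full series.
cv_acc <- cv_fc %>% accuracy(iniciativa)
cv_acc
```

The result pools the errors from all folds and horizons 1 to 14 into one row per model, so adding further models inside the same `model()` call and comparing their RMSE is a way to pick the lowest-RMSE candidate.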
