ARIMA on external regressors

Hello,

Is there a way to fit ARIMA models to all external regressors at once? I have several variables, and handling them one at a time is becoming unwieldy. My sample data below includes only 2 external regressors, but my original data has 7 or 8, which makes it difficult to run ARIMA on each of them. Any help would be appreciated!

library(tidyverse)
library(lubridate)
library(tsibble)
library(fable)

df <- data.frame(
  stringsAsFactors = FALSE,
       check.names = FALSE,
                 date = c("2019-01-01",
                          "2019-04-01","2019-07-01","2019-10-01","2020-01-01",
                          "2020-04-01","2020-07-01","2020-10-01","2021-01-01",
                          "2021-04-01","2021-07-01","2021-10-01"),
            `sales` = c(21999,28022,30464,
                          26861,24990,17015,30381,29716,NA,NA,NA,NA),
            `gdp` = c(2211.94,2259.38,2243.29,
                          2246.55,2158.49,2086.65,2305.75,NA,NA,NA,NA,
                          NA),
            `oil` = c(191125,191125,189738,
                          238556,263929,274390,282798,292390,302517,NA,NA,
                          NA)
   )

df <- df %>%
  mutate(date = yearquarter(date)) %>%
  as_tsibble(index = date)

For now, I am running ARIMA on each variable individually. I am getting the results I want, but it is tedious because each variable needs a different horizon, as seen in the data.

If I reshape the data to long format, I cannot run ARIMA even to get values for just Q2 with a horizon of 1. Of course, the horizon would differ for each of these external variables. I am not sure how to code this efficiently. Any help would be appreciated. Thanks for your help!

df1 <- df %>%
    pivot_longer(-date, names_to = "features", values_to = "value") %>%
   filter_index(~"2020 Q1")

df2%>%
    model(
    arima = ARIMA(value)
         )%>%
    forecast(h = 1)

It does not appear that you are using external regressors, which would look something like
model(arima = ARIMA(sales ~ gdp)), with gdp used to explain the value of sales and ARIMA errors in the regression. Edit: OK, you are forecasting all of the external regressors so you can then forecast the actual dependent variable. Good thing I am close to retirement!

To estimate ARIMA models for many different variables, here is one way to do it. I assumed that df2 is just a typo for df1, as df2 is not defined.

library(tidyverse)
library(lubridate)
library(tsibble)
library(fable)

df <- data.frame(
  stringsAsFactors = FALSE,
  check.names = FALSE,
  date = c("2019-01-01",
           "2019-04-01","2019-07-01","2019-10-01","2020-01-01",
           "2020-04-01","2020-07-01","2020-10-01","2021-01-01",
           "2021-04-01","2021-07-01","2021-10-01"),
  `sales` = c(21999,28022,30464,
              26861,24990,17015,30381,29716,NA,NA,NA,NA),
  `gdp` = c(2211.94,2259.38,2243.29,
            2246.55,2158.49,2086.65,2305.75,NA,NA,NA,NA,
            NA),
  `oil` = c(191125,191125,189738,
            238556,263929,274390,282798,292390,302517,NA,NA,
            NA)
)

df <- df %>%
  mutate(date = yearquarter(date)) %>% 
  as_tsibble(index = date)

df1 <- df %>%
  pivot_longer(-date, names_to = "features", values_to = "value") %>%
  filter_index(~ "2020 Q1")

df1 %>%
  group_by(features) %>%
  model(arima = ARIMA(value)) %>%
  forecast(h = 1)
#> # A fable: 3 x 5 [1Q]
#> # Key:     features, .model [3]
#>   features .model    date              value   .mean
#>   <chr>    <chr>    <qtr>             <dist>   <dbl>
#> 1 gdp      arima  2020 Q2      N(2224, 1642)   2224.
#> 2 oil      arima  2020 Q2 N(214895, 1.2e+09) 214895.
#> 3 sales    arima  2020 Q2    N(26467, 1e+07)  26467.

Created on 2021-03-30 by the reprex package (v1.0.0)

You can also forecast each series one period ahead from the last quarter for which it has data.

library(tidyverse)
library(lubridate)
library(tsibble)
library(fable)

df <- data.frame(
  stringsAsFactors = FALSE,
  check.names = FALSE,
  date = c("2019-01-01",
           "2019-04-01","2019-07-01","2019-10-01","2020-01-01",
           "2020-04-01","2020-07-01","2020-10-01","2021-01-01",
           "2021-04-01","2021-07-01","2021-10-01"),
  `sales` = c(21999,28022,30464,
              26861,24990,17015,30381,29716,NA,NA,NA,NA),
  `gdp` = c(2211.94,2259.38,2243.29,
            2246.55,2158.49,2086.65,2305.75,NA,NA,NA,NA,
            NA),
  `oil` = c(191125,191125,189738,
            238556,263929,274390,282798,292390,302517,NA,NA,
            NA)
)

df <- df %>%
  mutate(date = yearquarter(date)) %>% 
  as_tsibble(index = date)

# df1 <- df %>%
#   pivot_longer(-date, names_to = "features", values_to = "value") %>%
#   filter_index(~ "2020 Q1")
# 
# df1 %>%
#   group_by(features) %>%
#   model(arima = ARIMA(value)) %>%
#   forecast(h = 1)

df2 <- df %>%
  pivot_longer(-date, names_to = "features", values_to = "value")

df2 %>%
  group_by(features) %>%
  filter(!is.na(value)) %>%
  model(arima = ARIMA(value)) %>%
  forecast(h = 1)
#> # A fable: 3 x 5 [1Q]
#> # Key:     features, .model [3]
#>   features .model    date             value   .mean
#>   <chr>    <chr>    <qtr>            <dist>   <dbl>
#> 1 gdp      arima  2020 Q4     N(2216, 5278)   2216.
#> 2 oil      arima  2021 Q2 N(3e+05, 5.3e+08) 297796.
#> 3 sales    arima  2021 Q1 N(26181, 2.2e+07)  26181.

Created on 2021-03-30 by the reprex package (v1.0.0)

Thank you so much @EconProf! So group_by was the issue when I was trying. This is already a huge help for forecasting the immediate next values.

But my variables have different missing periods. Is there a way to forecast all the missing periods, instead of forecasting just one period ahead of the last data point? This is the main problem I am facing: I am currently forecasting each series individually, which is very time consuming as the number of variables grows. Once each series is forecast to the end of the data frame (2021 Q4 in this example), I can forecast sales.

Thanks again for your help!

Will this work for you?

library(tidyverse)
library(lubridate)
library(tsibble)
library(fable)

df <- data.frame(
  stringsAsFactors = FALSE,
  check.names = FALSE,
  date = c("2019-01-01",
           "2019-04-01","2019-07-01","2019-10-01","2020-01-01",
           "2020-04-01","2020-07-01","2020-10-01","2021-01-01",
           "2021-04-01","2021-07-01","2021-10-01"),
  `sales` = c(21999,28022,30464,
              26861,24990,17015,30381,29716,NA,NA,NA,NA),
  `gdp` = c(2211.94,2259.38,2243.29,
            2246.55,2158.49,2086.65,2305.75,NA,NA,NA,NA,
            NA),
  `oil` = c(191125,191125,189738,
            238556,263929,274390,282798,292390,302517,NA,NA,
            NA)
)

df <- df %>%
  mutate(date = yearquarter(date)) %>% 
  as_tsibble(index = date)

df2 <- df %>%
  pivot_longer(-date, names_to = "features", values_to = "value")

df2 %>%
  group_by(features) %>%
  filter(!is.na(value)) %>%
  model(arima = ARIMA(value)) %>%
  forecast(h = 5) %>%  # enter maximum # of NA periods (gdp is missing 5 observations)
  as_tsibble(index = date) %>%
  filter_index(. ~ "2021 Q4")
#> # A tsibble: 12 x 5 [1Q]
#> # Key:       features, .model [3]
#>    features .model    date              value   .mean
#>    <chr>    <chr>    <qtr>             <dist>   <dbl>
#>  1 gdp      arima  2020 Q4      N(2216, 5278)   2216.
#>  2 gdp      arima  2021 Q1      N(2216, 5278)   2216.
#>  3 gdp      arima  2021 Q2      N(2216, 5278)   2216.
#>  4 gdp      arima  2021 Q3      N(2216, 5278)   2216.
#>  5 gdp      arima  2021 Q4      N(2216, 5278)   2216.
#>  6 oil      arima  2021 Q2  N(3e+05, 5.3e+08) 297796.
#>  7 oil      arima  2021 Q3 N(293477, 9.8e+08) 293477.
#>  8 oil      arima  2021 Q4 N(289525, 1.3e+09) 289525.
#>  9 sales    arima  2021 Q1  N(26181, 2.2e+07)  26181.
#> 10 sales    arima  2021 Q2  N(26181, 2.2e+07)  26181.
#> 11 sales    arima  2021 Q3  N(26181, 2.2e+07)  26181.
#> 12 sales    arima  2021 Q4  N(26181, 2.2e+07)  26181.

Created on 2021-03-31 by the reprex package (v1.0.0)
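
The h = 5 in the reprex above is typed in by hand. As a rough sketch, assuming every series is missing values only at the end of the sample (as in the example data), the horizon could instead be computed by counting the NAs per series. The names horizons and max_h are illustrative and not from the original posts.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  date = c("2019-01-01", "2019-04-01", "2019-07-01", "2019-10-01",
           "2020-01-01", "2020-04-01", "2020-07-01", "2020-10-01",
           "2021-01-01", "2021-04-01", "2021-07-01", "2021-10-01"),
  sales = c(21999, 28022, 30464, 26861, 24990, 17015, 30381, 29716,
            NA, NA, NA, NA),
  gdp = c(2211.94, 2259.38, 2243.29, 2246.55, 2158.49, 2086.65, 2305.75,
          NA, NA, NA, NA, NA),
  oil = c(191125, 191125, 189738, 238556, 263929, 274390, 282798, 292390,
          302517, NA, NA, NA)
)

# Count the missing quarters for each series
# (assumption: NAs occur only at the end of each series)
horizons <- df %>%
  pivot_longer(-date, names_to = "features", values_to = "value") %>%
  group_by(features) %>%
  summarise(h = sum(is.na(value)))

horizons
# gdp is missing 5 quarters, oil 3, sales 4

# Largest gap across all series: 5 here
max_h <- max(horizons$h)
```

max_h can then be passed as forecast(h = max_h), with filter_index(. ~ "2021 Q4") trimming the extra quarters for the series that need fewer steps.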

Thank you @EconProf ! This is great! Why didn't I think about that?
Thank you!
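
For the last step mentioned earlier in the thread (forecasting sales once the regressors are filled in through 2021 Q4), one possible sketch follows. It is not from the original replies: it assumes the point forecasts (.mean) of gdp and oil are acceptable stand-ins for their missing values, and the names long, reg_fc, filled, fit and sales_fc are made up for illustration. With only eight observations of sales, the fitted model is illustrative at best.

```r
library(dplyr)
library(tidyr)
library(tsibble)
library(fable)

df <- data.frame(
  date = c("2019-01-01", "2019-04-01", "2019-07-01", "2019-10-01",
           "2020-01-01", "2020-04-01", "2020-07-01", "2020-10-01",
           "2021-01-01", "2021-04-01", "2021-07-01", "2021-10-01"),
  sales = c(21999, 28022, 30464, 26861, 24990, 17015, 30381, 29716,
            NA, NA, NA, NA),
  gdp = c(2211.94, 2259.38, 2243.29, 2246.55, 2158.49, 2086.65, 2305.75,
          NA, NA, NA, NA, NA),
  oil = c(191125, 191125, 189738, 238556, 263929, 274390, 282798, 292390,
          302517, NA, NA, NA)
)

# Long format, observed values only
long <- df %>%
  mutate(date = yearquarter(date)) %>%
  pivot_longer(-date, names_to = "features", values_to = "value") %>%
  filter(!is.na(value))

# Forecast each regressor far enough ahead to reach 2021 Q4
# (h = 5 covers gdp, the series with the longest gap)
reg_fc <- long %>%
  filter(features != "sales") %>%
  as_tsibble(key = features, index = date) %>%
  model(arima = ARIMA(value)) %>%
  forecast(h = 5) %>%
  as_tibble() %>%
  transmute(features, date, value = .mean)

# Append the point forecasts to the observed values, go back to wide
# format, and drop any quarters beyond the end of the original data
filled <- bind_rows(long, reg_fc) %>%
  pivot_wider(names_from = features, values_from = value) %>%
  as_tsibble(index = date) %>%
  filter_index(. ~ "2021 Q4")

# Regression of sales on the regressors with ARIMA errors,
# fitted on the quarters where sales is observed
fit <- filled %>%
  filter(!is.na(sales)) %>%
  model(arima = ARIMA(sales ~ gdp + oil))

# Forecast the quarters where sales is missing, using the filled-in regressors
sales_fc <- fit %>%
  forecast(new_data = filled %>% filter(is.na(sales)) %>% select(-sales))

sales_fc
```

Note that this treats the forecast regressors as if they were known, so the resulting intervals for sales do not reflect the uncertainty in gdp and oil.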
