How to adjust Time Series Forecasting based on temporary impact of COVID-19

ksingh19 · October 2, 2020, 6:13pm

Hi,

I have two data sets df1 and df2 as shown below. While forecasting the sales in these two datasets, I have tried to adjust the actual sales for past few months during the period of COVID-19, as otherwise the forecasting is not optimistic. I am getting reasonable results per our expectations for years 2022-2025. However, the results of 2021 don't seem to be realistic. What is the best way to handle values during pandemic assuming things will be normal soon.

df1 results are fine because sales has picked up back in July with only 3 months of COVID affect. But for df2, the sales is still not near where it was before COVID and thus, adjusting actuals is little confusing for this data with longer COVID affect. Reprex data below shows values from 2019 - 2021 for df2. But real data has values from 2015 - 2025

# Data df2 where sales is still going through COVID affect
df2 <- data.frame(
          month = c("2019 Jan","2019 Feb","2019 Mar",
                    "2019 Apr","2019 May","2019 Jun","2019 Jul","2019 Aug",
                    "2019 Sep","2019 Oct","2019 Nov","2019 Dec","2020 Jan",
                    "2020 Feb","2020 Mar","2020 Apr","2020 May","2020 Jun",
                    "2020 Jul","2020 Aug","2020 Sep","2020 Oct","2020 Nov",
                    "2020 Dec","2021 Jan","2021 Feb","2021 Mar","2021 Apr",
                    "2021 May","2021 Jun","2021 Jul","2021 Aug","2021 Sep",
                    "2021 Oct","2021 Nov","2021 Dec"),
          sales = c(1277256,980282,853458,1007229.58,
                    1246084,1193005,1503203.64,1451290,1343771,1582470,
                    1233360.75588,1271090.30412,1337158,1024617,969186,580039,
                    745976,588006,594815,549379.8195,NA,NA,NA,NA,NA,NA,
                    NA,NA,NA,NA,NA,NA,NA,NA,NA,NA),
              A = c(575506.6,563908.6,574201,587270.1,
                    599428.1,593574,609182.3,607961.9,570176.4,613627.6,
                    627545.9,647015.4,609026,595426.8,598968.2,545378.5,
                    557292,553829.3,608986.5,NA,NA,NA,NA,NA,NA,NA,NA,NA,
                    NA,NA,NA,NA,NA,NA,NA,NA),
              B = c(6.5,6.5,6.5,6.5,6.5,6.5,6.5,6,
                    5.5,5,5,4.5,4.5,4.25,3.75,3.75,3,2.25,2.25,2.125,
                    2,2.075,2.075,2.075,2.45,2.45,2.45,2.45,2.45,2.45,
                    2.45,2.45,2.45,2.475,2.7,2.885),
              C = c(0.941225,0.895468,0.877436,
                    0.861329,0.849061,0.828872,0.7482,0.760066,0.737631,0.714138,
                    0.674789,0.660361,0.606975,0.65002,0.618654,0.561411,
                    0.452552,0.390509,0.351658,0.342956,0.311385,0.297796,
                    0.270697,0.249496,0.221222,0.207001,0.204105,0.196503,
                    0.181137,0.16973,0.156495,0.155258,0.142849,0.138568,
                    0.127877,0.119766),
              D = c(191209,190273,199974,221723,
                    234599,213995,232861,231522,224014,241786,231590,252037,
                    184681,193299,156304,51463,56705,122795,163424,173822,
                    NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,
                    NA)
   )

df2 <- df2%>%
  mutate(month = yearmonth(month))

Here is my approach, lots of hard coding, any suggestion on improving it would be great also! Showing example for df2 here.

Actuals

# Actuals Until Feb
act <- df2%>%
  slice(1:14)%>%
  mutate(month = as.character(month))

#### Adjusting Sales for months affected by COVID - March - July with Forecast

act <- act%>%
  mutate(month = yearmonth(month))%>%
  as_tsibble(index = month)

covid_sales <- act%>%
  gather(key = "variables", value = "value", -month)%>%
  model(
    arima = ARIMA(value)
  )%>%
    forecast(h = 5)

covid_sales <- covid_sales%>%
  rename("val" = .mean)%>%
select(month, val)%>%
   rename("value" = val)%>%
  spread(variables, value)%>%
  select(month, sales, A, B, C, D)

covid_sales <- covid_sales%>%
  select(month, sales)

act_sales <- act%>%
  select(month, sales)

# Actuals from August onwards
act_aug_on <- df2%>%
  slice(20:36)%>%
  select(month, sales)

# Final Actual Sales
act_sales <- bind_rows(act_sales, covid_sales, act_aug_on)

Actuals - Independent Variables

# Actuals Until Feb
act_var <- act%>%
  slice(1:14)%>%
  mutate(month = yearmonth(month))%>%
  as_tsibble(index = month)

act_var <- act_var%>%
  select(everything(), -sales)

# Adjusting Actuals for COVID - March - July with forecasts
covid_var <- act_var%>%
  gather(key = "variables", value = "value", -month)%>%
  model(
    arima = ARIMA(value)
  )%>%
    forecast(h = 5)

covid_var <- covid_var%>%
  rename("val" = .mean)%>%
select(month, variables, val)%>%
  rename("value" = val)%>%
  spread(variables, value)%>%
  select(month, A, B, C, D)

# Actuals from August onwards
act_aug_on_var <- df2%>%
  slice(20:36)%>%
  select(everything(), -sales)

# Final Actuals for Independent Variables
act_var <- bind_rows(act_var, covid_var, act_aug_on_var)

Final Actual Data

df <- act_sales %>% inner_join(act_var)

Forecasting Independent Variables for modeling

#Forecasts for A
A <- df%>%
  select(month, A)%>%
  slice(1:19)

A_fc <- A%>%
  model(A_model = ARIMA(A)) %>%
  forecast(h=17)

A_fc <- A_fc%>%
  rename("A1" = .mean)%>%
  select(month, A1)%>%
  rename("A" = A1)

#Forecasts for D
D <- df%>%
  select(month, D)%>%
  slice(1:19)

D_fc <- D%>%
  model(D_model = ARIMA(D)) %>%
  forecast(h=17)

D_fc <- D_fc%>%
  rename("D1" = .mean)%>%
  select(month, D1)%>%
  rename("D" = D1)

Final Data

# Combine Independent Variable Forecasts to original data
df <- df %>%
  rows_update(A_fc, by = "month") %>% 
  rows_update(D_fc, by = "month")

# Final Actuals and Forecasted Independent Variables
act <- df%>%
  slice(1:20)

factors <- df%>%
  slice(21:36)

With this as final data, I built model with just one independent variable C where it performed the best among all the combinations.

Forecast results look great for 2022 - 2025 (not in this reprex data), but values for 2021 doesn't seem to be realistic. What is the best way to handle such data where we are trying to adjust actuals for the COVID period. In order to get more accurate results for 2021,what should be the approach. I am thinking of building a separate model to show values only for 2020 and 2021 with better results and keeping the results of the rest of the years as is. Also, I am thinking there is a more efficient way of achieving same results with less of hard coding and more of functions and loops maybe. Any help with this is much appreciated.

Thanks for your help!

technocrat · October 2, 2020, 8:01pm

Things to try:

Simplest: omit Apr-Jun
Substitute year earlier Apr-Jun
Use seasonally adjusted year-earlier
decompose earlier series for seasonality and trend; carry forward and add white noise residual

ksingh19 · October 4, 2020, 3:21pm

Thanks @technocrat! Its still a little confusing for me. Can you please explain this with data df2 here or any random data would also be helpful.

Thank you!

Ajackson · October 4, 2020, 3:53pm

And whatever is done is a fudge and needs to be prominently documented on the plot itself!

technocrat · October 4, 2020, 11:29pm

The imputetestbench package may be a better solution than my DIY outline

ksingh19 · October 6, 2020, 5:24pm

Hi,

I am still struggling in making the above code more efficient. I would like to avoid as much of hard coding in the above example as possible. Can anyone please help with this?

Thanks!

system · October 27, 2020, 5:24pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.