# Manually Cross Validating Time Series in R

I have the following data:

``````library(forecast)
library(lubridate)

weeks <- rep(seq(as.Date("2010-01-01"), as.Date("2023-01-01"), by = "week"), each = 1)
counts <- rpois(length(weeks), lambda = 50)
df <- data.frame(Week = as.character(weeks), Count = counts)
``````

I am trying to adapt the R code found here (Rob J Hyndman - Time series cross-validation: an R example) for my problem:

``````# Convert Week column to Date format
df\$Week <- as.Date(df\$Week)

# Create a time series object
ts_data <- ts(df\$Count, frequency = 52, start = c(year(min(df\$Week)), 1))

# Set the length of data for fitting models
k <- 60

# Initialize matrices to store the MAE values
mae1 <- mae2 <- mae3 <- matrix(NA, nrow = length(ts_data) - k, ncol = 12)

# Loop through each time point
for(i in 1:(length(ts_data)-k))
{
# Define the training and testing sets
train_data <- window(ts_data, end = c(year(min(df\$Week)) + floor((i+k-2)/52), (i+k-2)%%52+1))
test_data <- window(ts_data, start = c(year(min(df\$Week)) + floor((i+k-1)/52), (i+k-1)%%52+1),
end = c(year(min(df\$Week)) + floor((i+11+k-1)/52), (i+11+k-1)%%52+1))

# Fit and forecast using three different models
fit1 <- tslm(train_data ~ trend + season, lambda=0)
fcast1 <- forecast(fit1, h=12)
fit2 <- auto.arima(train_data, seasonal = TRUE, lambda = 0)
fcast2 <- forecast(fit2, h = 12)
fit3 <- ets(train_data)
fcast3 <- forecast(fit3, h = 12)

# Calculate MAE for each model's forecast
mae1[i, ] <- abs(fcast1[['mean']] - test_data)
mae2[i, ] <- abs(fcast2[['mean']] - test_data)
mae3[i, ] <- abs(fcast3[['mean']] - test_data)
}

# Plot the MAE values for each model
plot(1:12, colMeans(mae1, na.rm = TRUE), type = "l", col = 2, xlab = "horizon", ylab = "MAE", ylim = c(0, max(colMeans(mae1, na.rm = TRUE), colMeans(mae2, na.rm = TRUE), colMeans(mae3, na.rm = TRUE))))
lines(1:12, colMeans(mae2, na.rm = TRUE), type = "l", col = 3)
lines(1:12, colMeans(mae3, na.rm = TRUE), type = "l", col = 4)
legend("topleft", legend = c("LM", "ARIMA", "ETS"), col = 2:4, lty = 1)
``````

The code seems to partly work, but I still get this error:

`````` number of items to replace is not a multiple of replacement length
``````

Can someone please show me how I can fix this?

Thanks!

In the last few iterations of your for loop, the length of the `test_data` is less than 12. You can fix the problem by adjusting your forecast statements to match the length of `test_data`. e.g.,

``````  fcast1 <- forecast(fit1, h = length(test_data))
``````

Then when you save the results, also adjust the columns used like this:

``````mae1[i,seq_along(test_data)] <- abs(fcast1[['mean']] - test_data)
``````
1 Like

Wow, Dr. Hyndman ... it is such an honor to receive an answer from you directly! I am in such shock and so humbled - thank you so much!

I am continuing to debug my code - I think I got something like this to work:

``````library(forecast)
library(lubridate)

weeks <- rep(seq(as.Date("2010-01-01"), as.Date("2023-01-01"), by = "week"), each = 1)
counts <- rpois(length(weeks), lambda = 50)
df <- data.frame(Week = as.character(weeks), Count = counts)

# Convert Week column to Date format
df\$Week <- as.Date(df\$Week)

# Create a time series object
ts_data <- ts(df\$Count, frequency = 52, start = c(year(min(df\$Week)), 1))

# Set the length of data for fitting models
k <- 60

# Initialize matrices to store the MAE values
mae2 <- matrix(NA, nrow = length(ts_data) - k, ncol = 12)

# Loop through each time point
for(i in 1:(length(ts_data)-k))
{
# Define the training and testing sets
train_data <- window(ts_data, end = c(year(min(df\$Week)) + floor((i+k-2)/52), (i+k-2)%%52+1))
test_data <- window(ts_data, start = c(year(min(df\$Week)) + floor((i+k-1)/52), (i+k-1)%%52+1),
end = c(year(min(df\$Week)) + floor((i+11+k-1)/52), (i+11+k-1)%%52+1))

# Fit and forecast using ARIMA model only
fit2 <- auto.arima(train_data, seasonal = TRUE, lambda = 0)
fcast2 <- forecast(fit2, h = 12)

# Calculate MAE for ARIMA model's forecast
mae2[i, 1:12] <- abs(fcast2[['mean']] - test_data)
}

# Plot the MAE values for ARIMA model
plot(1:12, colMeans(mae2, na.rm = TRUE), type = "l", col = 3, xlab = "horizon", ylab = "MAE", ylim = c(0, max(colMeans(mae2, na.rm = TRUE))))
legend("topleft", legend = "ARIMA", col = 3, lty = 1)
``````

• Is this (rolling cross validation) the most "rigorous" way to test the performance of time series models? I am thinking that this type of cross validation approach will provide more realistic forecast error estimates compared to other cross validation approaches?
• Once we have completely finished training a time series model and want to use it in the real world - how often should we re-fit the model to newly available data?

Thank you so much!

PS: In another question, I am learning how to compare the performance of ARIMA models as they scale to larger datasets: Correctly Using Microbenchmark in R

1. Yes, this is a pretty good way to test the performance of competing models.
2. Once I've chosen a model, I would retrain it on all the available data, and do so every time new observations are available.
3. Rather than choose a specific model, it is often better to combine several models (by averaging the point forecasts, for example).
4. You would probably find it easier to use the `fable` and `tsibble` packages instead of the older `forecast` package. Here's some code that replicates what you are doing using these newer tidy-style packages:
``````library(fpp3)

ts_data <- tibble(
weeks = seq(as.Date("2010-01-01"), as.Date("2023-01-01"), by = "week"),
counts = rpois(length(weeks), lambda = 50)
) |>
mutate(weeks = yearweek(weeks)) |>
as_tsibble(index = weeks)

ts_stretch <- ts_data |>
stretch_tsibble(.init = 60)

fcst <- ts_stretch |>
model(
tslm = TSLM(log(counts) ~ trend() + season()),
arima = ARIMA(log(counts)),
ets = ETS(counts)
) |>
forecast(h = 12)
mae <- fcst |>
group_by(.id, .model) |>
mutate(h = row_number()) |>
ungroup() |>
as_fable(response = "counts", distribution = counts) |>
accuracy(ts_data, measures = list(mae = MAE), by = c(".model", "h"))

mae |>
ggplot(aes(x = h, y = mae, group = .model, col = .model)) +
geom_line() +
labs(x = "horizon", y = "MAE")
``````

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.