Cross Validating Time Series Models in R

I have the following dataset:

weeks <- seq(as.Date("2010-01-01"), as.Date("2023-01-01"), by = "week")
counts <- rpois(length(weeks), lambda = 50)
df <- data.frame(Week = as.character(weeks), Count = counts)

I am trying to fit a time series model to this data and perform LOOCV (Leave One Out Cross Validation). That is:

  • I want to fit a model to the first 70% of the data (in chronological order)
  • Predict the next point (horizon = 1)
  • Record the error of this prediction
  • Add the observed value to the training data, refit, predict the next point, and record the error
  • Repeat until the remaining 30% has been covered, then report the average error (RMSE, MAE)

Using this post as a reference (Time series forecast cross-validation), I tried to write a loop to perform this procedure.

First, I set up the requirements for the loop:

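# Load the forecast package (provides auto.arima() and forecast())
library(forecast)

# Split data into training and test sets
n <- nrow(df)
train_size <- floor(0.7 * n)
train_end <- train_size  # index of the last training observation (used by the loop)
train <- df[1:train_size, ]
test <- df[(train_size + 1):n, ]

# Fit ARIMA model to training data
model <- auto.arima(train$Count)

# Initialize vector to store prediction errors
errors <- vector("numeric", length = nrow(test))

# Initialize vectors to store per-step RMSE and MAE
rmse_vec <- numeric()
mae_vec <- numeric()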

Next, I tried to run the loop:

# Loop over test set
for (i in seq(train_end+1, n)) {
    
    # Split data into training and validation sets
    train <- df[1:(i-1), ]
    val <- df[i:(i+1), ]
    
    # Fit ARIMA model to training data
    model <- auto.arima(train$Count)
    
    # Forecast one step ahead using the model and record error
    fc <- forecast(model, h = 1)
    error <- val$Count - fc$mean
    
    # Record RMSE and MAE
    rmse_vec[i - train_end - 1] <- sqrt(mean(error^2))
    mae_vec[i - train_end - 1] <- mean(abs(error))
    
    # Update training data with actual count
    train$Count <- c(train$Count, val$Count[1])
}

# Compute mean RMSE and MAE
mean_rmse <- mean(rmse_vec)
mean_mae <- mean(mae_vec)

Problem: this code gives me the following error:

 Error in `-.default`(val$Count, fc$mean) : 
  time-series/vector length mismatch
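
The message can be reproduced without fitting any model, which may help narrow down the cause. The following is a minimal sketch in base R with made-up numbers: it subtracts a length-1 time series (like fc$mean when h = 1) from a length-2 vector (like val$Count when val is df[i:(i+1), ]). The names actual and forecast_mean are hypothetical stand-ins, not objects from the loop above.

```r
# Made-up values, for illustration only
actual        <- c(48, 52)  # two values, like val$Count from df[i:(i+1), ]
forecast_mean <- ts(50)     # one value, like fc$mean from forecast(model, h = 1)

# A ts object that is shorter than the plain vector cannot be recycled,
# so the subtraction stops with an error; capture its message
msg <- tryCatch(
  actual - forecast_mean,
  error = function(e) conditionMessage(e)
)
print(msg)

# Comparing one actual value with the one forecast value works
single_error <- actual[1] - as.numeric(forecast_mean)
print(single_error)
```

This suggests the validation slice should contain exactly one row when the horizon is 1.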

Can someone please show me how to fix this? Am I doing LOOCV for time series models correctly?
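
One possible fix, offered as a minimal sketch rather than a definitive answer: keep a single validation point per iteration, index the error vector starting at 1, and compute RMSE and MAE once over all one-step errors (the RMSE of a single error is just its absolute value, so averaging per-step RMSEs would only reproduce the MAE). The sketch assumes the forecast package is installed. The shorter 60-week series and the fixed seed are assumptions added only so the example runs quickly and reproducibly.

```r
library(forecast)  # provides auto.arima() and forecast()

set.seed(42)  # assumption: seed added only for reproducibility

# Assumption: a shorter simulated series than in the question, to keep the loop fast
weeks <- seq(as.Date("2020-01-01"), by = "week", length.out = 60)
df <- data.frame(Week = as.character(weeks),
                 Count = rpois(length(weeks), lambda = 50))

n <- nrow(df)
train_size <- floor(0.7 * n)

# One slot per held-out point
errors <- numeric(n - train_size)

for (i in seq(train_size + 1, n)) {
  # Expanding window: refit on every observation before point i
  model <- auto.arima(df$Count[1:(i - 1)])

  # One-step-ahead forecast
  fc <- forecast(model, h = 1)

  # Compare exactly one actual value with exactly one forecast value
  errors[i - train_size] <- df$Count[i] - as.numeric(fc$mean)
}

# Summary statistics over all one-step errors
rmse <- sqrt(mean(errors^2))
mae <- mean(abs(errors))
print(c(RMSE = rmse, MAE = mae))
```

Because the training slice is rebuilt from df on every pass, there is no need to append the actual count to train inside the loop.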

Thanks!

  • Note: I am not sure whether this is how LOOCV is typically performed for time series models. Perhaps someone here knows?
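
For comparison, the procedure described here (fit on the past, forecast one step, move the origin forward) is usually called rolling-origin or time series cross-validation rather than LOOCV. The forecast package has a helper for it, tsCV(). The sketch below assumes a recent version of forecast in which tsCV() accepts the initial argument. The wrapper name fc_fun is hypothetical, and the short simulated series and seed are assumptions for speed. Which origin counts as the first one after initial is best checked in ?tsCV.

```r
library(forecast)  # provides tsCV(), auto.arima() and forecast()

set.seed(42)  # assumption: seed added only for reproducibility

# Assumption: a short simulated series so the example runs quickly
y <- ts(rpois(60, lambda = 50))
train_size <- floor(0.7 * length(y))

# Hypothetical wrapper in the form tsCV() expects: a series x and a horizon h
fc_fun <- function(x, h) forecast(auto.arima(x), h = h)

# Skip roughly the first 70% of the series, then roll the origin forward
e <- tsCV(y, fc_fun, h = 1, initial = train_size)

# Entries with no forecast are NA, so drop them when summarising
rmse <- sqrt(mean(e^2, na.rm = TRUE))
mae <- mean(abs(e), na.rm = TRUE)
print(c(RMSE = rmse, MAE = mae))
```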
