I have the following dataset:
set.seed(123)  # arbitrary seed so the simulated counts are reproducible
weeks <- seq(as.Date("2010-01-01"), as.Date("2023-01-01"), by = "week")
counts <- rpois(length(weeks), lambda = 50)
df <- data.frame(Week = as.character(weeks), Count = counts)
I am trying to fit a time series model to this data and perform LOOCV (Leave One Out Cross Validation). That is:
- I want to fit a model to 70% of the data (in chronological order)
- Predict the next point (horizon = 1)
- Record the error of this prediction
- Add the observed point to the training data, refit, predict the following point, and record that error
- Repeat until the remaining 30% has been covered, then report the average error (RMSE, MAE)
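To make the intended bookkeeping concrete, here is a minimal base R sketch of the steps above. It is only an illustration under stated assumptions: the forecast is a placeholder (the mean of the training window), not the ARIMA model I actually want to use, and the names `y`, `history`, and `forecast_i` are made up for this example.

```r
# Rolling-origin (expanding window) one-step-ahead evaluation.
# The "model" is a placeholder (mean of the training window), used only
# to show the bookkeeping; it is NOT the ARIMA model from the question.
set.seed(1)                                    # arbitrary seed (assumption)
y <- rpois(200, lambda = 50)                   # stand-in for df$Count
n <- length(y)
train_end <- floor(0.7 * n)                    # last index of the initial training window

errors <- numeric(n - train_end)               # one error per held-out point
for (i in seq(train_end + 1, n)) {
  history    <- y[1:(i - 1)]                   # everything observed before point i
  forecast_i <- mean(history)                  # placeholder one-step-ahead forecast
  errors[i - train_end] <- y[i] - forecast_i   # one actual minus one forecast
}

# Aggregate once, over all held-out points
rmse <- sqrt(mean(errors^2))
mae  <- mean(abs(errors))
```

In this sketch each iteration compares exactly one observed value with exactly one forecast, and RMSE and MAE are computed a single time from the full vector of errors.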
Using the post "Time series forecast cross-validation" as a reference, I tried to write a loop to perform this procedure.
First, I set up the requirements for the loop:
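library(forecast)  # provides auto.arima() and forecast()
# Split data into training and test sets
n <- nrow(df)
train_size <- floor(0.7 * n)
train_end <- train_size  # last row of the initial training set; used by the loop below
train <- df[1:train_size, ]
test <- df[(train_size + 1):n, ]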
# Fit ARIMA model to training data
model <- auto.arima(train$Count)
# Initialize vector to store prediction errors
errors <- vector("numeric", length = nrow(test))
rmse_vec <- numeric()
mae_vec <- numeric()
Next, I tried to run the loop:
# Loop over test set
for (i in seq(train_end+1, n)) {
  # Split data into training and validation sets
  train <- df[1:(i-1), ]
  val <- df[i:(i+1), ]
  # Fit ARIMA model to training data
  model <- auto.arima(train$Count)
  # Forecast one step ahead using the model and record error
  fc <- forecast(model, h = 1)
  error <- val$Count - fc$mean
  # Record RMSE and MAE (index starts at 1 for the first test point)
  rmse_vec[i - train_end] <- sqrt(mean(error^2))
  mae_vec[i - train_end] <- mean(abs(error))
  # Update training data with actual count
  train$Count <- c(train$Count, val$Count[1])
}
# Compute mean RMSE and MAE
mean_rmse <- mean(rmse_vec)
mean_mae <- mean(mae_vec)
Problem: this code gives me the following error:
Error in `-.default`(val$Count, fc$mean) :
time-series/vector length mismatch
Can someone please show me how to fix this? Am I doing LOOCV for time series models correctly?
Thanks!
Note: I am not sure whether this is how LOOCV is typically performed on time series models. Perhaps someone here knows?
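For comparison, the forecast package also has `tsCV()`, which computes forecast errors with a rolling forecast origin. Below is a minimal sketch of how I understand it could be used. The AR(1) model, the series length, and the helper name `far1` are assumptions chosen to keep the example fast; I am not sure its `initial` argument lines up exactly with my 70% split.

```r
library(forecast)

set.seed(1)                                    # arbitrary seed (assumption)
y <- ts(rpois(120, lambda = 50))               # stand-in for df$Count

# Hypothetical helper: tsCV() expects a function of (x, h) that returns a forecast object
far1 <- function(x, h) forecast(Arima(x, order = c(1, 0, 0)), h = h)

# Skip roughly the first 70% of the series before evaluating forecasts
e <- tsCV(y, far1, h = 1, initial = floor(0.7 * length(y)))

# Entries where no forecast was evaluated are NA, so drop them when aggregating
rmse <- sqrt(mean(e^2, na.rm = TRUE))
mae  <- mean(abs(e), na.rm = TRUE)
```

Swapping `far1` for a function that calls `auto.arima()` should follow the same pattern, at the cost of refitting the model selection at every origin.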