Hi,
I am working my way through the Forecasting: Principles and Practice book: https://otexts.com/fpp3/
In section 5.8, the book describes how to evaluate point forecast accuracy. It says that forecast accuracy can only be assessed by applying the model to data that wasn't used when fitting the model.
However, the example creates a training dataset that is a subset of the main dataset, and the accuracy of the model is then calculated against the main dataset, even though the main dataset contains the data used to fit the model. I am confused about why this is done and why separate train and test datasets aren't created. Would it be better to train the model on a completely separate training dataset and then evaluate it on a test dataset?
The code given in the book is below:
library(fpp3)

recent_production <- aus_production %>%
  filter(year(Quarter) >= 1992)

beer_train <- recent_production %>%
  filter(year(Quarter) <= 2007)

beer_fit <- beer_train %>%
  model(
    Mean = MEAN(Beer),
    `Naïve` = NAIVE(Beer),
    `Seasonal naïve` = SNAIVE(Beer),
    Drift = RW(Beer ~ drift())
  )

beer_fc <- beer_fit %>%
  forecast(h = 10)

accuracy(beer_fc, recent_production)
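
For reference, this is roughly what I had expected the evaluation to look like, reusing the objects created above: an explicit held-out test set passed to accuracy() instead of the full recent_production. The beer_test name and the year cut-off are just my own guess, not from the book:

# What I had expected: keep only the observations not used for fitting
beer_test <- recent_production %>%
  filter(year(Quarter) > 2007)

# Score the forecasts against only the held-out observations
# (I'm not sure whether scaled measures such as MASE still work in this call
# without the training portion of the data)
accuracy(beer_fc, beer_test)

Is there a reason the book passes recent_production rather than something like beer_test here?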