I have a basic time series dataset named "lynx", which is included in R. This dataset shows the number of catches of lynxes per year, over a period of 114 years.
Although this is a time series, my teacher asks me to use this dataset to create a regression model capable of predicting the number of catches that have taken place in a given year taking into account only the catches of the previous two or three years.
Unfortunately, I have no idea how to proceed. I do not even know how to process the data to apply, for example, a basic linear regression model to them.
Can someone tell me how to proceed?
But if you want to do a "proper" regression model by accounting for autocorrelation, use the arima() function from the stats package. Rob J. Hyndman, author of the popular time-series modeling package forecast, coauthored an online guide for time series analysis and forecasting. There's a section on autoregressive models.
Because this is for an assignment, remember that "proper" here means "Best practice for the real world." Your teacher probably considers it to mean "only what I asked for."
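For reference, the arima() route mentioned above might look roughly like the sketch below. It assumes an autoregressive model of order 2 (each year depends on the two years before it); that order is only an illustrative choice matching the "previous two or three years" in the question, not a tuned one, and the object names are made up for this example.

```r
# 'lynx' ships with R (datasets package): annual counts for 1821-1934.
# order = c(2, 0, 0) requests an AR(2) model: no differencing, no MA terms.
# The order is an assumption for illustration, not a recommendation.
fit_ar2 <- arima(lynx, order = c(2, 0, 0))
print(fit_ar2)

# Forecast the three years after the end of the series.
# predict() on an arima fit returns a list with point forecasts ($pred)
# and their standard errors ($se).
forecast_ar2 <- predict(fit_ar2, n.ahead = 3)
forecast_ar2$pred
```

Unlike a plain lm() on lagged columns, arima() estimates the model by maximum likelihood and treats the series as a time series throughout, which is why it is usually the preferred tool outside of an exercise like this one.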
For your particular problem, I'd suggest two steps:
1. Transform your data set to get the variables you want. In this case, you may want variables that hold the values from each of the few prior years.

    library(dplyr)  # provides %>%, mutate(), and the vector version of lag()

    fake_lynx_data <-
      data.frame(year = 1900:2017,
                 lynx = runif(118, 500, 1000)) # 118 years of random fake counts

    # New data frame with more variables we can refer to later
    fake_lynx_data_addl_vars <-
      fake_lynx_data %>%
      mutate(lynx_1yr_prior = lag(lynx, 1),
             lynx_2yr_prior = lag(lynx, 2),
             lynx_3yr_prior = lag(lynx, 3))

2. Fit a model to those variables. Since I used totally random data, this model doesn't predict well:
    # This uses the 'lm' function to make a linear model that predicts 'lynx'
    # from the additive (independent) effects of the 3 prior years' counts.
    lm(lynx ~ lynx_1yr_prior + lynx_2yr_prior + lynx_3yr_prior,
       data = fake_lynx_data_addl_vars) %>%
      summary()

    # Given that the counts are partly dependent on the counts in prior years,
    # you could also try a model that includes the interactions between those
    # prior counts.
    lm(lynx ~ lynx_1yr_prior * lynx_2yr_prior * lynx_3yr_prior,
       data = fake_lynx_data_addl_vars) %>%
      summary()
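The same two steps can be sketched on the real lynx series instead of fake data. This version swaps in base R's embed() for the dplyr lag() calls, so it needs no extra packages; the column names, the choice of two lagged predictors, and the one-step-ahead prediction at the end are assumptions made for illustration.

```r
# Step 1: build lagged columns from the built-in 'lynx' series.
# embed(x, 4) returns a matrix whose first column is the current value and
# whose next three columns are the values 1, 2 and 3 years earlier.
# The first 3 years are dropped because they have no complete history.
n <- length(lynx)
lagged <- as.data.frame(embed(as.numeric(lynx), 4))
names(lagged) <- c("lynx", "lynx_1yr_prior", "lynx_2yr_prior", "lynx_3yr_prior")
lagged$year <- as.numeric(time(lynx))[4:n]

# Step 2: ordinary linear regression on the two previous years.
fit <- lm(lynx ~ lynx_1yr_prior + lynx_2yr_prior, data = lagged)
summary(fit)

# Predict the year after the series ends, using the last two observed counts.
next_year <- data.frame(lynx_1yr_prior = as.numeric(lynx[n]),
                        lynx_2yr_prior = as.numeric(lynx[n - 1]))
predict(fit, newdata = next_year)
```

To use three prior years instead of two, add lynx_3yr_prior to the formula and to the next_year data frame.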