Best option to predict time series with multiple variables

Hi.

I've been working lately with prediction examples for time series with a single variable. The problem I have now is that I have to predict the same variable with the help of other variables.

For example, I have to predict variable A with the help of variables B, C and D. I'm a little lost with this problem and need help finding the best way to do it in R.

I'm also unsure whether I should treat the data as 4 series and predict A, or whether, to predict the next weeks of variable A (10 weeks, for example), I also have to supply the values of B, C and D for those weeks (data that I can also get).

Any help on which way to follow will be welcome.

Thanks!

Hi Frank,

A simple method is a multiple regression model. There, you can use Variables B, C and D to predict Column A. Do you have to do this for your work or a stats course? Please, give us a summary of your database to help you with this task.

Hi Rafael.

My database is:

  • Variable A: KG produced
  • Variable B: Temperature
  • Variable C: Precipitation
  • Variable D: Humidity

So, I have data for all the variables since 2015, grouped by week. You say that one option could be a multiple regression model, but this is a time series, so I need to use the date. I don't know if that is the best option.

That is where I am lost. I can get the data for the next weeks of Temperature, Precipitation and Humidity (predicted by weather websites). The idea is to use that data to help predict my future weeks of production (kg).

So I don't know which is better: a model trained on the production history and the other three time series that predicts without using the new weeks of temperature, precipitation and humidity, or a model that does use them, or another type of model such as multiple regression. But for that I need the date, right? I'm interested in keeping it, because the weather conditions may be good, but from the date I know that the food whose production I am predicting is not grown at that time of year.

I don't know if I explained well, but that's where I am currently.

Thanks for any help!

Hi Frank,

In this case I don't recommend a time series approach. You can work with variables B, C and D in a multiple regression model. Try fitting a model and post the result here.

Here is an idea: you can transform the date into a new variable. For example, in a 52-week year, weeks 1 to 8: good production season; weeks 9 to 20: bad production season; weeks 21 to 52: normal production season. Now you have a new variable:

For example, variable "E", whose levels could be good production season, bad production season and normal production season. Replace the names with just "GOOD", "BAD" and "NORMAL" and you're done. You have a new column that should help the model account for the time of year.
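
The recoding described above could be sketched in R roughly as follows. This assumes a hypothetical data frame `prod` with a `week` column (1 to 52); the week boundaries are the illustrative ones from the post, not real growing seasons.

```r
# Hypothetical weekly data: one row per week of a 52-week year.
prod <- data.frame(week = 1:52)

# Recode the week number into the three seasons from the example:
# weeks 1-8 GOOD, weeks 9-20 BAD, weeks 21-52 NORMAL.
prod$E <- cut(
  prod$week,
  breaks = c(0, 8, 20, 52),            # intervals (0,8], (8,20], (20,52]
  labels = c("GOOD", "BAD", "NORMAL")
)

# Count how many weeks fall into each season.
table(prod$E)
```

Because `cut()` returns a factor, `E` can go straight into a model formula and `lm()` will create the dummy variables itself.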

I remain attentive to your comments.

I can't do this. I have a lot of products (so I have to do a lot of forecasts) and I can't manually work out the weeks in which each product is produced...

OK, if you can't do it this way, at least add a "trimester" variable; this will help your model predict production. This is my hypothesis. Once you fit the model, we can check whether the trimester variable is working for you.

  1. Add a "trimester" variable to your database. If your database is in Excel, it is easy to add a new column.
  2. Run a multiple regression in R.

# trimester is the new column; wrapping it in factor() makes lm() treat it as categorical
regression <- lm(A ~ B + C + D + factor(trimester), data = yourdatabase)
summary(regression)

  3. Post the result.
  4. We evaluate whether the model is correct or not.
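
A minimal end-to-end sketch of these steps, on simulated data, might look like this. The column names A, B, C, D and `yourdatabase` follow the thread; the values, the `date` column, and the reading of "trimester" as a calendar quarter derived from the date are assumptions made for illustration. The last part shows how forecast weather for the next 10 weeks could be fed to `predict()`.

```r
set.seed(1)

# Simulated stand-in for the real data: 260 weekly rows starting in 2015.
dates <- seq(as.Date("2015-01-05"), by = "week", length.out = 260)
yourdatabase <- data.frame(
  date = dates,
  B = rnorm(260, 20, 5),     # temperature (made up)
  C = rexp(260, 1 / 10),     # precipitation (made up)
  D = runif(260, 40, 90)     # humidity (made up)
)

# Helper (hypothetical name): quarter of the year, 1-4, as a factor.
quarter_of <- function(d) {
  factor((as.integer(format(d, "%m")) - 1) %/% 3 + 1, levels = 1:4)
}
yourdatabase$trimester <- quarter_of(yourdatabase$date)

# Made-up production figures so the example can run.
yourdatabase$A <- 100 + 2 * yourdatabase$B - 0.5 * yourdatabase$C +
  10 * as.integer(yourdatabase$trimester) + rnorm(260, 0, 5)

regression <- lm(A ~ B + C + D + trimester, data = yourdatabase)
summary(regression)

# Next 10 weeks: B, C and D would come from external weather forecasts.
future <- data.frame(
  date = seq(max(dates) + 7, by = "week", length.out = 10),
  B = rnorm(10, 20, 5),
  C = rexp(10, 1 / 10),
  D = runif(10, 40, 90)
)
future$trimester <- quarter_of(future$date)

predicted_A <- predict(regression, newdata = future)
predicted_A
```

Deriving the quarter from the date avoids adding the column by hand for every product, which matters when many products need a forecast.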

You can use vector autoregression (VAR) for multiple time series. You may use the vars package for this.
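
As a rough sketch of what that looks like with the vars package (simulated data; the series names follow the thread, and the lag order of 2 is an arbitrary choice, not a recommendation):

```r
library(vars)  # provides VAR() and a predict() method for its fits

set.seed(1)
n <- 200

# Made-up weekly series standing in for A (kg), B, C and D.
series <- data.frame(
  A = as.numeric(arima.sim(list(ar = 0.6), n)) + 100,
  B = as.numeric(arima.sim(list(ar = 0.5), n)) + 20,
  C = as.numeric(arima.sim(list(ar = 0.3), n)) + 10,
  D = as.numeric(arima.sim(list(ar = 0.4), n)) + 60
)

# Fit a VAR with 2 lags and an intercept.
var_fit <- VAR(series, p = 2, type = "const")

# Forecast all four series 10 weeks ahead.
var_fc <- predict(var_fit, n.ahead = 10)

# Forecasts for A: a matrix with columns fcst, lower, upper, CI.
var_fc$fcst$A
```

Note that a plain VAR forecasts B, C and D itself rather than using externally supplied weather forecasts for the future weeks.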

Why would you use an autoregressive model if you have explanatory variables?

Because that is usually the way you make simple time-series models.

@franky1010 if you want to model a time series but also include exogenous variables, then you have to look into more complex machine learning models like glm, gbm, xgboost, DNN, etc. The downside of this approach is that you sacrifice ease of interpretability (still possible to some extent with techniques like LIME) in exchange for prediction accuracy.

The easiest way I have found to do this is with the h2o package and the automl() function; here is a nice blog post with a guided example.

Because you are dealing with multiple time series.

You should use multivariate time series forecasting.
Check below Chapter 7 Multivariate TS Analysis:

Hello,

I agree with Rafael that a multiple regression model is likely to work very well here. You can use a package like timetk (https://github.com/business-science/timetk) to automatically add a number of features that are derived from dates: e.g. week, month, day of the week, quarter, and so on. Then run regression or a more sophisticated model on these features.
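
To illustrate the kind of date-derived features meant here, the sketch below builds a few by hand in base R. It does not use timetk itself, and the dates and column names are made up for the example.

```r
# Hypothetical weekly dates; in practice this is the date column of the data.
dates <- seq(as.Date("2015-01-05"), by = "week", length.out = 8)

features <- data.frame(
  date    = dates,
  year    = as.integer(format(dates, "%Y")),
  month   = as.integer(format(dates, "%m")),
  week    = as.integer(format(dates, "%V")),   # ISO 8601 week number
  quarter = (as.integer(format(dates, "%m")) - 1) %/% 3 + 1
)

features
```

These columns can then be joined to the weather variables and used as predictors, typically converted with `factor()` when the model should treat them as categories rather than numbers.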

Boosted trees, for instance, perform just as well as (and often better than) classic forecasting methods on this kind of problem. See here for an example (it's Python, but the principles are very similar):
https://www.kaggle.com/robikscube/tutorial-time-series-forecasting-with-xgboost

Hope that helps.

The 'broom' package may be valuable to you too, since you are trying to model your data.

Thank you all for the answers!

I am still testing the different options to see which is the best solution for my problem...