How to apply ARIMA/time series models to a multi-column (multivariable) dataset


Hi Time Series Experts,

I have 6 columns: x1, x2, x3, x4, x5, y.
x1 is the date column.
y is the output.
x2, x3, x4 and x5 are different variables that influence y.
So, the first 5 columns have an impact on the y outcome.

I would have used ARIMA if only the date (x1) and y were in the dataset.
Eg: []
But the other variables influence the y outcome, so I don't want to remove them.

Is there a function or approach for fitting a time series model that takes all 6 columns of the data into account?
P.S.: I used one-hot encoding to transform the categorical variable.

We can simply apply auto.arima() to the AirPassengers dataset that ships with R, but what if more than one variable influences the output variable?

Let's consider the airquality dataset in R:
We want to predict Wind,
but Solar.R, Ozone and Temp also influence this output variable.
How can I use all these variables, along with the time variables Month and Day,
in order to predict Wind?
I found someone on Kaggle who applied this, but I could not grasp how they did it.

Code for forecasting AirPassengers

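library(forecast)  # auto.arima(), forecast(); also supplies autoplot() for ts objects
library(ggplot2)   # geom_smooth(), labs()
# Plot time series data (AirPassengers covers January 1949 to December 1960)
autoplot(AirPassengers) + geom_smooth(method = "lm") + labs(x = "Date", y = "Passenger numbers (1000's)", title = "Air Passengers from 1949 to 1960")
# Apply auto.arima
arimaAP <- auto.arima(AirPassengers)
# Forecast the next 36 months with a 95% prediction interval
forecastAP <- forecast(arimaAP, level = c(95), h = 36)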

As we can see, the AirPassengers data has only one variable, which depends on time. Could you please guide me on how to deal with a dataset such as airquality, where more than one variable influences the output?

Thanks in advance,



Could you please turn this into a self-contained REPRoducible EXample (reprex)? A reprex makes it much easier for others to understand your issue and figure out how to help.

If you've never heard of a reprex before, you might want to start by reading this FAQ:



Thanks Andrescs, I updated the question with more reprex details and a fuller description, and shared the links I researched as well.



Please provide reproducible code, putting the word "reprex" in your text doesn't make your question a reproducible example.



According to the auto.arima() documentation, you can pass multiple exogenous variables as a matrix via the xreg parameter:

xreg: Optionally, a numerical vector or matrix of external regressors, which must have the same number of rows as y. (It should not be a data frame.)
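The note that xreg "should not be a data frame" matters as soon as a categorical column is involved. Below is a minimal base-R sketch using a small hypothetical data frame (the columns `temp` and `season` are made up for illustration, not taken from the thread): as.matrix() on a mixed data frame produces a character matrix, whereas model.matrix() returns a numeric matrix with the factor expanded into 0/1 dummy columns.

```r
# Hypothetical data frame: one numeric and one categorical regressor
df <- data.frame(
  temp   = c(67, 72, 74, 62, 56, 66),
  season = factor(c("a", "a", "b", "b", "c", "c"))
)

# as.matrix() on a mixed data frame yields a *character* matrix,
# which is not a valid numeric xreg
storage.mode(as.matrix(df))   # "character"

# model.matrix() expands the factor into dummy columns using treatment coding
# (k - 1 dummies; the first level "a" is the baseline) and returns a numeric
# matrix. The "(Intercept)" column is dropped here on the assumption that the
# ARIMA fit estimates its own mean/constant.
X <- model.matrix(~ temp + season, data = df)[, -1, drop = FALSE]
colnames(X)   # "temp" "seasonb" "seasonc"
is.numeric(X) # TRUE
```

The resulting X has one row per observation, so it satisfies the "same number of rows as y" requirement as long as the data frame rows line up with the series.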

Here is a toy example

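library(forecast)
# Wind is column 3 of airquality; all remaining columns form the regressor matrix
fit <- auto.arima(airquality$Wind, xreg = as.matrix(airquality[-3]))
fc <- forecast(fit, level = c(95), xreg = as.matrix(airquality[-3]))  # h is ignored when xreg is given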

Created on 2019-02-19 by the reprex package (v0.2.1)
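In the toy example the in-sample regressors are passed back to forecast(), which demonstrates the syntax; a genuine forecast instead needs the regressors' values for the future periods. The sketch below swaps in base R's stats::arima() with a fixed ARIMA(1,0,0) error structure (an arbitrary choice for illustration, not one selected by auto.arima()) so it runs without extra packages. It holds out the last 14 days of airquality as a hypothetical "future" and uses only Temp as the regressor, because Ozone and Solar.R contain missing values that would turn the corresponding forecasts into NA.

```r
# Sketch: regression with ARIMA errors in base R (stats::arima).
# The last 14 days are held out so the forecast uses *known* future regressor values.
data(airquality)

y    <- airquality$Wind
xreg <- as.matrix(airquality[, "Temp", drop = FALSE])  # Temp has no NAs

n_test <- 14                      # hypothetical holdout length
n      <- length(y)
train  <- seq_len(n - n_test)
test   <- (n - n_test + 1):n

# AR(1) errors plus a mean and a Temp coefficient; order chosen for illustration only
fit <- arima(y[train], order = c(1, 0, 0), xreg = xreg[train, , drop = FALSE])

# Future regressor values must have the same columns as those used in fitting
pred <- predict(fit, n.ahead = n_test, newxreg = xreg[test, , drop = FALSE])

print(coef(fit))
print(cbind(actual = y[test], forecast = round(as.numeric(pred$pred), 2)))
```

With auto.arima() the same idea applies: fit on the training rows with xreg, then pass the held-out regressor rows to forecast() through its xreg argument.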



Thanks Andres for helping me.
Actually, I am facing an issue:
xreg is rank deficient, but I should not get rid of the columns with zeros because they were obtained through one-hot encoding.
So, I was wondering how to resolve this.
Because of the one-hot encoding, most columns are zeros, and hence xreg is rank deficient.
What else can I do?
Apart from the one-hot encoding, there are categorical values influencing the outcome.
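The thread does not resolve this, but one common cause is worth illustrating: with full one-hot encoding, the k dummy columns of a factor always sum to 1, so together with an intercept they are linearly dependent (the "dummy variable trap"); a column that is all zeros lowers the rank in the same way. Whether this is exactly what triggers the poster's error is an assumption. The minimal base-R sketch below uses a hypothetical three-level factor `season` (not from the poster's data), checks column rank with qr(), and shows that treatment coding via model.matrix(), which keeps k - 1 dummies, restores full rank without discarding any information.

```r
# Hypothetical categorical regressor with 3 levels (illustration only)
set.seed(1)
season <- factor(sample(c("low", "mid", "high"), 60, replace = TRUE))

# Full one-hot encoding: one 0/1 column per level
onehot <- model.matrix(~ season - 1)   # 60 x 3
# Adding an intercept column makes the columns linearly dependent,
# because the three dummy columns always sum to 1
with_intercept <- cbind(1, onehot)
qr(with_intercept)$rank                # 3, although there are 4 columns

# Treatment coding keeps k - 1 = 2 dummies (first level is the baseline)
dummies <- model.matrix(~ season)[, -1, drop = FALSE]  # drop the intercept column
qr(cbind(1, dummies))$rank             # 3 = full column rank

# An all-zero column (e.g. a level that never occurs in the fitting window)
# carries no information and also makes the matrix rank deficient
X_zero <- cbind(dummies, never_seen = 0)
keep   <- colSums(X_zero != 0) > 0     # FALSE only for all-zero columns
qr(cbind(1, X_zero[, keep, drop = FALSE]))$rank  # full rank again
```

Dropping one dummy per factor (or an all-zero column) removes only redundant information: the baseline level is still represented, implicitly, by all remaining dummies being 0.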



This is a different issue than the one in your original question. Please open a new topic with a reprex and sample data relevant for this specific problem.



This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.