How to apply ARIMA/time series to a multi-column (multivariable) dataset

Hi Time Series Experts,

I have 6 columns: x1, x2, x3, x4, x5, y.
x1 is the date column.
y is the output.
x2, x3, x4 and x5 are different variables which influence y.
So, the first 5 columns have an impact on the y outcome.

I would have used ARIMA if only the date (x1) and y were in the dataset.
Eg: [http://rstudio-pubs-static.s3.amazonaws.com/311446_08b00d63cc794e158b1f4763eb70d43a.html]
But the other variables influence the y outcome, so I don't want to remove them.

Is there a function or way to apply time series modelling that considers all 6 columns in the data?
P.S.: I used one-hot encoding to transform the categorical variable.

Reprex
We can simply apply auto.arima to the AirPassengers dataset available in R, but what if more than one variable influences the output variable?

Let's consider the airquality dataset in R.
We want to predict Wind,
but Solar.R, Ozone and Temp all influence this output variable.
How can I use all these variables, along with the time columns Month and Day,
in order to predict Wind?
Although I found that someone applied this on Kaggle, I could not grasp how they did it.
https://www.kaggle.com/raenish/time-series-on-air-quality/code

Code for forecasting AirPassengers

library(forecast)
library(ggplot2)  # needed for geom_smooth() and labs()
# Plot time series data
plot(AirPassengers)
autoplot(AirPassengers) + geom_smooth(method = "lm") + labs(x = "Date", y = "Passenger numbers (1000's)", title = "Air Passengers from 1949 to 1960")
# Apply Auto arima
arimaAP <- auto.arima(AirPassengers)
# Forecast next 36 months
forecastAP <- forecast(arimaAP, level = c(95), h = 36)
autoplot(forecastAP)

As we can see, the AirPassengers data has only one variable that depends on time. Could you please advise how to deal with a dataset that has more than one variable, such as the airquality data, where several variables influence the output?

Thanks in advance,
Abi

Could you please turn this into a self-contained REPRoducible EXample (reprex)? A reprex makes it much easier for others to understand your issue and figure out how to help.

If you've never heard of a reprex before, you might want to start by reading this FAQ:

Thanks Andrescs, I updated the question with a fuller reprex and description, and shared the links I researched as well.

Please provide reproducible code; putting the word "reprex" in your text doesn't make your question a reproducible example.

According to the auto.arima() documentation, you can pass multiple exogenous variables as a matrix through the xreg parameter.

xreg: Optionally, a numerical vector or matrix of external regressors, which must have the same number of rows as y. (It should not be a data frame.)

Here is a toy example

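library(forecast)
# Column 3 of airquality is Wind, so [-3] keeps the other columns as regressors
fit <- auto.arima(airquality$Wind, xreg = as.matrix(airquality[-3]))
# Toy only: the in-sample regressors are reused as the "future" values;
# when xreg is supplied, the horizon is taken from its rows rather than h
fc <- forecast(fit, level = c(95), h = 36, xreg = as.matrix(airquality[-3]))
autoplot(fc)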

Created on 2019-02-19 by the reprex package (v0.2.1)
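
In practice the regressor values for the forecast period have to come from somewhere. The following is a minimal sketch of one way to set that up, not a definitive recipe: it holds out the last rows of airquality as a stand-in for the future and passes their regressors to forecast(). The choice of regressors (Ozone, Solar.R, Temp), the 10-row hold-out, and dropping incomplete rows are all assumptions made for illustration.

```r
library(forecast)

# Keep only complete rows so the regressors contain no NA values
# (a simplification: it leaves gaps in the daily sequence)
aq <- airquality[complete.cases(airquality), ]

# Regressors chosen for illustration only
xreg_cols <- c("Ozone", "Solar.R", "Temp")

# Hold out the last 10 rows as the "future" period (arbitrary size)
n_test <- 10
n      <- nrow(aq)
train  <- aq[seq_len(n - n_test), ]
test   <- aq[seq(n - n_test + 1, n), ]

# Fit on the training rows; xreg must be a numeric matrix
fit <- auto.arima(train$Wind, xreg = as.matrix(train[, xreg_cols]))

# One forecast is produced per row of the supplied xreg
fc <- forecast(fit, xreg = as.matrix(test[, xreg_cols]), level = 95)

# Root mean squared error on the held-out rows
rmse <- sqrt(mean((test$Wind - as.numeric(fc$mean))^2))
rmse

autoplot(fc)
```

For a genuine forecast, the matrix passed to forecast() would need to hold known or separately predicted future values of each regressor, with the same column names as the matrix used for fitting.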

Thanks Andres for helping me.
Actually, I am now facing this issue:
xreg is rank deficient, but I should not get rid of the columns of zeros because they were obtained from one-hot encoding.

So, I was wondering how to resolve this.
Because of the one-hot encoding, most columns are mostly zeros, and hence xreg is rank deficient.
What else can I do?
There are categorical variables influencing the outcome, apart from the one-hot encoded ones.
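
One common cause of a rank-deficient regressor matrix after one-hot encoding is that a full set of dummy columns sums to 1 in every row, which duplicates an intercept term. Whether that is what happens here cannot be confirmed without the data, so the following is only a sketch in base R, using a hypothetical `season` factor, of how dropping one level per categorical variable restores full rank.

```r
# Hypothetical categorical regressor, for illustration only
df <- data.frame(
  season = factor(c("winter", "spring", "summer", "autumn",
                    "winter", "spring", "summer", "autumn"))
)

# Full one-hot encoding: one column per level (k columns);
# the columns sum to 1 in every row
full <- model.matrix(~ season - 1, data = df)

# Dummy coding: build with an intercept, then drop the intercept column,
# leaving k - 1 columns with one level acting as the baseline
reduced <- model.matrix(~ season, data = df)[, -1, drop = FALSE]

# Compare rank against column count once an intercept column is added
rank_full    <- qr(cbind(1, full))$rank
rank_reduced <- qr(cbind(1, reduced))$rank

c(columns = ncol(full) + 1,    rank = rank_full)     # rank below column count
c(columns = ncol(reduced) + 1, rank = rank_reduced)  # rank equals column count
```

Under these assumptions, a matrix built like `reduced` would be the candidate to pass as xreg. Columns that are entirely zero are another possible cause, since they carry no information for the fit.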

This is a different issue than the one in your original question. Please open a new topic with a reprex and sample data relevant for this specific problem.
