How to apply ARIMA/time-series models to a multi-column (multivariable) dataset

AbhishekHP · February 19, 2019, 11:24am

Hi Time Series Experts,

I have 6 columns: x1, x2, x3, x4, x5 and y.
x1 is the date column.
y is the output.
x2, x3, x4 and x5 are different variables that influence y.
So, the first 5 columns all have an impact on the y outcome.

I would have used ARIMA if the dataset contained only the date (x1) and y.
For example: http://rstudio-pubs-static.s3.amazonaws.com/311446_08b00d63cc794e158b1f4763eb70d43a.html
But the other variables also influence y, so I don't want to remove them.

Is there a function or method for fitting a time-series model that takes all 6 columns in the data into account?
P.S.: I used one-hot encoding to transform the categorical variable.
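One standard way to model this situation is a regression with ARIMA errors, which is the model that the forecast package's auto.arima() fits when external regressors are passed through its xreg argument. As a sketch, write y_t for the output at date t (x1 only indexes the dates) and x_{2,t} to x_{5,t} for the other columns:

$$
y_t = \beta_0 + \beta_2 x_{2,t} + \beta_3 x_{3,t} + \beta_4 x_{4,t} + \beta_5 x_{5,t} + \eta_t,
\qquad \eta_t \sim \mathrm{ARIMA}(p, d, q)
$$

The regression terms carry the influence of x2 to x5 on y, and the ARIMA(p, d, q) process for the error term models the autocorrelation that remains. When d > 0, y and the regressors are differenced together before the coefficients are estimated, and the intercept drops out.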

Reprex
We can simply apply auto.arima() to the AirPassengers dataset that comes with R, but what if more than one variable influences the output variable?

Let's consider the airquality dataset in R:
We want to predict Wind,
but Solar.R, Ozone and Temp also influence this output variable.
How can I use all these variables, along with the time columns Month and Day,
in order to predict Wind?
I found someone who applied this on Kaggle, but I could not grasp how they did it:
https://www.kaggle.com/raenish/time-series-on-air-quality/code
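Here is a minimal sketch of the idea for airquality using only base R. stats::arima() also accepts an xreg matrix, and predict() takes the future regressor values through newxreg. To keep it small, the sketch makes three assumptions. Temp is the only regressor, because Ozone and Solar.R contain missing values. The order is fixed at ARIMA(1,0,0) by hand rather than searched as auto.arima() would. The last 14 days are held out, so their observed Temp values can stand in for known future regressors.

```r
# Regression of Wind on Temp with AR(1) errors, using base R's stats::arima()
data(airquality)

n_test   <- 14                                # days held out (assumption)
n_train  <- nrow(airquality) - n_test         # 153 - 14 = 139
test_idx <- (n_train + 1):nrow(airquality)

wind_train <- airquality$Wind[1:n_train]
# Regressors as one-column matrices named "Temp"
temp_train <- matrix(airquality$Temp[1:n_train], ncol = 1,
                     dimnames = list(NULL, "Temp"))
temp_test  <- matrix(airquality$Temp[test_idx], ncol = 1,
                     dimnames = list(NULL, "Temp"))

# order = c(1, 0, 0) is picked by hand for illustration, not selected by search
fit <- arima(wind_train, order = c(1, 0, 0), xreg = temp_train)
print(coef(fit))   # ar1, intercept and Temp

# predict() needs the regressor values for every future period
fc <- predict(fit, n.ahead = n_test, newxreg = temp_test)
wind_pred <- as.numeric(fc$pred)
print(round(cbind(actual = airquality$Wind[test_idx], predicted = wind_pred), 1))
```

In a real forecast the future regressor values are not observed, so they would have to come from somewhere else, for example a separate forecast of Temp.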

Code for forecasting AirPassengers

library(forecast)
library(ggplot2)  # geom_smooth() and labs() come from ggplot2
# Plot time series data
plot(AirPassengers)
autoplot(AirPassengers) + geom_smooth(method = "lm") + labs(x = "Date", y = "Passenger numbers (1000s)", title = "Air Passengers from 1949 to 1960")
# Apply Auto arima
arimaAP <- auto.arima(AirPassengers)
# Forecast next 36 months
forecastAP <- forecast(arimaAP, level = c(95), h = 36)
autoplot(forecastAP)

As we can see, the AirPassengers data has only one variable, indexed by time. Could you please guide me on how to deal with a dataset that has more than one variable, such as the airquality data, where several variables influence the output?

Thanks in advance,
Abi

andresrcs · February 19, 2019, 12:09pm

Could you please turn this into a self-contained REPRoducible EXample (reprex)? A reprex makes it much easier for others to understand your issue and figure out how to help.

If you've never heard of a reprex before, you might want to start by reading this FAQ:

FAQ: How to do a minimal reproducible example (reprex) for beginners

AbhishekHP · February 19, 2019, 12:46pm

Thanks, andresrcs. I have updated the question with a reprex, a fuller description and links to what I found while researching.

andresrcs · February 19, 2019, 12:54pm

Please provide reproducible code; putting the word "reprex" in your text doesn't make your question a reproducible example.

andresrcs · February 19, 2019, 9:14pm

According to the auto.arima() documentation, you can pass multiple exogenous variables in the form of a matrix through the xreg parameter:

xreg: Optionally, a numerical vector or matrix of external regressors, which must have the same number of rows as y. (It should not be a data frame.)
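The requirement that xreg be a numeric matrix rather than a data frame matters as soon as a categorical column is involved. Below is a minimal base-R sketch on a made-up data frame, where the columns sales, price and promo are hypothetical. It shows that as.matrix() turns a data frame containing a factor into a character matrix, whereas model.matrix() builds numeric indicator columns with one row per observation.

```r
# Hypothetical data: 8 periods, a numeric regressor and a categorical one
dat <- data.frame(
  sales = c(12, 15, 14, 18, 20, 19, 23, 25),
  price = c(5.0, 4.8, 4.9, 4.5, 4.2, 4.3, 4.0, 3.8),
  promo = factor(c("none", "tv", "none", "web", "tv", "none", "web", "tv"))
)

# as.matrix() on a data frame that contains a factor gives a character matrix,
# which is not a valid xreg
bad <- as.matrix(dat[, c("price", "promo")])
print(typeof(bad))   # "character"

# model.matrix() expands the factor into numeric indicators (promotv, promoweb;
# the baseline level "none" gets no column). [, -1] drops the intercept column
# because the ARIMA fit estimates its own intercept.
xreg <- model.matrix(~ price + promo, data = dat)[, -1]
print(xreg)

# xreg must have exactly one row per observation of y
stopifnot(is.numeric(xreg), nrow(xreg) == length(dat$sales))
```

Note that model.matrix() drops rows containing NA by default, which would break the row-count requirement. The Ozone and Solar.R columns of airquality do contain missing values.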

Here is a toy example

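library(forecast)
# Use every airquality column except Wind (column 3) as an external regressor
fit <- auto.arima(airquality$Wind, xreg = as.matrix(airquality[-3]))
# When xreg is supplied, forecast() ignores h: the horizon equals nrow(xreg).
# Reusing the training regressors as "future" values is only for this toy example.
fc <- forecast(fit, level = c(95), xreg = as.matrix(airquality[-3]))
autoplot(fc)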

Created on 2019-02-19 by the reprex package (v0.2.1)

AbhishekHP · February 20, 2019, 10:32am

Thanks, Andres, for helping me.
However, I am now facing this issue:
"xreg is rank deficient". I don't think I should get rid of the columns with zeros, because they were obtained through one-hot encoding.

So, I was wondering how to resolve this.
Because of the one-hot encoding, most columns are mostly zeros, and hence xreg is rank deficient.
What else can I do?
Categorical values do influence the outcome, so I need some way to include them besides one-hot encoding.
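The encoded data were not posted, so the cause here is only a guess. Two things commonly make a design matrix rank deficient after one-hot encoding. The first is a column that is all zeros, for example a category that never occurs in the data being fitted. The second is the dummy-variable trap: if every level gets its own column, the indicator columns sum to 1 in every row and are therefore collinear with the model's intercept. A minimal base-R sketch that checks the rank with qr() on made-up data:

```r
# Hypothetical categorical regressor with three levels over 6 periods
colour <- factor(c("red", "green", "blue", "red", "blue", "green"))

# Full one-hot encoding: one indicator column per level (blue, green, red)
onehot <- sapply(levels(colour), function(l) as.numeric(colour == l))

# Adding the intercept column that the model estimates reveals the trap:
# the three indicators sum to 1 in every row, duplicating the intercept
print(qr(cbind(1, onehot))$rank)     # 3 with 4 columns -> rank deficient

# Dropping one level (treatment contrasts) removes the redundancy;
# model.matrix() does this by default, and [, -1] removes its intercept column
reduced <- model.matrix(~ colour)[, -1, drop = FALSE]
print(qr(cbind(1, reduced))$rank)    # 3 with 3 columns -> full rank

# An all-zero column (a level that never occurs in the sample) also lowers the rank
with_zero <- cbind(reduced, never_seen = 0)
print(qr(cbind(1, with_zero))$rank)  # 3 with 4 columns -> rank deficient
```

Keeping the categorical variable as a factor and letting model.matrix() build the indicator columns usually avoids both problems more easily than encoding every level by hand.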

andresrcs · February 20, 2019, 5:40pm

This is a different issue from the one in your original question. Please open a new topic with a reprex and sample data relevant to this specific problem.

system · March 13, 2019, 5:47pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.