Hi Time Series Experts,
I have 6 columns: x1, x2, x3, x4, x5, y.
x1 is the date column.
y is the output.
x2, x3, x4 and x5 are different variables which influence y.
So the first 5 columns all have an impact on the y outcome.
I would have used ARIMA if only the date (x1) and y were in the dataset.
Eg: [http://rstudio-pubs-static.s3.amazonaws.com/311446_08b00d63cc794e158b1f4763eb70d43a.html]
But the other variables influence the y outcome, so I don't want to remove them.
Is there a function or way to fit a time series model that takes all 6 columns of the data into account?
P.S.: I used one-hot encoding to transform the categorical variable.
Reprex
We can simply apply auto.arima to the AirPassengers dataset that ships with R, but what if more than one variable influences the output variable?
Let's consider the airquality dataset in R:
We want to predict Wind.
But it has Solar.R, Ozone and Temp, which all influence this output variable, Wind.
How can I use all these variables along with the time columns, Month and Day,
in order to predict Wind?
I found that someone applied this on Kaggle, but I could not grasp how they did it:
https://www.kaggle.com/raenish/time-series-on-air-quality/code
Code for forecasting AirPassengers
library(forecast)
library(ggplot2)  # needed for geom_smooth() and labs()
# Plot time series data
plot(AirPassengers)
autoplot(AirPassengers) + geom_smooth(method="lm")+ labs(x ="Date", y = "Passenger numbers (1000's)", title="Air Passengers from 1949 to 1960")
# Apply Auto arima
arimaAP <- auto.arima(AirPassengers)
# Forecast next 36 months
forecastAP <- forecast(arimaAP, level = c(95), h = 36)
autoplot(forecastAP)
As we can see, the AirPassengers data has only 1 variable that depends on time. Could you please guide me on how to deal with a dataset such as airquality, where more than 1 variable influences the output?
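To make the question concrete: one direction that seems relevant is regression with ARIMA errors, since auto.arima() accepts an xreg argument for external regressors. The sketch below is only a minimal illustration under assumptions, not a confirmed solution. It assumes that rows with missing values can simply be dropped, that the remaining rows can be treated as an evenly spaced series, and that holding out the last 10 rows is a reasonable test split. The names aq, y_train, x_train and x_test are made up for this example.

```r
library(forecast)

# airquality ships with base R; keep only rows without missing values
# (assumption: dropping NA rows is acceptable for this illustration)
aq <- airquality[complete.cases(airquality), ]

# External regressors must be a numeric matrix, one row per observation
xreg <- as.matrix(aq[, c("Solar.R", "Ozone", "Temp")])

# Hold out the last 10 rows to imitate "future" observations
n <- nrow(aq)
train <- 1:(n - 10)
test <- (n - 9):n

y_train <- ts(aq$Wind[train])
x_train <- xreg[train, , drop = FALSE]
x_test <- xreg[test, , drop = FALSE]

# Regression on the predictors with ARIMA-modelled errors
fit <- auto.arima(y_train, xreg = x_train)

# The forecast horizon is taken from the number of rows in the new xreg
fc <- forecast(fit, xreg = x_test, level = 95)
autoplot(fc)
```

One thing to note about this approach is that forecasting Wind requires values of Solar.R, Ozone and Temp for the forecast period, so those would have to be known or forecast separately.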
Thanks in advance,
Abi