Which ARIMA model to use based on the ACF and PACF plots of my data, using the fpp3 package

mlsops · April 13, 2023, 5:48pm

Hello, I am trying to build an ARIMA forecast model using the fpp3 package in R. My data looks like it has a seasonal component, but it is hard to tell. Here are the ACF and PACF plots for the 3 groups (A, B, C). I want to forecast the number of clients in each group for the next year.

So far I have tried the default ARIMA() model and two others I thought would be accurate (arima007 and arima_s), but they weren't. This is my code:

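model <- data_stretch %>%
  model(
    # orders chosen automatically
    arima = ARIMA(volume),
    # fixed non-seasonal MA(7); seasonal orders still chosen automatically
    arima007 = ARIMA(volume ~ pdq(0, 0, 7)),
    # fixed non-seasonal MA(7) with no seasonal terms
    arima_s = ARIMA(volume ~ pdq(0, 0, 7) + PDQ(0, 0, 0))
  )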

Here is what the data looks like for category A when plotted:

If someone could please help me with my code and help me figure out the best pdq and PDQ values based on my ACF and PACF plots, that would be much appreciated. Thank you.

Rootsyl · April 13, 2023, 6:45pm

You could also use the auto.arima function from the forecast package if you are not sure. There is not enough information here to find good ARIMA models, as we don't know whether the data is stationary or has constant variance. You can do a HEGY test for seasonality. If the variance changes over time, you would need a GARCH model to model the variance as well...
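
For what it's worth, in fpp3 a call to `ARIMA(volume)` with no `pdq()` or `PDQ()` terms already runs an automatic order search, so a separate auto.arima call may not be needed. Below is a minimal sketch of that search. It assumes a simulated monthly series: the `toy` tsibble and its values are made up for illustration and only stand in for the real data.

```r
library(fpp3)

# Hypothetical stand-in for the real data: 48 simulated monthly values for one group
set.seed(1)
toy <- tsibble(
  month = yearmonth("2019 Jan") + 0:47,
  provider_group = "A",
  volume = 100 + cumsum(rnorm(48, mean = 2, sd = 5)),
  key = provider_group,
  index = month
)

fit <- toy %>%
  model(
    # default stepwise search over the orders
    auto = ARIMA(volume),
    # slower, more exhaustive search
    search = ARIMA(volume, stepwise = FALSE, approximation = FALSE)
  )

# Compare the two selected models by AICc (lower is better)
glance(fit) %>% arrange(AICc)
```

Turning off `stepwise` and `approximation` makes the search slower but more thorough, which is usually affordable for a handful of short series.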

mlsops · April 13, 2023, 7:36pm

I ran these tests as well to provide more information; I hope this helps too:

> data %>%
+   features(volume, unitroot_kpss)
# A tibble: 3 × 3
  provider_group kpss_stat kpss_pvalue
  <chr>              <dbl>       <dbl>
1 A                  0.774      0.01  
2 B                  0.549      0.0307
3 C                  0.772      0.01  
> data %>%
+   features(volume, unitroot_ndiffs)
# A tibble: 3 × 2
  provider_group ndiffs
  <chr>           <int>
1 A                   1
2 B                   1
3 C                   1

Rootsyl · April 13, 2023, 7:51pm

You need to take one difference for the series to be stationary. But you need a HEGY test to see whether seasonality exists. I recommend studying time series a little.
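
One way to carry the "one difference" result into an fpp3 model is to fix d = 1 inside `pdq()` and let `ARIMA()` choose p and q. The sketch below assumes a simulated monthly series (the `toy` tsibble is made up for illustration) and, as a further assumption, switches the seasonal part off with `PDQ(0, 0, 0)`. Whether that is right for the real data depends on the seasonality question.

```r
library(fpp3)

# Hypothetical stand-in for the real data: 48 simulated monthly values for one group
set.seed(42)
toy <- tsibble(
  month = yearmonth("2019 Jan") + 0:47,
  provider_group = "A",
  volume = 200 + cumsum(rnorm(48, mean = 1, sd = 4)),
  key = provider_group,
  index = month
)

# Fix d = 1; p and q are still selected automatically.
# PDQ(0, 0, 0) excludes seasonal terms (an assumption for this sketch).
fit_d1 <- toy %>%
  model(arima_d1 = ARIMA(volume ~ pdq(d = 1) + PDQ(0, 0, 0)))

report(fit_d1)

# Forecast the next 12 months
fc <- fit_d1 %>% forecast(h = "1 year")
fc
```

The differencing is done inside the model, so `volume` does not need to be differenced by hand before fitting.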

mlsops · April 13, 2023, 9:53pm

I ran nsdiffs instead of ndiffs and got the values below. Does this mean that my data is seasonal and I need to take 2 seasonal differences for the model? I tried to run the HEGY test in R using the forecast package, but it did not work for me:

> data %>%
+   features(volume, unitroot_nsdiffs)
# A tibble: 3 × 2
  provider_group nsdiffs
  <chr>            <int>
1 A                    2
2 B                    2
3 C                    2

I have also used the following code on provider group B to see if the seasonal component can be determined:

data %>%
  filter(provider_group == "B") %>%
  gg_tsdisplay(difference(volume, 12) %>% difference(), plot_type = "partial", lag = 24)

and here is what I got:

startz · April 13, 2023, 10:44pm

You have two years of data. Finding out if there is a seasonal component is essentially hopeless, since you only observe a season twice.

mlsops · April 13, 2023, 11:30pm

Agreed, but looking at the data, it does look like there might be a seasonal component. Am I right?
