NA forecast: ARIMA on Box-Cox transformed data

I get NA forecasts when I apply ARIMA() to Box-Cox transformed data (lambda = -0.9999242).
However, when I use a log transformation, the forecasts are no longer NA. What should I do in this case?

library(fpp3)
#                                       DATA
v1 <- c(32, 30, 39, 34, 31, 31, 30.5, 34, 28, 34, 35, 35, 30.5, 31, 
  27, 33.5, 35, 28, 34, 35, 34, 39, 36, 33.5, 36, 33, 31, 39, 34.5, 
  34, 32.5, 30, 27.5, 27, 39.5, 38, 32.5, 34, 43, 34, 32, 43, 36, 
  41, 35.5, 39, 44, 42.5, 34, 36, 49, 35, 44, 36, 42, 40.5, 38.5, 
  33, 36, 33, 36.5, 43, 32, 35, 38.5, 42, 31, 43, 32.5, 34, 35.5, 
  35, 33, 29, 35, 42, 37, 39, 45, 36, 52, 38, 36, 41.5, 43, 31.5, 
  37, 47, 38, 50, 51, 41, 32, 40.5, 37, 39.5, 36, 36.5, 38.5, 38, 
  47.5, 39, 37, 34, 32, 36, 35, 41, 41, 39.5, 44, 44, 65.5, 38, 
  45, 34, 35, 32, 62, 54.5)

# tsibble: monthly index from 2008 Jan to 2017 Dec
toy_data <- tibble(
  month = yearmonth(paste(rep(2008:2017, each = 12), rep(month.abb, times = 10))),
  toy_variable = v1
) %>%
  as_tsibble(index = month)

# lambda
lambda_toy <- toy_data %>%
  features(toy_variable, features = guerrero) %>%
  pull(lambda_guerrero)
#########################################
# Auto ARIMA: lambda = -0.9999242
#####################################
toy_data %>%
  model(ARIMA(box_cox(toy_variable, lambda_toy))) %>% 
  forecast() 
#> # A fable: 24 x 4 [1M]
#> # Key:     .model [1]
#>    .model                                      month        toy_variable .mean
#>    <chr>                                       <mth>              <dist> <dbl>
#>  1 ARIMA(box_cox(toy_variable, lambda_toy)) 2018 Jan t(N(0.98, 1.4e-05))    NA
#>  2 ARIMA(box_cox(toy_variable, lambda_toy)) 2018 Feb t(N(0.98, 1.4e-05))    NA
#>  3 ARIMA(box_cox(toy_variable, lambda_toy)) 2018 Mar t(N(0.98, 1.4e-05))    NA
#>  4 ARIMA(box_cox(toy_variable, lambda_toy)) 2018 Apr t(N(0.98, 1.4e-05))    NA
#>  5 ARIMA(box_cox(toy_variable, lambda_toy)) 2018 May t(N(0.98, 1.5e-05))    NA
#>  6 ARIMA(box_cox(toy_variable, lambda_toy)) 2018 Jun t(N(0.98, 1.5e-05))    NA
#>  7 ARIMA(box_cox(toy_variable, lambda_toy)) 2018 Jul t(N(0.98, 1.5e-05))    NA
#>  8 ARIMA(box_cox(toy_variable, lambda_toy)) 2018 Aug t(N(0.98, 1.5e-05))    NA
#>  9 ARIMA(box_cox(toy_variable, lambda_toy)) 2018 Sep t(N(0.98, 1.6e-05))    NA
#> 10 ARIMA(box_cox(toy_variable, lambda_toy)) 2018 Oct t(N(0.98, 1.6e-05))    NA
#> # ... with 14 more rows

######################################
# Auto ARIMA log-transformed data
#####################################
toy_data %>%
  model(ARIMA(log(toy_variable))) %>% 
  forecast()
#> # A fable: 24 x 4 [1M]
#> # Key:     .model [1]
#>    .model                      month     toy_variable .mean
#>    <chr>                       <mth>           <dist> <dbl>
#>  1 ARIMA(log(toy_variable)) 2018 Jan  t(N(3.8, 0.02))  43.3
#>  2 ARIMA(log(toy_variable)) 2018 Feb t(N(3.8, 0.021))  43.4
#>  3 ARIMA(log(toy_variable)) 2018 Mar t(N(3.8, 0.021))  43.4
#>  4 ARIMA(log(toy_variable)) 2018 Apr t(N(3.8, 0.022))  43.4
#>  5 ARIMA(log(toy_variable)) 2018 May t(N(3.8, 0.022))  43.4
#>  6 ARIMA(log(toy_variable)) 2018 Jun t(N(3.8, 0.022))  43.4
#>  7 ARIMA(log(toy_variable)) 2018 Jul t(N(3.8, 0.023))  43.4
#>  8 ARIMA(log(toy_variable)) 2018 Aug t(N(3.8, 0.023))  43.4
#>  9 ARIMA(log(toy_variable)) 2018 Sep t(N(3.8, 0.023))  43.4
#> 10 ARIMA(log(toy_variable)) 2018 Oct t(N(3.8, 0.024))  43.4
#> # ... with 14 more rows

Created on 2020-08-03 by the reprex package (v0.3.0)

A Box-Cox transformation with lambda = -0.9999242 is an extremely strong transformation, and it most likely results from a problem with the automatic selection of the parameter. From ?guerrero you can see that it chooses a transformation parameter lambda between -1 and 2, so your value sits essentially at the lower bound of that range.

This very extreme transformation skews the forecast distribution so much that the numDeriv::hessian() of your transformation fails (giving NA). I've now modified the parameters to give a better result, but the point forecasts you obtain will not be good for this data: https://github.com/mitchelloharawild/distributional/commit/98f8fb78a93d89df9f5249a1b8160fc3060e3c8b
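
To illustrate why the back-transformation is so fragile here, the sketch below inverts a Box-Cox transformation by hand, assuming the standard form w = (y^lambda - 1) / lambda. The helper inv_box_cox_manual() is a hypothetical name written for this example and is not the code fable runs internally; the centre 0.98 and variance 1.4e-05 are the rounded values printed in the fable above, so the numbers are only indicative.

```r
# Hypothetical helper: inverse of the standard Box-Cox transform,
# y = (lambda * w + 1)^(1 / lambda) for lambda != 0, exp(w) otherwise.
inv_box_cox_manual <- function(w, lambda) {
  if (abs(lambda) < 1e-8) exp(w) else (lambda * w + 1)^(1 / lambda)
}

lambda <- -0.9999242      # lambda reported in the question
centre <- 0.98            # rounded forecast mean on the transformed scale
sdev   <- sqrt(1.4e-05)   # rounded forecast standard deviation

# Equal steps on the transformed scale map to very unequal steps
# on the original scale.
w <- centre + c(-2, 0, 2) * sdev
back <- inv_box_cox_manual(w, lambda)
print(back)
print(diff(back))

# Beyond w = -1 / lambda (just above 1) the base becomes negative
# and the back-transform is undefined (NaN).
print(suppressWarnings(inv_box_cox_manual(1.01, lambda)))
```

The back-transformed points are far from evenly spaced, and the transformed values lie very close to the point where the inverse stops being defined, which is the kind of behaviour that can upset a numerical derivative.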

I suggest you look at your data and decide whether or not a transformation is required, and see what using lambda close to -1 does to your data.
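
As a rough way to see what a lambda close to -1 does, the sketch below applies the standard Box-Cox formula by hand to a few values spanning the range of toy_variable. The helper box_cox_manual() is a hypothetical name for this illustration (fabletools provides its own box_cox()), and the four values are simply picked from the data in the question.

```r
# Hypothetical helper: standard Box-Cox transform for positive y,
# (y^lambda - 1) / lambda for lambda != 0, log(y) otherwise.
box_cox_manual <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

lambda <- -0.9999242        # lambda reported in the question
y <- c(27, 35, 43, 65.5)    # values spanning the range of toy_variable

transformed <- box_cox_manual(y, lambda)
print(transformed)

# Width of the range after each transformation, for comparison.
print(diff(range(transformed)))   # Box-Cox with lambda near -1
print(diff(range(log(y))))        # log transformation
```

With lambda near -1 the transform behaves roughly like 1 - 1/y, so the whole series is squeezed into a very narrow band just below 1, whereas the log transformation keeps a much wider spread.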

Thanks @mitchelloharawild! Beginner's question: if lambda happens to be near -1 or 2, does it mean that a transformation is not required?

When lambda is close to 1, a transformation is probably not required. A Box-Cox transformation is useful for regularising variance over time, where the variance changes monotonically with the level of the time series.

When lambda is near -1 or 2, this suggests that your data is not well suited to a Box-Cox transformation, as the optimisation algorithm is failing to find a well-behaved transformation parameter. Perhaps the variance of your data does not change monotonically with the level of the series.
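
One rough way to check whether the variance changes with the level is to compare the mean and standard deviation within each year. This is only a sketch in base R using the data from the question; grouping by calendar year and summarising with a Spearman correlation are choices made for this illustration, not a reimplementation of the guerrero feature.

```r
# Data from the question: 120 monthly values, 2008 Jan to 2017 Dec.
v1 <- c(32, 30, 39, 34, 31, 31, 30.5, 34, 28, 34, 35, 35, 30.5, 31,
  27, 33.5, 35, 28, 34, 35, 34, 39, 36, 33.5, 36, 33, 31, 39, 34.5,
  34, 32.5, 30, 27.5, 27, 39.5, 38, 32.5, 34, 43, 34, 32, 43, 36,
  41, 35.5, 39, 44, 42.5, 34, 36, 49, 35, 44, 36, 42, 40.5, 38.5,
  33, 36, 33, 36.5, 43, 32, 35, 38.5, 42, 31, 43, 32.5, 34, 35.5,
  35, 33, 29, 35, 42, 37, 39, 45, 36, 52, 38, 36, 41.5, 43, 31.5,
  37, 47, 38, 50, 51, 41, 32, 40.5, 37, 39.5, 36, 36.5, 38.5, 38,
  47.5, 39, 37, 34, 32, 36, 35, 41, 41, 39.5, 44, 44, 65.5, 38,
  45, 34, 35, 32, 62, 54.5)
year <- rep(2008:2017, each = 12)

# Level (mean) and spread (standard deviation) within each year.
yearly_mean <- tapply(v1, year, mean)
yearly_sd   <- tapply(v1, year, sd)
print(data.frame(
  year = names(yearly_mean),
  mean = as.numeric(yearly_mean),
  sd   = as.numeric(yearly_sd)
))

# Rank correlation between level and spread: a value near 1 or -1
# would suggest a monotonic relationship, a value near 0 would not.
print(cor(yearly_mean, yearly_sd, method = "spearman"))
```

If the spread does not move consistently with the level, a Box-Cox transformation is unlikely to help, and fitting the model to the untransformed series is worth considering.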
