Time Series Analysis using an ARIMA Model to Visualise the Mean Absolute Percentage Error (MAPE)

Overview

I am conducting a time series analysis using an ARIMA model, and I would like to visualise the Mean Absolute Percentage Error (MAPE) in a plot (see below)

I am currently following this tutorial:

Issues

  1. I have successfully run the R-code; however, the value produced by the object MAPE (see R-code below) is not a number, showing an Arima Holdout MAPE value of NaN %, which is definitely incorrect:

                                   MAPE
                                 1  NaN
    
  2. The numerical range of the y-axis on the ARIMA MAPE plot (see below) is spurious, ranging from zero to 10,0000 i.e., 0e+00 to 1e+5 (which is obviously incorrect), and, the date range on the x-axis is wrong because the date range for my data is 2015-2017. As a consequence, I have inputted the date range of 2015-2017 into both ts() and window() functions. The existence of this spurious data range is also responsible for the strange spike exhibited by the fitted data in the ARIMA MAPE plot below.

  3. Moreover, the column named 'Date' produced by the object d4 (see R-code below) exhibits the year 1970 (see below), which does not make sense considering this is not the date range I inputted into my models:-

              #These values are produced by the object d4
                     Fitted    Actual        Date
               1  3.592659e+01     36     1970-01-02
               2  2.798055e+01     28     1970-01-03
               3  3.897144e+01     39     1970-01-04
               4  4.596831e+01     46     1970-01-05
               5  5.008207e+00      5     1970-01-06
               6  1.009074e-03      0     1970-01-07
               7  1.007765e-03      0     1970-01-08
               8  2.194326e+01     22     1970-01-09
               9  9.985120e+00     10     1970-01-10
              10  1.497379e+01     15     1970-01-11
              11  7.992608e+00      8     1970-01-12
              12  3.292665e+01     33     1970-01-13
              13  3.320180e+01     33     1970-01-14
              14  2.568558e+01     29     1970-01-15
    

The dates on the x-axis should follow the same format as the column called 'Year (below)', which is produced when you run the BSTS_new_df object.

                 Year     Month        Frequency
         1  2015-01-01   January             36
         2  2015-02-01  February             28
         3  2015-03-01     March             39
         4  2015-04-01     April             46
         5  2015-05-01       May              5

If anyone can please help me solve these issues, I would like to express my deepest appreciation.

Many thanks in advance.

R-code

        #Set seed
         seed(45L)

       ##Open packages for the time series analysis

       library(lubridate)
       library(bsts)
       library(dplyr)
       library(ggplot2)
       library(ggfortify)

        ##Change the Year column into YY/MM/DD format for the first of every month per year
         BSTS_df$Year <- lubridate::ymd(paste0(BSTS_df$Year, BSTS_df$Month,"-01"))

        ##Order the Year column in YY/MM/DD format into the correct sequence: Jan-Dec
         allDates <- seq.Date(
                           min(BSTS_df$Year),
                            max(BSTS_df$Year),
                            "month")

        ##Produce and arrange the new data frame ordered by the first of evey month in YY/MM/DD format
        BSTS_new_df <- BSTS_df %>%
                  right_join(data.frame(Year = allDates), by = c("Year")) %>%
                  dplyr::arrange(Year) %>%
                  tidyr::fill(Frequency, .direction = "down")

        ##Create a time series object
       myts_Arima <- ts(BSTS_new_df$Frequency, start=c(2015, 1), end=c(2017, 12), frequency=12)

       ##Upload the data into the windows() function
       Arima_Window <- window(myts_Arima, start=c(2015, 01), end=c(2017, 12))

      ### Fit the ARIMA model
        arima <- arima(log10(x+.001), 
                     order=c(0, 1, 1), 
                     seasonal=list(order=c(0,1,1), period=12))

        ### Actual versus predicted
          d4 <- data.frame(c(10^as.numeric(fitted(arima)), # fitted and predicted
                      10^as.numeric(predict(arima, n.ahead = 12)$pred)),
                      as.numeric(BSTS_df$Frequency), #actual values
                      as.Date(time(BSTS_df$Year)))

               ##Name the columns
               names(d4) <- c("Fitted", "Actual", "Date")

               ### MAPE (mean absolute percentage error)
                MAPE <- dplyr::filter(d4, lubridate::year(Date)>2017) %>% 
                             dplyr::summarise(MAPE=mean(abs(Actual-Fitted)/Actual))

               ##Open a plotting window
               dev.new()

             ### Plot actual versus predicted
             ggplot(data=d4, aes(x=Date)) +
             geom_line(aes(y=Actual, colour = "Actual"), size=1.2) +
             geom_line(aes(y=Fitted, colour = "Fitted"), size=1.2, linetype=2) +
             theme_bw() + theme(legend.title = element_blank()) + 
             ylab("") + xlab("") 
             geom_vline(xintercept=as.numeric(as.Date("2015-12-01")), linetype=2) +
             ggtitle(paste0("ARIMA -- Holdout MAPE = ", round(100*MAPE,2), "%")) + 
             theme(axis.text.x=element_text(angle = -90, hjust = 0))

ARIMA MAPE Plot

Data frame - BSTS_df

              structure(list(Year = c(2015, 2015, 2015, 2015, 2015, 2015, 2015, 
             2015, 2015, 2015, 2015, 2015, 2016, 2016, 2016, 2016, 2016, 2016, 
             2016, 2016, 2016, 2016, 2016, 2016, 2017, 2017, 2017, 2017, 2017, 
             2017, 2017, 2017, 2017, 2017, 2017, 2017), Month = structure(c(5L, 
             4L, 8L, 1L, 9L, 7L, 6L, 2L, 12L, 11L, 10L, 3L, 5L, 4L, 8L, 1L, 
            9L, 7L, 6L, 2L, 12L, 11L, 10L, 3L, 5L, 4L, 8L, 1L, 9L, 7L, 6L, 
            2L, 12L, 11L, 10L, 3L), .Label = c("April", "August", "December", 
            "February", "January", "July", "June", "March", "May", "November", 
             "October", "September"), class = "factor"), Frequency = c(36, 
               28, 39, 46, 5, 0, 0, 22, 10, 15, 8, 33, 33, 29, 31, 23, 8, 9, 
              7, 40, 41, 41, 30, 30, 44, 37, 41, 42, 20, 0, 7, 27, 35, 27, 
               43, 38)), row.names = c(NA, -36L), class = "data.frame")

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.