Convert to monthly time series and draw time series in ggplot2

Hello,

I cleaned my data set and want to do monthly forecasting.
As shown below, I have the year and month and the corresponding total volume for each year-month.
I converted my data to a time series using ts, but I want to use ggplot2 to show the total.a of each month and year on a time series plot. I used autoplot before, but autoplot shows the axis labels as integers, which is not good enough. I appreciate your help.

# Libraries 
library(tidyverse)
library(ggplot2)
library(forecast)
# Data 
year.month.r <- c('2016-10', '2016-11','2016-12','2017-01','2017-02',
                  '2017-03','2017-04','2017-05','2017-06','2017-07',
                  '2017-08','2017-09','2017-10','2017-11','2017-12',
                  '2018-01','2018-02')

total.a <- c(625,651,653,709, 741,
              793,639,741,786,652,
             812,632,797,724,643,
             797,755)                  

df.r <- data.frame(year.month.r, total.a)

df.r

# Convert df to time series object starting with 2016-10 and ending in 2018-02 #### 
# I am starting at '2016-10' and ending at '2018-02' and the data is monthly, so frequency = 12.

df.ts.r <- ts(df, start = as.Date('2016-10'), end = as.Date('2018-02'), frequency = 12)  # Frequency is 12 so data is monthly
df.ts.r


# Use ggplot to draw time series with monthly labels ##
sc <- scale_x_date(
  limits = as.Date(c('2016-10','2018-02')),
  date_labels = '%b %y',
  date_minor_breaks = '1 month')

I get this error:

Error in charToDate(x) :
character string is not in a standard unambiguous format

The character string '2016-10' cannot be converted to Date format in a unique way. Even if you force the format to "%Y-%m" with as.Date('2016-10', format = "%Y-%m"), the result is NA, because the string is still ambiguous (which day of the month does it refer to?).

I would suggest the following solution

year.month.r <- seq.Date(as.Date("2016-10-01"), as.Date("2018-02-01"), by = "month")
df.r <- data.frame(year.month.r, total.a)

With this, your monthly (by = "month") dates are defined and you can plot the series with ggplot:

sc <- scale_x_date(
  limits = range(df.r$year.month.r),
  date_labels = '%b %y',
  date_minor_breaks = '1 month')

ggplot(df.r, aes(year.month.r, total.a)) +
  geom_line() + sc
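
If you would rather keep the year-month strings that are already in your data instead of rebuilding the sequence, one possible approach is to append a day before calling as.Date. This is only a minimal sketch in base R; the name month.start is made up for illustration, and it assumes every string has the form 'YYYY-MM'.

```r
# Year-month strings in the same form as in the question
year.month.r <- c("2016-10", "2016-11", "2016-12")

# Appending "-01" gives a complete, unambiguous date (first day of each month)
month.start <- as.Date(paste0(year.month.r, "-01"))

class(month.start)            # a Date vector, usable with scale_x_date()
format(month.start, "%b %y")  # same label format as in the scale above
```

The resulting column can replace year.month.r in the data frame passed to ggplot.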

A second problem concerns converting df.r to a ts time series object. When creating an object of this class, start and end must be given as numeric vectors of the form c(Year, Month) (see Forecasting: Principles and Practice):

df.ts.r <- ts(df.r$total.a, start = c(2016, 10), end = c(2018, 2), frequency = 12)  # start is c(Year, Month); frequency = 12 means monthly data
df.ts.r

     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2016                                     625 651 653
2017 709 741 793 639 741 786 652 812 632 797 724 643
2018 797 755

Now autoplot will work and label the x axis in years:

autoplot(df.ts.r)
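
To check that a ts object lines up with the months you intended, start() and end() report the first and last periods as c(year, month). Below is a minimal sketch using only base R; the names x and dates are illustrative, and the series is assumed to begin in October 2016, that is c(2016, 10).

```r
total.a <- c(625, 651, 653, 709, 741, 793, 639, 741, 786,
             652, 812, 632, 797, 724, 643, 797, 755)

# start is c(Year, Month): October 2016 is c(2016, 10)
x <- ts(total.a, start = c(2016, 10), frequency = 12)

start(x)   # first period as c(year, month)
end(x)     # last period as c(year, month)
length(x)  # number of monthly observations

# Matching Date values, e.g. for a data frame plotted with scale_x_date()
dates <- seq(as.Date("2016-10-01"), by = "month", length.out = length(x))
range(dates)
```

If end(x) is not the month you expect, the start argument or the number of observations is probably off.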

Hope it helps.
Have a wonderful day

An alternative to using ts objects is to use a tsibble like this.

library(tidyverse)
library(tsibble)
#> 
#> Attaching package: 'tsibble'
#> The following object is masked from 'package:dplyr':
#> 
#>     id
library(feasts)
#> Loading required package: fabletools

df.r <- tsibble(
  month = yearmonth(seq.Date(as.Date("2016-10-01"), as.Date("2018-02-01"), by = "month")),
  total.a = c(625, 651, 653, 709, 741, 793, 639, 741, 786, 652, 812, 632, 797, 724, 643, 797, 755),
  index = month
)
df.r
#> # A tsibble: 17 x 2 [1M]
#>       month total.a
#>       <mth>   <dbl>
#>  1 2016 Oct     625
#>  2 2016 Nov     651
#>  3 2016 Dec     653
#>  4 2017 Jan     709
#>  5 2017 Feb     741
#>  6 2017 Mar     793
#>  7 2017 Apr     639
#>  8 2017 May     741
#>  9 2017 Jun     786
#> 10 2017 Jul     652
#> 11 2017 Aug     812
#> 12 2017 Sep     632
#> 13 2017 Oct     797
#> 14 2017 Nov     724
#> 15 2017 Dec     643
#> 16 2018 Jan     797
#> 17 2018 Feb     755
df.r %>% autoplot(total.a) +
  scale_x_date(
    date_labels = "%b %y",
    date_minor_breaks = "1 month"
  )

Created on 2020-04-26 by the reprex package (v0.3.0)

@pep1024
@robjhyndman

I appreciate you guys for trying to help.

If we can back up a little bit, I would appreciate it.
My data looks something like this:


# Original Data Set 
status.r <- rep('Completed',6)
audit_date.r <- c(rep('2016-10-06',4), rep('2016-11-10',2))

df.2 <- data.frame(status.r, audit_date.r)

df.2

# Convert to monthly data #### 

df.3 <- df.2 %>%
  mutate(year.month.r = substr(audit_date.r, 0,7)) %>%
  group_by(year.month.r) %>%
  summarize(#total.units = sum(units),
    total.audits.r = sum(status.r == 'Completed'))

df.3

Then, after that I tried to convert to time series using:


df.ts.r <- ts(df.3, start = as.Date('2016-10'), 
              end = as.Date('2018-02'),
              frequency = 12)  # Frequency is 12 so data is monthly
df.ts.r

I understand how you used seq.Date() but don't understand how to apply it to my data set.
Can I convert df.2 to monthly data and count the 'Completed' for each month? If yes, how?

Thank you!

library(tidyverse)
library(tsibble)

df.2 <- tibble(
   status.r = rep('Completed',6),
   audit_date.r = as.Date(c(rep('2016-10-06',4), rep('2016-11-10',2)))
  )
df.2
#> # A tibble: 6 x 2
#>   status.r  audit_date.r
#>   <chr>     <date>      
#> 1 Completed 2016-10-06  
#> 2 Completed 2016-10-06  
#> 3 Completed 2016-10-06  
#> 4 Completed 2016-10-06  
#> 5 Completed 2016-11-10  
#> 6 Completed 2016-11-10

# Convert to monthly data 
df.3 <- df.2 %>%
  mutate(year.month.r = yearmonth(audit_date.r)) %>%
  group_by(year.month.r) %>%
  summarize(
    total.audits.r = sum(status.r == 'Completed')
  )
df.3
#> # A tibble: 2 x 2
#>   year.month.r total.audits.r
#>          <mth>          <int>
#> 1     2016 Oct              4
#> 2     2016 Nov              2

df.ts.r <- df.3 %>%
  as_tsibble(index=year.month.r)
df.ts.r
#> # A tsibble: 2 x 2 [1M]
#>   year.month.r total.audits.r
#>          <mth>          <int>
#> 1     2016 Oct              4
#> 2     2016 Nov              2

Created on 2020-04-28 by the reprex package (v0.3.0)
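
If a plain ts object is still needed for the forecast package, the same monthly count can be sketched in base R. This is a rough sketch rather than a tested solution for the full data set: the extra audit in January 2017 is invented to show how a month with no audits is handled, and the names month.start, all.months, counts and audits.ts are hypothetical.

```r
# Hypothetical audit log in the same shape as df.2 (the 2017-01 row is made up)
audit_date.r <- as.Date(c(rep("2016-10-06", 4), rep("2016-11-10", 2), "2017-01-15"))
status.r <- rep("Completed", length(audit_date.r))

# First day of the month in which each audit happened
month.start <- as.Date(format(audit_date.r, "%Y-%m-01"))

# Every month from the first to the last, so gaps become explicit
all.months <- seq(min(month.start), max(month.start), by = "month")

# Count completed audits per month; the factor levels keep empty months as 0
counts <- table(factor(format(month.start[status.r == "Completed"], "%Y-%m"),
                       levels = format(all.months, "%Y-%m")))

# Build the ts with start = c(Year, Month) taken from the first month
first <- c(as.integer(format(all.months[1], "%Y")),
           as.integer(format(all.months[1], "%m")))
audits.ts <- ts(as.vector(counts), start = first, frequency = 12)
audits.ts
```

Filling the missing months matters here because ts assumes evenly spaced observations and has no notion of a skipped month.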


Thank you robjhyndman. This works great!
