Dates before 1970 in Time Series ggplot

ggplot2

#1

Hello! I have the following dates / values with the Period stored in a vector s (which will go into ggplot):

Period Value
Jan-63 42
Feb-63 35
Mar-63 44
Apr-63 52
May-05 58
Jun-11 48

The -63 refers to 1963, and -05 and -11 are 2005 and 2011 respectively. When I use the following function:

as.Date(format(as.Date(s,format="%m-%y"), "19%y%m%d"), "%Y%m%d")

I get 2063 for all the dates that should be 1963. From what I gather, R starts in 1970. I would like to modify this script to take into account dates I have in the 1900s and 2000s.

Would you happen to have any suggestions? Thank you!

Greg


#2

Hi,

Thank you for that question, actually! Thanks to it I've learned that the LC_ALL locale setting matters for parsing month names.

It seems that you pasted the wrong data or the wrong line of code, because you specified a format of %m but your data needs %b.

If I try to replicate your line of code:

per <- c("Jan-63", "Feb-63", "Mar-63", "Apr-63", "May-05", "Jun-11")

as.Date(per[1], format = "%b-%y") # returns NA for me (no day in the string)
as.Date(zoo::as.yearmon(per[1], format = "%b-%y")) # returns 2063-01-01
as.Date(paste0("01-", per[1]),  format = "%d-%b-%y") # returns 2063-01-01
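The 2063 comes from how base R reads `%y` on input, not from a 1970 limit: per `?strptime`, two-digit years 00 to 68 are prefixed with 20 and 69 to 99 with 19. A small check of that pivot, using numeric months so the result does not depend on the locale (the variable name `pivot` is just for illustration):

```r
# Two-digit years on either side of base R's fixed pivot (68 / 69)
pivot <- as.Date(c("01-01-68", "01-01-69"), format = "%d-%m-%y")
pivot
#> [1] "2068-01-01" "1969-01-01"
```

Since 63 falls on the "20" side of that pivot, every -63 entry ends up in 2063.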

Of the two lines below, the first returns every date in the 1900s and the second every date in the 2000s:

as.Date(format(as.Date(paste0("01-", per), format = "%d-%b-%y"), "19%y-%m-%d")) # 1: all in 19xx
as.Date(paste0("01-", per), format = "%d-%b-%y") # 2: all in 20xx

So you can't really apply the line of code you provided as an example. I'm sorry to say that I haven't figured out a better solution than reconstructing the time period yourself.
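One way that reconstruction might look in base R is sketched below. The helper `parse_period()` and its `cutoff` argument are hypothetical names made up for this example, and the cutoff of 62 is an assumption that fits this data set only (years above 62 are taken as 19xx, the rest as 20xx). Matching against `month.abb` avoids the locale issue mentioned above, provided the month abbreviations in the data are English.

```r
per <- c("Jan-63", "Feb-63", "Mar-63", "Apr-63", "May-05", "Jun-11")

# Hypothetical helper: split "Mon-yy", choose the century by hand, rebuild a Date
parse_period <- function(x, cutoff = 62L) {
  parts <- strsplit(x, "-", fixed = TRUE)
  # Month number from the English abbreviation (independent of the locale)
  mon <- match(vapply(parts, `[`, character(1), 1), month.abb)
  yy <- as.integer(vapply(parts, `[`, character(1), 2))
  # Assumption: two-digit years above the cutoff are 19xx, the others 20xx
  year <- ifelse(yy > cutoff, 1900L + yy, 2000L + yy)
  as.Date(sprintf("%04d-%02d-01", year, mon))
}

parse_period(per)
#> [1] "1963-01-01" "1963-02-01" "1963-03-01" "1963-04-01" "2005-05-01"
#> [6] "2011-06-01"
```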

If someone comes up with a "real" solution I would be interested as well!
Sorry again


#3

It is not common to find a parsing function that can handle both cases at once and decide which two-digit years are 20-- and which are 19--.
Luckily, the lubridate package supports this: parse_date_time2 and fast_strptime, which use a C parser, have a cutoff_2000 argument that lets you specify where to switch between 2000 and 1900.

You can use a cutoff just below 63: two-digit years less than or equal to the cutoff are parsed as 20xx, and all those above it as 19xx.

# your data
per <- c("Jan-63", "Feb-63", "Mar-63", "Apr-63", "May-05", "Jun-11")

# parse_date_time2 uses lubridate's abbreviated orders
lubridate::parse_date_time2(per, "my", cutoff_2000 = 62L)  
#> [1] "1963-01-01 UTC" "1963-02-01 UTC" "1963-03-01 UTC" "1963-04-01 UTC"
#> [5] "2005-05-01 UTC" "2011-06-01 UTC"
lubridate::parse_date_time2(per, "m-y", cutoff_2000 = 62L)  
#> [1] "1963-01-01 UTC" "1963-02-01 UTC" "1963-03-01 UTC" "1963-04-01 UTC"
#> [5] "2005-05-01 UTC" "2011-06-01 UTC"

# fast_strptime needs full format
# use %b
lubridate::fast_strptime(per, "%b-%y", cutoff_2000 = 62L)    
#> [1] "1963-01-01 UTC" "1963-02-01 UTC" "1963-03-01 UTC" "1963-04-01 UTC"
#> [5] "2005-05-01 UTC" "2011-06-01 UTC"

# %m also works
lubridate::fast_strptime(per, "%m-%y", cutoff_2000 = 62L)    
#> [1] "1963-01-01 UTC" "1963-02-01 UTC" "1963-03-01 UTC" "1963-04-01 UTC"
#> [5] "2005-05-01 UTC" "2011-06-01 UTC"

Created on 2019-01-12 by the reprex package (v0.2.1)

See documentation for those functions
https://www.rdocumentation.org/packages/lubridate/versions/1.7.4/topics/parse_date_time
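Since the dates are meant to go into ggplot, here is a minimal sketch of the last step, assuming ggplot2 and lubridate are installed. The data frame and its column names (`period`, `value`, `date`) are illustrative; the values are the ones from the original question. The parsed result is a date-time, so it is converted with `as.Date()` before being used with `scale_x_date()`.

```r
library(ggplot2)

# Data from the question; column names are made up for this example
df <- data.frame(
  period = c("Jan-63", "Feb-63", "Mar-63", "Apr-63", "May-05", "Jun-11"),
  value = c(42, 35, 44, 52, 58, 48)
)

# Parse with the cutoff, then drop the time part to get a Date column
df$date <- as.Date(lubridate::parse_date_time2(df$period, "my", cutoff_2000 = 62L))

# Time series plot with a date axis that covers both centuries
p <- ggplot(df, aes(x = date, y = value)) +
  geom_line() +
  geom_point() +
  scale_x_date(date_labels = "%b %Y")
p
```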