Subtracting individual components of dates (by day, month, and year all together)

lubridate

#1

Is it possible to subtract dates by day and month and year all together in a non-hacky way?

So what I mean is, if I have as_date("1999-09-15") and as_date("2018-09-15"), the difference should be exactly 19.0

(day - day = 0, month - month = 0, year - year = 19).

But with lubridate, I assume dates are subtracted by individual day (which is correct in most cases, but not this one).

EX:

library(lubridate)
library(dplyr)

time_length(as_date("2018-09-18") - as_date("1999-09-18"), "years")
#> [1] 19.0137

Created on 2018-07-01 by the reprex package (v0.2.0).

Any ideas?


#2

This is happening because of the 5 leap days between 1999 and 2018. Years can have two different lengths when denominated in days. I suppose the options depend on your use case. The code below returns the exact number of years, but won't give correct fractional years if the two dates are not on the same day of the year.

d2 = as_date("2018-09-18")
d1 = as_date("1999-09-18")

year(d2) - year(d1)
[1] 19

nleap = sum(leap_year(year(d1):year(d2)))

time_length(d2 - d1 - nleap, "years")
[1] 19

Would keeping track of year difference and the difference in day of the year for each date give you what you need? For example:

d3 = as_date("2018-08-20")
d4 = as_date("2018-10-09")

year_diff = function(date2, date1) {
  year(date2) - year(date1) + (yday(date2) - yday(date1))/365
}

year_diff(d2, d1)
[1] 19

year_diff(d3, d1)
[1] 18.92055

year_diff(d4, d1)
[1] 19.05753

But note that this fails to return an integer year difference if one of the dates (but not both) is in a leap year and after February 28.

d5 = as_date("2020-09-18")

year_diff(d5, d1)

[1] 21.00274

And in general this method will differ by 1/365 when one date is in a leap year and after February 28 and the other is not in a leap year or is in a leap year but on or before February 28 (relative to the case where these issues don't occur). For example, in the code below we want both examples to return a difference of one year, but the second one doesn't because of the leap year issue.

d6 = as_date("2020-02-28")
d7 = as_date("2019-02-28")
d8 = as_date("2020-03-01")
d9 = as_date("2019-03-01")

year_diff(d6, d7)
[1] 1

year_diff(d8, d9)
[1] 1.00274

round(1/365, 5)
[1] 0.00274

Here's some additional logic to check for a leap-year mismatch between the two dates. Keep in mind that in all of these approaches we're removing leap days, so we're not calculating the actual elapsed time between two dates. Instead, we're calculating the elapsed time minus the number of leap days between the two dates.

year_diff = function(date2, date1) {
  
  leap = sum(c(leap_year(date2) & yday(date2) > yday("2018-02-28"), 
               leap_year(date1) & yday(date1) > yday("2018-02-28"))) == 1
  
  year(date2) - year(date1) + (yday(date2) - yday(date1) - ifelse(leap, 1, 0))/365
}

year_diff(d5, d1)
[1] 21

year_diff(d8, d9)
[1] 1

#3

Incredible answer @joels. Thanks so much. Funny how complicated something simple like dates can get


#4

It's only tip of the iceberg :slight_smile:

I've enjoyed reading about this in this write-up

https://zachholman.com/talk/utc-is-enough-for-everyone-right

My personal highlight:

The English-speaking folk were like yo, this definitely sounds like Coordinated Universal Time, boom, ship it. And the French speakers were like yeah that makes total sense! Temps Universel Coordonné DOES work out well in our language, too, ship it! Then they both looked up and realized cool, they’ve created both CUT and TUC for acronyms. :poop: .
Anyway, the compromise that arose was that if everyone is special, no one is special, so they created an entirely new set of letters that has no direct relation to any real words for the compromise: UTC.