Subtracting individual components of dates (by day, month, and year all together)

Is it possible to subtract dates by day and month and year all together in a non-hacky way?

So what I mean is, if I have as_date("1999-09-15") and as_date("2018-09-15"), the difference should be exactly 19.0

(day - day = 0, month - month = 0, year - year = 19).

But with lubridate, I assume dates are subtracted by individual day (which is correct in most cases, but not this one).



time_length(as_date("2018-09-18") - as_date("1999-09-18"), "years")
#> [1] 19.0137

Created on 2018-07-01 by the reprex package (v0.2.0).

Any ideas?

This is happening because of the 5 leap days between 1999 and 2018. Years can have two different lengths when denominated in days. I suppose the options depend on your use case. The code below returns the exact number of years, but won't give correct fractional years if the two dates are not on the same day of the year.

d2 = as_date("2018-09-18")
d1 = as_date("1999-09-18")

year(d2) - year(d1)
[1] 19

nleap = sum(leap_year(year(d1):year(d2)))

time_length(d2 - d1 - nleap, "years")
[1] 19

Would keeping track of year difference and the difference in day of the year for each date give you what you need? For example:

d3 = as_date("2018-08-20")
d4 = as_date("2018-10-09")

year_diff = function(date2, date1) {
  year(date2) - year(date1) + (yday(date2) - yday(date1))/365

year_diff(d2, d1)
[1] 19

year_diff(d3, d1)
[1] 18.92055

year_diff(d4, d1)
[1] 19.05753

But note that this fails to return an integer year difference if one of the dates (but not both) is in a leap year and after February 28.

d5 = as_date("2020-09-18")

year_diff(d5, d1)

[1] 21.00274

And in general this method will differ by 1/365 when one date is in a leap year and after February 28 and the other is not in a leap year or is in a leap year but on or before February 28 (relative to the case where these issues don't occur). For example, in the code below we want both examples to return a difference of one year, but the second one doesn't because of the leap year issue.

d6 = as_date("2020-02-28")
d7 = as_date("2019-02-28")
d8 = as_date("2020-03-01")
d9 = as_date("2019-03-01")

year_diff(d6, d7)
[1] 1

year_diff(d8, d9)
[1] 1.00274

round(1/365, 5)
[1] 0.00274

Here's some additional logic to check for a leap-year mismatch between the two dates. Keep in mind that in all of these approaches we're removing leap days, so we're not calculating the actual elapsed time between two dates. Instead, we're calculating the elapsed time minus the number of leap days between the two dates.

year_diff = function(date2, date1) {
  leap = sum(c(leap_year(date2) & yday(date2) > yday("2018-02-28"), 
               leap_year(date1) & yday(date1) > yday("2018-02-28"))) == 1
  year(date2) - year(date1) + (yday(date2) - yday(date1) - ifelse(leap, 1, 0))/365

year_diff(d5, d1)
[1] 21

year_diff(d8, d9)
[1] 1

Incredible answer @joels. Thanks so much. Funny how complicated something simple like dates can get

1 Like

It's only tip of the iceberg :slight_smile:

I've enjoyed reading about this in this write-up

My personal highlight:

The English-speaking folk were like yo, this definitely sounds like Coordinated Universal Time, boom, ship it. And the French speakers were like yeah that makes total sense! Temps Universel Coordonné DOES work out well in our language, too, ship it! Then they both looked up and realized cool, they’ve created both CUT and TUC for acronyms. :poop: .
Anyway, the compromise that arose was that if everyone is special, no one is special, so they created an entirely new set of letters that has no direct relation to any real words for the compromise: UTC.


Hey, Joel. I was looking back on this answer, and maybe I missed something, but do you have any idea why this isn't returning exactly 18? One year is a leap year after Feb. 28th, but I assumed that the following function (basically the last one you posted) would take care of that.

Any ideas?


years_difference = function(to, from) {
  leap <- sum(c(leap_year(to) & yday(to) > yday("2018-02-28"), 
               leap_year(from) & yday(from) > yday("2018-02-28"))) == 1
  years_difference <- year(to) - year(from) + (yday(to) - yday(from) - ifelse(leap, 1, 0))/365

years_difference(as_date("2018-09-15"), as_date("2000-09-15"))
#> [1] 17.99452

Created on 2018-08-17 by the reprex package (v0.2.0).

I'm afraid I don't know how to resolve your last question, but I'd point you to the lubridate documentation for a good example of why this problem is so difficult.

See the section entitled "If anyone drove a time machine, they would crash":

Let's say our starting date is 2016-02-29, a leap day. What date is one year later? Presumably either 2017-02-28 or 2017-03-01, right? But either one you pick, you'll end up with two different start dates that are both exactly one year prior to that date. The problem is that a year, like a month, is an ambiguous unit that takes more than one form.

One alternative that is arguably more consistent, but perhaps less satisfying, would be to use tropical years, which are 365.24 days long. Then most "years" are only 99.9% of a tropical year, but leap years are 100.2% of a tropical year. Then it's always wrong, but the error will tend toward zero over time. (Which is also my personal goal. :slight_smile: )

1 Like

Yeah that's fair. Thanks for helping!

Alright everyone, because of some nice Twitter talk, I have a solution that I think should be as perfect as it'll get.

It seems like it's an interval/period issue. Dates -- fun stuff.


get_ages <- function(to, from) {
  to <- lubridate::as_date(to)
  from <- lubridate::as_date(from)
  age <- from%--%to / lubridate::years(1)

get_ages("2018-09-15", "2000-09-15")
#> [1] 18
get_ages("2018-09-15", "1999-09-15")
#> [1] 19
get_ages("2018-09-15", "2000-09-16")
#> [1] 17.99726

Created on 2018-08-17 by the reprex package (v0.2.0).