Handle month-year dates (no day) using lubridate

Hello,

I am trying to find repeated purchases over the years.

I have been looking at the lubridate package, but all of the examples deal with full dates (day, month, and year, sometimes with a time as well), whereas my data only have month and year.

My goal is to parse these values so that each year is split into its 12 months, which would let me find the repeated purchases.

Is there a way to do this?

Thanks!

Beyond the built-in shorthand functions (e.g., ymd()), lubridate can parse all sorts of date formats using parse_date_time(). See its documentation for all the details, but for instance you can do this:

library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#> 
#>     date

set.seed(42)

# Invent some dates in a month-year format
dates <- paste(
  sample(1:12, 30, replace = TRUE),
  sample(2016:2017, 30, replace = TRUE),
  sep = "-"
)

dates
#>  [1] "11-2017" "12-2017" "4-2016"  "10-2017" "8-2016"  "7-2017"  "9-2016" 
#>  [8] "2-2016"  "8-2017"  "9-2017"  "6-2016"  "9-2016"  "12-2016" "4-2017" 
#> [15] "6-2016"  "12-2017" "12-2017" "2-2017"  "6-2017"  "7-2017"  "11-2016"
#> [22] "2-2016"  "12-2016" "12-2017" "1-2016"  "7-2017"  "5-2017"  "11-2016"
#> [29] "6-2016"  "11-2017"

lubridate::parse_date_time(dates, orders = "mY")
#>  [1] "2017-11-01 UTC" "2017-12-01 UTC" "2016-04-01 UTC" "2017-10-01 UTC"
#>  [5] "2016-08-01 UTC" "2017-07-01 UTC" "2016-09-01 UTC" "2016-02-01 UTC"
#>  [9] "2017-08-01 UTC" "2017-09-01 UTC" "2016-06-01 UTC" "2016-09-01 UTC"
#> [13] "2016-12-01 UTC" "2017-04-01 UTC" "2016-06-01 UTC" "2017-12-01 UTC"
#> [17] "2017-12-01 UTC" "2017-02-01 UTC" "2017-06-01 UTC" "2017-07-01 UTC"
#> [21] "2016-11-01 UTC" "2016-02-01 UTC" "2016-12-01 UTC" "2017-12-01 UTC"
#> [25] "2016-01-01 UTC" "2017-07-01 UTC" "2017-05-01 UTC" "2016-11-01 UTC"
#> [29] "2016-06-01 UTC" "2017-11-01 UTC"

Created on 2018-07-18 by the reprex package (v0.2.0).

A month-year date is going to be represented internally as the first day of the month, but in general this doesn't matter — any time you need to output the date for display, you can format it as just month-year using format() or lubridate::stamp().
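
As a minimal sketch of that point, and assuming the goal is simply to count purchases per calendar month, the parsed values can be formatted back to a year-month label with format() and tallied with table(). The purchases vector here is made up for illustration, and counting per month is only one possible reading of "repeated purchases".

```r
library(lubridate)

# Hypothetical month-year strings, one per purchase
purchases <- c("7-2016", "7-2016", "12-2016", "1-2017", "7-2016")

parsed <- parse_date_time(purchases, orders = "mY")

# Internally each value is the first day of its month
day(parsed)
#> [1] 1 1 1 1 1

# Format for display as year-month only (numeric codes avoid locale issues)
labels <- format(parsed, "%Y-%m")

# Count purchases per month; months with a count above 1 are repeats
counts <- table(labels)
repeated <- names(counts)[counts > 1]
repeated
#> [1] "2016-07"
```

With real data you would probably group by a customer or product identifier as well, which is not shown here.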

jcblum:
Thank you for your time.
I appreciate it.

jcblum:
The dates in my data set are formatted like "Jul-16".
Can I still parse this format?

I’m not able to test right now, but I would guess that you’d use:
lubridate::parse_date_time(dates, orders = "by")

You can find the underlying date codes here:

https://stat.ethz.ch/R-manual/R-devel/library/base/html/strptime.html
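
A guess like this can be checked with a small sketch such as the one below. It assumes an English locale, since "b" matches month names in the current locale, and the input strings are invented for the example.

```r
library(lubridate)

# Made-up values in the "Jul-16" style (abbreviated month, two-digit year)
dates <- c("Jul-16", "Dec-16", "Jan-17")

# "b" = month name, "y" = two-digit year
parsed <- parse_date_time(dates, orders = "by")

format(parsed, "%Y-%m-%d")
#> [1] "2016-07-01" "2016-12-01" "2017-01-01"
```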

Yup! The symbols are also listed on the parse_date_time() documentation page I linked above. That's worth a read because while parse_date_time() uses the strptime() symbols, it's more lenient in its interpretations (on purpose, since its goal is to be a more friendly and flexible parser than the base functions) — the differences are spelled out in the documentation.

The parse_date_time() page is long, but if you're someone who finds themselves reading in date and time data, I think it's well worth studying carefully. It's a fantastically useful tool that can do a lot more than maybe meets the eye.
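
One concrete difference that matters for month-year data, shown here as a rough sketch rather than a full comparison: base R's strptime()-based parsing returns NA when a month is given without a day, while parse_date_time() fills in the first of the month. The input string is made up.

```r
library(lubridate)

x <- "7-2016"

# Base R: a month without a day is treated as incomplete, so the result is NA
base_result <- as.Date(x, format = "%m-%Y")
is.na(base_result)
#> [1] TRUE

# parse_date_time() fills in the first day of the month instead
lub_result <- parse_date_time(x, orders = "mY")
format(lub_result, "%Y-%m-%d")
#> [1] "2016-07-01"
```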

jcblum:
Thank you. I am going to study that page.