Extract Date From String

I am trying to extract a date from inside a string that contains other text and cannot quite hit on the right regex to do this.

string <- 'Posted by John Doe - Apr 1, 1999 11:00 am (#15 Total: 100)'

Any thoughts?

Assuming the date is in the format month abbreviation (capitalized) followed by day, comma, and year, then the following should work.

string <- 'Posted by John Doe - Apr 1, 1999 11:00 am (#15 Total: 100)'

stringr::str_extract(string, '[A-Z][a-z]{2}\\s[0-9]{1,2},\\s[0-9]{4}')
#> [1] "Apr 1, 1999"

Created on 2022-11-07 with reprex v2.0.2.9000
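
If you would rather not load stringr, the same pattern can be sketched in base R and then turned into a Date. This is only a sketch under some assumptions: the names pattern, date_text and date_value are made up for illustration, and %b relies on the session locale using English month abbreviations.

```r
# Base R sketch: extract the date text, then parse it into a Date.
string <- 'Posted by John Doe - Apr 1, 1999 11:00 am (#15 Total: 100)'

# Capitalized 3-letter month, 1-2 digit day, comma, 4-digit year
pattern <- "[A-Z][a-z]{2}\\s[0-9]{1,2},\\s[0-9]{4}"

# regexpr() locates the first match; regmatches() returns the matched text
date_text <- regmatches(string, regexpr(pattern, string, perl = TRUE))
date_text
#> [1] "Apr 1, 1999"

# %b reads the abbreviated month name in the current locale (assumed English here)
date_value <- as.Date(date_text, format = "%b %d, %Y")
date_value
```

Keeping extraction and parsing as two separate steps makes it easier to see which one fails if the input format changes.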

I tried readr::parse_datetime, which I expected to pull a date out of a string like that, but I could not get it to work, so I did it in two steps. Assuming the date is always preceded by " - " and the time always ends with am or pm, this should work: stringr's str_extract pulls out the raw date-time text, and lubridate's mdy_hm converts it to a date-time. It works, though it could probably be neater.

# (?<= - ) skips the leading dash; (am|pm) matches the full am/pm marker
new_string <- stringr::str_extract(string, "(?<= - ).* (am|pm)")

result <- lubridate::mdy_hm(new_string)
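
One thing worth knowing when matching am or pm: square brackets define a character class, while parentheses with | define an alternation. Here is a small base R sketch of the difference on the same string; class_match and group_match are just illustrative names.

```r
string <- 'Posted by John Doe - Apr 1, 1999 11:00 am (#15 Total: 100)'

# [am|pm] is a character class: it matches ONE character out of a, m, | or p
class_match <- regmatches(string, regexpr(" - .* [am|pm]", string))

# (am|pm) is a group with alternation: it matches the text "am" or "pm"
group_match <- regmatches(string, regexpr(" - .* (am|pm)", string))

class_match
#> [1] " - Apr 1, 1999 11:00 a"
group_match
#> [1] " - Apr 1, 1999 11:00 am"
```

The character-class version stops one letter short, so the am/pm marker is cut off before it ever reaches the date parser.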

string <- 'Posted by John Doe - Apr 1, 1999 11:00 am (#15 Total: 100)'

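# Strip the trailing " (...)" and the leading "Posted by <name> - ", then parse; %I and %p read the 12-hour time with am/pm
gsub(" [(].*$", "", string) |>
  gsub("^Posted by [A-Za-z ]+ - ", "", x = _) |>
  strptime(format = "%b %d, %Y %I:%M %p")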
#> [1] "1999-04-01 11:00:00 PST"
