I am trying to extract a date from inside a string that contains other text and cannot quite hit on the right regex to do this.
string <- 'Posted by John Doe - Apr 1, 1999 11:00 am (#15 Total: 100)'
Any thoughts?
Assuming the date is in the format month abbreviation (capitalized) followed by day, comma, and year, then the following should work.
string <- 'Posted by John Doe - Apr 1, 1999 11:00 am (#15 Total: 100)'
stringr::str_extract(string, '[A-Z][a-z]{2}\\s([0-9]|[0-9]{2})[,]\\s[0-9]{4}')
#> [1] "Apr 1, 1999"
Created on 2022-11-07 with reprex v2.0.2.9000
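If the goal is an actual date object rather than the matched text, the extracted string can be converted afterwards. The following is only a sketch in base R (so no packages are needed). The names `pattern`, `date_text` and `date_value` are mine, and `%b` assumes an English locale for the month abbreviation.

```r
string <- 'Posted by John Doe - Apr 1, 1999 11:00 am (#15 Total: 100)'

# Same idea as the stringr pattern above: capitalized three-letter month,
# one or two digit day, comma, four digit year.
pattern <- '[A-Z][a-z]{2}\\s[0-9]{1,2},\\s[0-9]{4}'

# regexpr() locates the first match, regmatches() pulls out the text.
date_text <- regmatches(string, regexpr(pattern, string))

# Convert the text to a Date; %b is the abbreviated month name.
date_value <- as.Date(date_text, format = '%b %d, %Y')

print(date_text)
print(date_value)
```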
I tried readr::parse_datetime, which I expected to pull a date out of a string like that, but I could not get it to work, so I did it in two steps. Assuming the date always follows the dash after the name, and the time always ends with am or pm, this should work: str_extract from stringr pulls out the raw date text, then lubridate's mdy_hm converts it to a date-time. This worked, but could be a little neater, I guess!
library(stringr)
library(lubridate)
new_string <- str_extract(string, "(?<= - ).* (am|pm)")
result <- mdy_hm(new_string)
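A note on matching the am/pm marker: square brackets define a character class, so `[am|pm]` matches a single character out of `a`, `m`, `|` or `p`, whereas parentheses give a real "am or pm" alternative. A small base R sketch of the difference (the names `class_match` and `group_match` are just for illustration):

```r
string <- 'Posted by John Doe - Apr 1, 1999 11:00 am (#15 Total: 100)'

# Character class: stops after one character of the marker.
class_match <- regmatches(string, regexpr(' - .* [am|pm]', string))

# Grouped alternation: matches the whole word "am" or "pm".
group_match <- regmatches(string, regexpr(' - .* (am|pm)', string))

print(class_match)  # " - Apr 1, 1999 11:00 a"
print(group_match)  # " - Apr 1, 1999 11:00 am"
```

A lenient parser may still cope with the truncated version, but the grouped form keeps the full marker in the extracted text.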
string <- 'Posted by John Doe - Apr 1, 1999 11:00 am (#15 Total: 100)'
gsub(" [(].*$", "", string) |> gsub("^Posted by [A-Za-z ]+ - ", "", x = _) |> strptime(x = _, "%b %d, %Y %I:%M %p")
#> [1] "1999-04-01 11:00:00 PST"
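The PST in that output comes from the session's local time zone, so the result will differ from machine to machine. If a fixed zone is wanted, it can be passed explicitly. This is a sketch only: UTC is an arbitrary choice here, the names `date_text` and `parsed` are mine, and `%b` and `%p` assume an English locale.

```r
string <- 'Posted by John Doe - Apr 1, 1999 11:00 am (#15 Total: 100)'

# Drop the trailing " (#15 Total: 100)" part, then everything up to " - ".
date_text <- sub('^.* - ', '', sub(' [(].*$', '', string))

# %I with %p reads a 12-hour clock with its am/pm marker.
parsed <- as.POSIXct(date_text, format = '%b %d, %Y %I:%M %p', tz = 'UTC')

print(date_text)
print(parsed)
```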