Extraction of some values from a series of strings

Hi everybody,
I need help automating the extraction of some values from a series of strings.
Each string has this pattern:
Message from authorname » Thu 6 May 2010, 21:21

I would like to create a table with one column for the author names and one for the dates. I am not interested in extracting the day of the week or the time.
Any suggestions? Thanks in advance.

You can use base R regex (you could also use stringr of course):

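msg <-
  c(
    "Message from authorname1 » Thu 6 May 2010, 21:21",
    "Message from authorname2 » Thu 10 May 2010, 21:21"
  )

# group 1 captures the author name, group 2 captures "day month year";
# the weekday and the time are matched but not captured.
# [[:alnum:]]+ assumes author names contain only letters and digits.
msg_rx <-
  regexec(
    "^Message from ([[:alnum:]]+) » [[:alpha:]]{3} ([[:digit:]]+ [[:alpha:]]+ [[:digit:]]+), ",
    msg
  )
msg_extract <- regmatches(msg, msg_rx)
# element 1 of each match is the full match, elements 2:3 are the groups
name_date <- as.data.frame(t(sapply(msg_extract, function(x) x[2:3])))
names(name_date) <- c("author", "date")
name_date
#       author        date
#1 authorname1  6 May 2010
#2 authorname2 10 May 2010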
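Since stringr is mentioned as an alternative, here is a minimal sketch of the same extraction with stringr::str_match(), which returns the capture groups directly as matrix columns. It also turns the date text into a real Date with as.Date(); the format "%d %B %Y" assumes English month names (as in the sample strings) and an English or C locale. The names m and name_date are just for illustration.

```r
library(stringr)

msg <- c(
  "Message from authorname1 » Thu 6 May 2010, 21:21",
  "Message from authorname2 » Thu 10 May 2010, 21:21"
)

# str_match() returns a character matrix: column 1 is the full match,
# the following columns are the capture groups (author, "day month year")
m <- str_match(
  msg,
  "^Message from (\\S+) » \\w{3} (\\d+ \\w+ \\d{4}),"
)

name_date <- data.frame(
  author = m[, 2],
  # "%d %B %Y" reads "6 May 2010"; %B depends on the locale's month names
  date = as.Date(m[, 3], format = "%d %B %Y")
)
name_date
```

Storing the date as a Date rather than text makes it possible to sort and filter the table chronologically.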

Regex is powerful, but I try to use it as little as possible.

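library(tidyverse)
library(lubridate)

df <- tibble(rawstring = "Message from authorname » Thu 6 May 2010, 21:21") %>%
  rowwise() %>%
  mutate(
    # split on the literal "»": "Message from authorname " and " Thu 6 May 2010, 21:21"
    twohalves = str_split(rawstring, fixed("»")),
    # drop the literal prefix from the first half and trim the trailing space
    authorname = str_trim(str_remove(
      string = head(twohalves, 1),
      pattern = fixed("Message from ")
    )),
    # keep the second half up to the comma, i.e. "Thu 6 May 2010"
    datetext = str_trim(head(unlist(str_split(
      string = tail(twohalves, 1),
      pattern = fixed(",")
    )), 1)),
    # drop the weekday (first word) and parse the rest as day-month-year
    date = lubridate::dmy(word(datetext, 2, -1))
  ) %>%
  ungroup()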
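The rowwise() version above processes the strings one row at a time. Since the question mentions a whole series of strings, here is a minimal sketch of the same split-based idea in vectorized form, assuming every message follows exactly the pattern in the question. It uses stringr's str_split_fixed() to get a matrix of pieces instead of a list column, and fixed() so the separators are matched literally rather than as regex. The names msgs and halves are hypothetical.

```r
library(stringr)
library(tibble)
library(lubridate)

msgs <- c(
  "Message from authorname1 » Thu 6 May 2010, 21:21",
  "Message from authorname2 » Thu 10 May 2010, 21:21"
)

# str_split_fixed() returns a character matrix with exactly n columns,
# so each piece can be picked out by column index without rowwise()
halves <- str_split_fixed(msgs, fixed("»"), 2)

df <- tibble(
  # left half minus the literal prefix, trimmed of the space before "»"
  authorname = str_trim(str_remove(halves[, 1], fixed("Message from "))),
  # right half up to the first comma, e.g. "Thu 6 May 2010"
  datetext = str_trim(str_split_fixed(halves[, 2], fixed(","), 2)[, 1]),
  # drop the weekday (first word) and parse the rest as day-month-year
  date = dmy(word(datetext, 2, -1))
)
df
```

tibble() lets later columns refer to earlier ones, which is why datetext can be reused when computing date.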

Why? Performance concerns? You are using regex too, just through stringr.

Not performance; it's more that regex is a language I don't enjoy reading or reasoning about.
stringr might be using regex under the hood, but when I'm splitting on explicit strings, I find it easier to understand the mechanics of what I'm doing. I'm sure that if I studied and practiced regex more, I might find it more legible, but I don't have the motivation to dedicate time to regex at the moment; complex string parsing doesn't come up much in my work.
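To illustrate the point about splitting on explicit strings: stringr interprets a plain pattern as a regular expression by default, and wrapping it in fixed() makes it match the literal text instead. A small sketch with a hypothetical string x, where the difference shows up because "." is a regex metacharacter:

```r
library(stringr)

x <- "v1.2.3"

# Default: "." is a regex that matches any character, so every character
# acts as a separator and only empty strings are left
as_regex <- str_split(x, ".")[[1]]

# fixed(): "." is matched literally, with no regex interpretation
as_literal <- str_split(x, fixed("."))[[1]]
as_literal
# c("v1", "2", "3")
```

For separators like "»" or ", " the two modes happen to give the same result, but fixed() makes the intent explicit.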

Guys, thanks to your help, I've achieved what I needed!

Thank you very much, I've saved a lot of time!

Good to hear! Please mark the post that solved your question as the solution.
