extract things between squared brackets and separate them into multiple columns

veda · September 8, 2022, 7:55am

Dear R experts,

I have a data of multiple sentences like "here are your pins:[p-123],[sp123] and [sp124]". I would like to extract strings and numerals between square brackets [....] and separate them in to different columns according to the initials (e.g., 'sp' or 'p-').
here are the example data:
res=data.frame(sms=c('here are your pins:[p-123],[sp123] and [sp124]','here are your pins:[p-128],[p-129] and [sp125]'),p=c('p-123','p-128,p-129'),sp=c('sp124,sp125'))

column 'sms' is the original text and column 'p' and 'sp' are the desired results.
I tried to use str_extract with somethings like "(?<=\().*(?=\))" but I cannot get rid of 'and '. Any suggests to get the desired results are appreciated.

Sincerely,
Veda

scottyd22 · September 8, 2022, 1:24pm

I believe the following gets to your intended outcome.

library(tidyverse)

d = data.frame(
  sms = c('here are your pins:[p-123],[sp123] and [sp124]',
          'here are your pins:[p-128],[p-129] and [sp125]')
  )


out = d %>%
  # extract content between square brackets
  mutate(content = str_extract_all(sms, '(?<=\\[)(.*?)(?=\\])')) %>%
  # create one row for each piece of content identified
  unnest(content) %>%
  # classify each row based on content initials
  mutate(initials = ifelse(substr(content,1,2) == 'p-', 'p', 'sp')) %>%
  # paste and collapse content for each sms/initials group
  group_by(sms, initials) %>%
  mutate(output = paste(content, collapse = ',')) %>%
  ungroup() %>%
  # eliminate the content column and take distinct
  select(-content) %>%
  distinct() %>%
  # widen by pivoting on initials (creates 'p' and 'sp' columns)
  pivot_wider(names_from = initials, values_from = output)

out
#> # A tibble: 2 × 3
#>   sms                                            p           sp         
#>   <chr>                                          <chr>       <chr>      
#> 1 here are your pins:[p-123],[sp123] and [sp124] p-123       sp123,sp124
#> 2 here are your pins:[p-128],[p-129] and [sp125] p-128,p-129 sp125

Created on 2022-09-08 with reprex v2.0.2.9000

system · September 15, 2022, 1:25pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.