Match the first letter of one column with the first letter of other column and do str_replace after

I have a list of abbreviations and a list of replacements. However, some abbreviations have country index in them. It looks like this:

Country Abbr
US SP num1
UK SP num1
Malasia MSP num2
Albania ASD num1
Poland ASD num3

The list of abbreviations and replacements:
country <- c("US", "UK", "Malasia", "Albania", "Poland")
abbr <- c("SP num1", "SP num1", "MSP num2", "ASD num1", "ASD num3")
words_to_replace <- c("SP", "SD")
replacements<- c("state police", "state duma")

What I would like to get is:

Country New
us state police num1
uk state police num1
malasia malasia state police num2
albania albania state police num1
poland asd num3

For usual abbreviations I use:

df %>%
mutate(new= str_replace_all(abbr, setNames(replacements, words_to_replace)))

However, I need to somehow combine it with substr function and I have no idea how to do it. I assume something with if or ifelse might work.

Below is one way to achieve the desired output.


df = data.frame(
  Country = c('US', 'UK', 'Malasia', 'Albania', 'Poland'),
  Abbr = c('SP num1', 'SP num1', 'MSP num2', 'ASD num1', 'ASD num3')

replacement_data = data.frame(
  New = c("SP", "SD"),
  replacement = c("state police", "state duma")

out = df |>
  mutate(New = ifelse(substr(Country,1,1) == substr(Abbr,1,1),
                       paste(Country, substr(Abbr,2, nchar(Abbr))),
                       )) |>
  separate_rows(New, sep = ' ') |>
  left_join(replacement_data) |>
  mutate(New = ifelse(, New, replacement)) |>
  group_by(Country) |>
  mutate(New = paste(New, collapse = ' ')) |>
  ungroup() |>
  distinct(Country, Abbr, New) |>
  mutate(across(c(Country, New), ~tolower(.)))

#> # A tibble: 5 × 3
#>   Country Abbr     New                      
#>   <chr>   <chr>    <chr>                    
#> 1 us      SP num1  state police num1        
#> 2 uk      SP num1  state police num1        
#> 3 malasia MSP num2 malasia state police num2
#> 4 albania ASD num1 albania state duma num1  
#> 5 poland  ASD num3 asd num3

Created on 2023-03-20 with reprex v2.0.2.9000

1 Like

@scottyd22 's tidyverse approach works. Because I'm a geriatric, it's too hard for me, though. There are too many discrete operations smushed together and too much syntax to remember. So, I'd do it in base

d <- data.frame(country = c("US", "UK", "Malasia", "Albania", "Poland"),
  abbr = c("SP num1", "SP num1", "MSP num2", "ASD num1", "ASD num3"))

# first pass--strip first letter from 3-letter prefix
d$abbr <- gsub("[^SDP]","",d$abbr)
# second pass--expand abbreviation
d$abbr <- gsub("SP","state police",d$abbr)
d$abbr <- gsub("SD","state duma",d$abbr)
# if only some countries are to be prefixed
to_prefix <- which(d$country %in% c("Malasia","Albania"))
d[to_prefix,"abbr"] <- paste(d[to_prefix,"country"],d[to_prefix,"abbr"])

# otherwise, if all countriea
# d$abbr <- paste(d$country,d$abbr)

# the desired output for Poland, asd num3, is inconsistent but could be
# handled similarly

# to lowercase everything
d$country <- tolower(d$country)
d$abbr <- tolower(d$abbr)
#>   country                 abbr
#> 1      us         state police
#> 2      uk         state police
#> 3 malasia malasia state police
#> 4 albania   albania state duma
#> 5  poland           state duma

Hello! Thank you for your answer. Unfortunately, I realised that my data sometimes includes abbreviations in the middle of the string like: "first ASD num1". Do you have any ideas how to work with it?

How many of these do you have ?
if you have 10 I would prefer one way over if you had 1000 where I might prefer another
and if you have 100's could there be more with different 'forms' that you may not have realised yet, and you will come back to ask about those also ?

I'd like to be more informed of the context needing a solution before attempting one.

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.