Match the first letter of one column with the first letter of other column and do str_replace after

gocoud · March 20, 2023, 11:43am

I have a list of abbreviations and a list of replacements. However, some abbreviations have country index in them. It looks like this:

Country	Abbr
US	SP num1
UK	SP num1
Malasia	MSP num2
Albania	ASD num1
Poland	ASD num3

The list of abbreviations and replacements:
country <- c("US", "UK", "Malasia", "Albania", "Poland")
abbr <- c("SP num1", "SP num1", "MSP num2", "ASD num1", "ASD num3")
words_to_replace <- c("SP", "SD")
replacements<- c("state police", "state duma")

What I would like to get is:

Country	New
us	state police num1
uk	state police num1
malasia	malasia state police num2
albania	albania state police num1
poland	asd num3

For usual abbreviations I use:

df %>%
mutate(new= str_replace_all(abbr, setNames(replacements, words_to_replace)))

However, I need to somehow combine it with substr function and I have no idea how to do it. I assume something with if or ifelse might work.

scottyd22 · March 20, 2023, 6:33pm

Below is one way to achieve the desired output.

library(tidyverse)

df = data.frame(
  Country = c('US', 'UK', 'Malasia', 'Albania', 'Poland'),
  Abbr = c('SP num1', 'SP num1', 'MSP num2', 'ASD num1', 'ASD num3')
)

replacement_data = data.frame(
  New = c("SP", "SD"),
  replacement = c("state police", "state duma")
  )

out = df |>
  mutate(New = ifelse(substr(Country,1,1) == substr(Abbr,1,1),
                       paste(Country, substr(Abbr,2, nchar(Abbr))),
                       Abbr
                       )) |>
  separate_rows(New, sep = ' ') |>
  left_join(replacement_data) |>
  mutate(New = ifelse(is.na(replacement), New, replacement)) |>
  group_by(Country) |>
  mutate(New = paste(New, collapse = ' ')) |>
  ungroup() |>
  distinct(Country, Abbr, New) |>
  mutate(across(c(Country, New), ~tolower(.)))

out
#> # A tibble: 5 × 3
#>   Country Abbr     New                      
#>   <chr>   <chr>    <chr>                    
#> 1 us      SP num1  state police num1        
#> 2 uk      SP num1  state police num1        
#> 3 malasia MSP num2 malasia state police num2
#> 4 albania ASD num1 albania state duma num1  
#> 5 poland  ASD num3 asd num3

Created on 2023-03-20 with reprex v2.0.2.9000

technocrat · March 21, 2023, 9:12am

@scottyd22 's tidyverse approach works. Because I'm a geriatric, it's too hard for me, though. There are too many discrete operations smushed together and too much syntax to remember. So, I'd do it in base

d <- data.frame(country = c("US", "UK", "Malasia", "Albania", "Poland"),
  abbr = c("SP num1", "SP num1", "MSP num2", "ASD num1", "ASD num3"))

# first pass--strip first letter from 3-letter prefix
d$abbr <- gsub("[^SDP]","",d$abbr)
# second pass--expand abbreviation
d$abbr <- gsub("SP","state police",d$abbr)
d$abbr <- gsub("SD","state duma",d$abbr)
# if only some countries are to be prefixed
to_prefix <- which(d$country %in% c("Malasia","Albania"))
d[to_prefix,"abbr"] <- paste(d[to_prefix,"country"],d[to_prefix,"abbr"])

# otherwise, if all countriea
# d$abbr <- paste(d$country,d$abbr)

# the desired output for Poland, asd num3, is inconsistent but could be
# handled similarly

# to lowercase everything
d$country <- tolower(d$country)
d$abbr <- tolower(d$abbr)
d
#>   country                 abbr
#> 1      us         state police
#> 2      uk         state police
#> 3 malasia malasia state police
#> 4 albania   albania state duma
#> 5  poland           state duma

gocoyd · March 22, 2023, 8:54am

Hello! Thank you for your answer. Unfortunately, I realised that my data sometimes includes abbreviations in the middle of the string like: "first ASD num1". Do you have any ideas how to work with it?

nirgrahamuk · March 22, 2023, 10:37am

How many of these do you have ?
if you have 10 I would prefer one way over if you had 1000 where I might prefer another
and if you have 100's could there be more with different 'forms' that you may not have realised yet, and you will come back to ask about those also ?

I'd like to be more informed of the context needing a solution before attempting one.

system · April 12, 2023, 10:37am

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.