Replacing words based on their first two and last two characters doesn't seem like a reliable method. I think you should consider using string distance metrics instead, as in this example:
library(tidyverse)
library(fuzzyjoin)

# Sample data with misspelled province names
clean_db <- tibble(provincia = c("AZUY", "BOLI$BAR", "CAN_AR", "GUY$AS", "PICHI.CHA",
                                 "COTPAXI", "MORON/A SANTIAGO"),
                   ciudad = c("QUITO", "CUENCA", "GUAYAQUIL", "MANTA", "PORTOVIEJO",
                              "AZOGUES", "SALINAS"))

# Lookup table with the correct spellings
Provincia <- tibble(codigo = 1:17,
                    descripcion = c("AZUAY",
                                    "BOLIVAR",
                                    "CAÑAR",
                                    "CARCHI",
                                    "CHIMBORAZO",
                                    "COTOPAXI",
                                    "EL ORO",
                                    "ESMERALDAS",
                                    "GALAPAGOS",
                                    "GUAYAS",
                                    "IMBABURA",
                                    "LOJA",
                                    "LOS RIOS",
                                    "MANABI",
                                    "MORONA SANTIAGO",
                                    "NAPO",
                                    "SANTO DOMINGO DE LOS TSACHILAS"))

# Join on approximate string matches, then keep the matched spelling
# (or the original value when nothing in the lookup table is close enough)
clean_db %>%
  stringdist_left_join(Provincia %>% select(descripcion),
                       by = c(provincia = "descripcion"),
                       method = "osa") %>%
  mutate(provincia = coalesce(descripcion, provincia)) %>%
  select(-descripcion)
#> # A tibble: 7 × 2
#>   provincia       ciudad
#>   <chr>           <chr>
#> 1 AZUAY           QUITO
#> 2 BOLIVAR         CUENCA
#> 3 CAÑAR           GUAYAQUIL
#> 4 GUAYAS          MANTA
#> 5 PICHI.CHA       PORTOVIEJO
#> 6 COTOPAXI        AZOGUES
#> 7 MORONA SANTIAGO SALINAS
Created on 2022-03-30 by the reprex package (v2.0.1)
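Row 5 is left untouched because the lookup table in this example has no entry close enough to "PICHI.CHA". To see what the join is working with, you can look at the distances directly. This is only a minimal sketch, assuming the stringdist package is installed (fuzzyjoin depends on it) and that the join uses its default max_dist = 2; the `lookup` vector is a shortened, illustrative subset of the table above:

```r
library(stringdist)

# Distances between two raw values and their intended spellings,
# using the same "osa" method as the join
d_azuay    <- stringdist("AZUY", "AZUAY", method = "osa")
d_cotopaxi <- stringdist("COTPAXI", "COTOPAXI", method = "osa")

# Smallest distance from "PICHI.CHA" to a few of the lookup entries
# (illustrative subset, not the full table)
lookup  <- c("AZUAY", "BOLIVAR", "CARCHI", "CHIMBORAZO")
d_pichi <- min(stringdist("PICHI.CHA", lookup, method = "osa"))

d_azuay
d_cotopaxi
d_pichi
```

Raising max_dist in stringdist_left_join() makes the matching more permissive, but it also increases the risk of pairing a value with the wrong province, so it is worth checking the result by eye.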
Or, if possible, manually define a named vector of equivalences, e.g. c('misspelling' = 'correct'), which would give the most accurate results.
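A rough sketch of the manual approach, assuming dplyr is available. The `equivalences` vector and the small `db` tibble are made up for illustration, and the corrections in it (including PICHINCHA for "PICHI.CHA") are my guesses at the intended spellings:

```r
library(dplyr)

# Hypothetical lookup: names are the misspellings, values are the corrections
equivalences <- c("AZUY" = "AZUAY", "PICHI.CHA" = "PICHINCHA")

# Illustrative data; "LOJA" is already correct and has no entry in the lookup
db <- tibble(provincia = c("AZUY", "PICHI.CHA", "LOJA"))

# Look each value up by name; values without an entry come back as NA,
# so coalesce() falls back to the original spelling
result <- db %>%
  mutate(provincia = coalesce(unname(equivalences[provincia]), provincia))

result
```

This only fixes the misspellings you list explicitly, which is why it is the most accurate option but also the most manual one.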
Note: Next time please provide a proper REPRoducible EXample (reprex) illustrating your issue.