Hi,
I hope I have improved my regular expression skills a bit, but I still have issues with some patterns (for example, ones containing brackets).
I have the following data frame with model names, and I would like to recode the Model column into a new ModelCat column:
source <- data.frame(
  stringsAsFactors = FALSE,
  Resp = c("aaa", "bbb", "ccc", "ddd", "eee", "fff", "ggg", "hhh",
           "iii", "jjj", "kkk", "lll", "mmm", "nnn", "ooo", "ppp", "qqq"),
  Model = c("3008", "3008 (2016)", "308", "308 (2013)",
            "3008 Hybride Diesel", "207", "3008 Hatchback", "Crossland x",
            "crossland", "corsa", "corsa-e", "corsa e",
            "4007", "New c4", "c4", "corsa 307 electric",
            "crossland diesel hatchback")
)
source
library(dplyr)
result <- source %>%
  mutate(ModelCat = case_when(
    grepl(x = Model, pattern = '308\\s(2013)', ignore.case = TRUE) ~ '308 (2013)',
    grepl(x = Model, pattern = '308', ignore.case = TRUE) ~ '308',
    grepl(x = Model, pattern = '2008', ignore.case = TRUE) ~ 'Peugeot 2008',
    grepl(x = Model, pattern = '3008\\shybride', ignore.case = TRUE) ~ 'Other',
    grepl(x = Model, pattern = '3008\\shatchback', ignore.case = TRUE) ~ 'Other',
    grepl(x = Model, pattern = '3008\\s(2016)', ignore.case = TRUE) ~ 'Peugeot 3008 (2016)',
    grepl(x = Model, pattern = '3008', ignore.case = TRUE) ~ 'Peugeot 3008',
    grepl(x = Model, pattern = 'Corsa-e\\sELECTRIC\\sHATCHBACK', ignore.case = TRUE) ~ 'Vauxhall Corsa E',
    grepl(x = Model, pattern = 'Corsa-e', ignore.case = TRUE) ~ 'Vauxhall Corsa E',
    grepl(x = Model, pattern = 'Corsa\\sE', ignore.case = TRUE) ~ 'Vauxhall Corsa E',
    grepl(x = Model, pattern = 'Corsa', ignore.case = TRUE) ~ 'Vauxhall Corsa',
    grepl(x = Model, pattern = 'Nuovo\\sC4|NEUER\\sC4|New\\sC4|Nuevo\\sC4|Nuova\\sC4|NV\\sC4|C4\\sNeu|C4\\sNlle', ignore.case = TRUE) ~ 'New c4',
    grepl(x = Model, pattern = 'Crossland\\sHatchback', ignore.case = TRUE) ~ 'Other',
    grepl(x = Model, pattern = 'Crossland\\sX\\sHatchback', ignore.case = TRUE) ~ 'Other',
    grepl(x = Model, pattern = 'Crossland\\sX', ignore.case = TRUE) ~ 'Other',
    grepl(x = Model, pattern = 'Crossland', ignore.case = TRUE) ~ 'Crossland',
    TRUE ~ "Other"
  ))
result
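On the brackets specifically: in a regular expression an unescaped ( ) forms a group instead of matching literal brackets, which would explain why the '308\\s(2013)' and '3008\\s(2016)' rules never fire and those rows fall through to the plain '308' and '3008' rules. Here is a small base R check on a made-up vector (the names `models`, `unescaped`, `escaped` and `literal` are only illustrative):

```r
# Made-up test strings, not part of the real data.
models <- c("308", "308 (2013)", "308 2013")

# Unescaped brackets: "(2013)" is a group, so the pattern means "308 2013".
unescaped <- grepl("308\\s(2013)", models)

# Escaped brackets: "\\(" and "\\)" match a literal "(" and ")".
escaped <- grepl("308\\s\\(2013\\)", models)

# fixed = TRUE treats the whole pattern as a plain string, not a regex.
# Note that ignore.case cannot be combined with fixed = TRUE.
literal <- grepl("308 (2013)", models, fixed = TRUE)

print(data.frame(models, unescaped, escaped, literal))
```

Only the escaped and fixed versions pick up "308 (2013)"; the unescaped one matches "308 2013" instead.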
My objective is to:
- keep Peugeot 3008 and Peugeot 3008 (2016) and recode all other versions of 3008 into "Other".
- keep Peugeot 308 and 308 (2013).
- simplify the recoding of models with "New" in multiple languages (New c4), of electric models such as Corsa (names containing "-e", " E" or "Electric"), and of models whose names carry extra words (keep Crossland, but not crossland x, crossland hatchback or crossland diesel hatchback).
Can you help, please?
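One possible direction, as a minimal sketch rather than a definitive solution: anchor the patterns with ^ and $ so that only the exact names are kept, escape the brackets, and use alternation for the language and electric variants. The helper names `has`, `new_c4` and `recode_model` are hypothetical, and treating any Corsa name containing "-e", " e" or "electric" as a Corsa E is my assumption about the intended rule.

```r
library(dplyr)

# Hypothetical helper: case-insensitive regex match.
has <- function(x, pattern) grepl(pattern, x, ignore.case = TRUE)

# Words for "new" taken from the original pattern, before or after "c4".
new_c4 <- "^(nuovo|nuova|nuevo|neuer|new|nv)\\s+c4$|^c4\\s+(neu|nlle)$"

# Hypothetical recoding function; the first matching rule wins.
recode_model <- function(x) {
  x <- trimws(x)
  case_when(
    has(x, "^308\\s*\\(2013\\)$")  ~ "308 (2013)",
    has(x, "^308$")                ~ "308",
    has(x, "2008")                 ~ "Peugeot 2008",
    has(x, "^3008\\s*\\(2016\\)$") ~ "Peugeot 3008 (2016)",
    has(x, "^3008$")               ~ "Peugeot 3008",
    # Assumption: "-e", a standalone " e", or "electric" marks a Corsa E.
    has(x, "^corsa") & has(x, "(-|\\s)e(\\s|$)|electric") ~ "Vauxhall Corsa E",
    has(x, "^corsa$")              ~ "Vauxhall Corsa",
    has(x, new_c4)                 ~ "New c4",
    # Only the bare name is kept; "crossland x" etc. fall through to "Other".
    has(x, "^crossland$")          ~ "Crossland",
    TRUE                           ~ "Other"
  )
}

recode_model(c("3008 (2016)", "3008 Hatchback", "corsa-e", "Crossland x", "New c4"))
```

With the data frame above, this could be applied as source %>% mutate(ModelCat = recode_model(Model)). Because the kept names are anchored, every other 3008 or Crossland variant falls through to "Other" without needing its own rule.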