Hi,
I have this simple data frame:
data.frame <- data.frame(
  stringsAsFactors = FALSE,
  check.names = FALSE,
  `Registration Date` = c("2016-04-29",
    "2023-05-26","2021-06-09","2021-06-25","2022-06-15",
    "2020-06-19","2016-02-23","2011-03-31","2022-06-10",
    "2006-09-01","2018-06-29","2020-03-16","2022-05-30"),
  `Delivery Date` = c("2016-04-29",
    "2023-05-31","2021-06-01","2021-06-30","2022-06-16",
    "2020-06-19",NA,NA,"2022-06-11",NA,"2010-09-30",
    "2020-03-16","2022-06-30"),
  `Call Notes` = c("it does it",
    "Sent email yesterday, doesn't work.",
    NA,"Customer got new car on order with Beechwood.",
    "Booked. 10.05.2023 HP",NA,
    "This call was automatically generated as a result of the overnight booking data feed.",NA,"Booked. 10.05.2023 HP",
    "Booked. 31.05.2023 HP","CAN APPROACH CUSTOMER ON DAY OF BOOKING.",
    "This call was automatically generated as a result of the overnight booking data feed.",
    "This call was automatically generated as a result of the overnight booking data feed.")
)
data.frame
where I categorise the call notes using this code:
library(dplyr)
library(stringr)
data.frame <- rename(data.frame, CallNotes = 'Call Notes')
str(data.frame)
result <- data.frame %>%
  mutate(Cat.Blank = if_else(is.na(CallNotes), 1, 0),
         Cat.OvernightBooking = if_else(str_detect(CallNotes, regex("overnight\\sbooking", ignore_case = TRUE, multiline = TRUE)), 1, 0),
         Cat.Booked = if_else(str_detect(CallNotes, regex("booked|boooked", ignore_case = TRUE, multiline = TRUE)), 1, 0),
         Cat.DoesNot.DidNot = if_else(str_detect(CallNotes, regex("does", ignore_case = TRUE, multiline = TRUE)), 1, 0)) %>%
  mutate(All = max(c_across(starts_with("Cat.")), na.rm = TRUE)) %>%
  mutate(Cat.Other = case_when(
    All == 0 ~ 1,
    All > 0 ~ 0))
str(result)
result <- select(result, -All)
result <- result %>%
  mutate_at(vars(-c(1:3)), ~if_else(is.na(.), 0, .))
str(result)
but I have the following issues:
- "Does" and "Doesn't" are categorised the same way, as I cannot find a way to match expressions containing an apostrophe (').
- My awkward way of creating a category called "Other" is overcomplicated and does not work. Basically, every record that is not blank and not otherwise categorised should be coded as "Cat.Other". I would like to use c_across over the variables whose names include "Cat.".
- In the end, all "Cat." variables should be 0 or 1. My final step uses vars(-c(1:3)), which should be replaced by c_across over the variables whose names include "Cat.".
- Finally, I believe everything should work as one pipeline instead of my awkward three stages.
Can anyone help me do this properly?
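One possible way to handle all four points in a single pipeline is sketched below. It is an illustration under stated assumptions, not a definitive answer. The names `notes` and `flag()` are made up for this example, and only five of the call notes are reproduced to keep it short. An apostrophe needs no escaping inside a double-quoted R string, so "doesn't" can be matched directly. The pattern here also accepts a typographic apostrophe and the spelled-out "does not" / "did not", which is an assumption about what that category should cover. As far as I can tell, the original `max(c_across(...))` runs without `rowwise()`, so it returns one maximum for the whole data frame rather than one per row, which would explain why Cat.Other never becomes 1.

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for the data frame in the question (same column name,
# only a handful of the original call notes).
notes <- data.frame(
  check.names = FALSE,
  `Call Notes` = c(
    "it does it",
    "Sent email yesterday, doesn't work.",
    NA,
    "Booked. 10.05.2023 HP",
    "This call was automatically generated as a result of the overnight booking data feed."
  )
)

# Hypothetical helper: case-insensitive match turned into 1/0.
# str_detect() returns NA for missing notes; coalesce() turns that into FALSE,
# so no separate NA clean-up step is needed afterwards.
flag <- function(x, pattern) {
  as.integer(coalesce(str_detect(x, regex(pattern, ignore_case = TRUE)), FALSE))
}

result <- notes %>%
  rename(CallNotes = `Call Notes`) %>%
  mutate(
    Cat.Blank            = as.integer(is.na(CallNotes)),
    Cat.OvernightBooking = flag(CallNotes, "overnight\\s+booking"),
    Cat.Booked           = flag(CallNotes, "booked|boooked"),
    # Matches doesn't / didn't (straight or typographic apostrophe) and
    # does not / did not, but not a bare "does".
    Cat.DoesNot.DidNot   = flag(CallNotes, "\\b(does|did)(n['\u2019]t|\\s+not)\\b")
  ) %>%
  # A row is "Other" when none of the Cat.* flags above is set.
  # Blank notes already have Cat.Blank = 1, so they are never "Other".
  mutate(Cat.Other = as.integer(rowSums(across(starts_with("Cat."))) == 0))

print(result)
```

If you prefer `c_across()`, the last step could instead be written as `rowwise() %>% mutate(Cat.Other = as.integer(sum(c_across(starts_with("Cat."))) == 0)) %>% ungroup()`. That should give the same flags, but `rowSums()` over `across()` avoids row-by-row evaluation and tends to be faster on larger data.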