Hi all,
I have some free text data that I'm trying to recategorize. The data arise from health care coordinator contacts with patients and caregivers, which can be phone calls, emails, or text messages. I'm trying to use str_detect() with two wildcards and am getting a syntax error. Here's a reprex containing dummy data (not actual patient data).
library(tidyverse)
commdf <- tribble(
  ~case, ~purpose,
  1, "set up visit",
  2, "left message with client",
  3, "Texted about visit",
  4, "left voicemail",
  5, "communication about appointment",
  6, "phone call",
  7, "Emailed client",
  8, "client called back",
  9, "REPORTED CALL TO MANAGER",
  10, "texted client"
)
commdf <- commdf %>%
  mutate(commtype = case_when(
    str_detect(str_to_lower(purpose), "*call*") == TRUE ~ "call",
    str_detect(str_to_lower(purpose), "*spoke*") == TRUE ~ "call",
    str_detect(str_to_lower(purpose), "*message*") == TRUE ~ "call",
    str_detect(str_to_lower(purpose), "*phone*") == TRUE ~ "call",
    str_detect(str_to_lower(purpose), "*discuss*") == TRUE ~ "call",
    str_detect(str_to_lower(purpose), "*reported*") == TRUE ~ "call",
    str_detect(str_to_lower(purpose), "*set up*") == TRUE ~ "call",
    str_detect(str_to_lower(purpose), "*confirm*") == TRUE ~ "call",
    str_detect(str_to_lower(purpose), "*sched*") == TRUE ~ "call",
    str_detect(str_to_lower(purpose), "*communicat*") == TRUE ~ "call",
    str_detect(str_to_lower(purpose), "*voicemail*") == TRUE ~ "call",
    str_detect(str_to_lower(purpose), "*vm*") == TRUE ~ "call",
    str_detect(str_to_lower(purpose), "*text*") == TRUE ~ "text",
    str_detect(str_to_lower(purpose), "*txt*") == TRUE ~ "text",
    str_detect(str_to_lower(purpose), "*email*") == TRUE ~ "email",
    str_detect(str_to_lower(purpose), "*e-mail*") == TRUE ~ "email",
    TRUE ~ NA
  ))
#> Error in mutate_impl(.data, dots): Evaluation error: Syntax error in regexp pattern. (U_REGEX_RULE_SYNTAX).
Created on 2018-10-11 by the reprex package (v0.2.0).
I'm not sure what's causing the error, but I wonder if it's from using two wildcard asterisks. Is this use legitimate? Is there a better way to go about this?
Also, I'd like to condense this code further by using something like c("*call*", "*spoke*", "*message*", ...) within str_detect(), but first need to figure out the regex error I'm getting.
Any help you can provide would be appreciated!
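For what it's worth, here is a minimal sketch of one possible direction, not a definitive fix. It rests on two points: in the ICU regex engine that stringr uses, `*` is a quantifier meaning "zero or more of the preceding item", so a pattern that starts with `*` has nothing to repeat and fails to compile; and str_detect() already matches anywhere in the string, so no wildcards are needed. The shortened keyword vectors, the helper column purpose_lc, and the extra row 11 are illustrative assumptions, not part of the original data.

```r
library(dplyr)
library(stringr)
library(tibble)

# Dummy data: a subset of the reprex, plus one hypothetical row (case 11)
# added only to show the fallback branch
commdf <- tribble(
  ~case, ~purpose,
  1, "set up visit",
  3, "Texted about visit",
  7, "Emailed client",
  9, "REPORTED CALL TO MANAGER",
  11, "no keyword here"
)

# Collapse each keyword vector into a single pattern joined by "|" (regex OR).
# No leading or trailing "*": str_detect() looks for the pattern anywhere.
call_pat  <- str_c(c("call", "spoke", "message", "phone", "set up", "voicemail"),
                   collapse = "|")
text_pat  <- str_c(c("text", "txt"), collapse = "|")
email_pat <- str_c(c("email", "e-mail"), collapse = "|")

commdf <- commdf %>%
  mutate(
    purpose_lc = str_to_lower(purpose),  # lower-case once instead of per branch
    commtype = case_when(
      str_detect(purpose_lc, call_pat)  ~ "call",
      str_detect(purpose_lc, text_pat)  ~ "text",
      str_detect(purpose_lc, email_pat) ~ "email",
      TRUE ~ NA_character_               # typed NA so every branch is character
    )
  )

print(commdf$commtype)
```

Because case_when() returns the first matching branch, the order of the branches still decides how a string containing keywords from more than one group gets labelled, just as in the long version.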