Hi,
I have this simple data frame:
df <- data.frame(stringsAsFactors = FALSE,
  URN = c("aaa", "bbb", "ccc", "ddd", "eee", "fff", "ggg", "hhh", "iii", "jjj", "kkk", "lll"),
  all_comment = c("I trust", "untrusting", NA, "not trusty", "trustworthy", "he is not honest", "dishonest person", "reliable guy", "he is unreliable", "like it", "doesn't like", "unlikely")
)
df
Now I am trying to flag every comment that contains "trust" or one of its synonyms, but I must exclude comments where the word is negated (that is, preceded by "dis", "un" or "not "). This is what I have so far:
library(dplyr)
library(stringr)
TRUST.RESULT <- df %>%
mutate(
TMC.TRUST = if_else(str_detect(all_comment, regex("trust|
trusting|
trustworthy|
trust-worthy|
trusty|
confident|
confidence|
honest|
honesty|
reliable|
reliability|
safe|
safety|
secure|
security|
assured|
care|
careful|
dependable|
sure|
integrity|
genuine|
professional|
profesional|
proffessional|
proffesional", ignore_case = TRUE, multiline = TRUE))
&!str_detect(all_comment, regex("untrust|
untrusting|
untrustworthy|
untrust-worthy|
untrusty|
unconfident|
unconfidence|
unhonest|
unhonesty|
unreliable|
unreliability|
unsafe|
unsafety|
unsecure|
unsecurity|
unassured|
uncare|
uncareful|
undependable|
unsure|
unintegrity|
ungenuine|
unprofessional|
unprofesional|
unproffessional|
unproffesional|
distrust|
distrusting|
distrustworthy|
distrust-worthy|
distrusty|
disconfident|
disconfidence|
dishonest|
dishonesty|
disreliable|
disreliability|
dissafe|
dissafety|
dissecure|
dissecurity|
disassured|
discare|
discareful|
disdependable|
dissure|
disintegrity|
disgenuine|
disprofessional|
disprofesional|
disproffessional|
disproffesional|
not\\strust|
not\\strusting|
not\\strustworthy|
not\\strust-worthy|
not\\strusty|
not\\sconfident|
not\\sconfidence|
not\\shonest|
not\\shonesty|
not\\sreliable|
not\\sreliability|
not\\ssafe|
not\\ssafety|
not\\ssecure|
not\\ssecurity|
not\\sassured|
not\\scare|
not\\scareful|
not\\sdependable|
not\\ssure|
not\\sintegrity|
not\\sgenuine|
not\\sprofessional|
not\\sprofesional|
not\\sproffessional|
not\\sproffesional", ignore_case = TRUE)), 1, 0),
TMC.LIKE = if_else(str_detect(all_comment, regex("Like", ignore_case = TRUE, multiline = TRUE))
&!str_detect(all_comment, regex("dislike|
unlikely", ignore_case = TRUE)), 1, 0)
)
TRUST.RESULT
However, I am sure there is a way of replacing the second call,
&!str_detect(all_comment, regex(...))
with something shorter that is also case-insensitive.
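One direction that might serve here is a negative lookbehind, which rejects a match when it directly follows one of the unwanted prefixes, so the long list of negated forms is not needed. The sketch below is only an illustration under some assumptions: flag_positive() is a hypothetical helper name, the word list is a shortened version of the one above, and only the three prefixes mentioned ("un", "dis", "not ") are handled. Matching is by substring, so "trust" also covers "trusting" and "trustworthy".

```r
library(stringr)

# Same comments as in df$all_comment above.
comments <- c("I trust", "untrusting", NA, "not trusty", "trustworthy",
              "he is not honest", "dishonest person", "reliable guy",
              "he is unreliable", "like it", "doesn't like", "unlikely")

# Hypothetical helper: builds one pattern from a vector of words.
# Each lookbehind rejects a hit that directly follows "un", "dis" or "not ".
flag_positive <- function(text, words) {
  pattern <- paste0("(?<!un)(?<!dis)(?<!not\\s)(?:",
                    paste(words, collapse = "|"), ")")
  hit <- str_detect(text, regex(pattern, ignore_case = TRUE))
  as.integer(hit)  # TRUE -> 1, FALSE -> 0, NA comments stay NA
}

# Shortened word list, taken from the longer list in the question.
trust_words <- c("trust", "confident", "confidence", "honest",
                 "reliable", "reliability", "safe", "secure", "security")

tmc_trust <- flag_positive(comments, trust_words)
tmc_like  <- flag_positive(comments, "like")
```

With this sketch "doesn't like" would still be flagged for "like", because "n't " is not one of the three prefixes; a further lookbehind would be needed to cover it.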
Also, I don't know why "reliable guy" (respondent hhh) is not picked up by TMC.TRUST, while "unlikely" (respondent lll) is picked up by TMC.LIKE.
Can you help, please?
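A possible explanation for both surprises, assuming the patterns are typed exactly as shown with a line break after each "|": the line break inside the quotes becomes part of the pattern. Every alternative after the first then starts with a newline character, so it can only match text that itself contains a newline. The minimal sketch below reproduces this with two-word patterns; the variable names are only illustrative.

```r
library(stringr)

# The line break inside the quotes is part of the string, so the second
# alternative is "\nreliable" rather than "reliable".
broken <- "trust|
reliable"
hhh_broken <- str_detect("reliable guy", regex(broken, ignore_case = TRUE))

# The same happens in the exclusion list: "\nunlikely" never matches
# "unlikely", so the exclusion does not fire and "like" still matches.
neg_broken <- "dislike|
unlikely"
lll_excluded <- str_detect("unlikely", regex(neg_broken, ignore_case = TRUE))

# Building the pattern from a vector keeps stray newlines out of it.
fixed <- paste(c("trust", "reliable"), collapse = "|")
hhh_fixed <- str_detect("reliable guy", regex(fixed, ignore_case = TRUE))

print(c(hhh_broken, lll_excluded, hhh_fixed))
# FALSE FALSE TRUE
```

Only "trust" (the first alternative of the positive list) and "dislike" (the first alternative of the TMC.LIKE exclusion) are free of a leading newline, which would be consistent with the results described above.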