Hi, I have the following problem.
I have this simple data frame:
library(tidyverse)

sample_data <- data.frame(stringsAsFactors = FALSE,
                          InterviewID = c(94, 59, 100, 86, 60, 101, 61, 7, 23, 8),
                          all_comment = c("None", "geen speciale dvdv", "xxxxx.", "None of products",
                                          "geen speciale", "none", "God!!!!", "aa", "special", "perfect"),
                          ModelLong = c("A", "A", "A", "B", "B", "B", "c", "c", "A", "B"))
Now I am creating some new categories based on the comments (thank you, siddharthprabhu):
blank_statements <- c("none", "geen\\sspeciale", "commentaar", "neen")
exact_match <- regex(str_c("^", blank_statements, "$", collapse = "|"), ignore_case = TRUE)
partial_match <- regex(str_c(blank_statements, collapse = "|"), ignore_case = TRUE)

sample_data %>%
  mutate(TMC.Blank = case_when(is.na(all_comment) ~ 1,
                               str_length(all_comment) < 4 ~ 1,
                               str_detect(all_comment, exact_match) ~ 1,
                               str_length(all_comment) < 10 & str_detect(all_comment, partial_match) ~ 1,
                               str_length(all_comment) < 10 & str_detect(all_comment, "([a-zA-Z])\\1{2,}") ~ 1),
         TMC.Special = if_else(str_detect(all_comment, regex("special", ignore_case = TRUE, multiline = TRUE)), 1, 0),
         TMC.Perfect = if_else(str_detect(all_comment, regex("perfect", ignore_case = TRUE, multiline = TRUE)), 1, 0)
  ) %>%
  mutate(TMC.Other = ifelse(test = (rowSums(x = .[-(1:3)]) == 0),
                            yes = 1,
                            no = 0)) %>%
  mutate_at(vars(-c(1:3)), ~if_else(is.na(.), 0, .))
My problem is that TMC.Other is incorrect: the values in rows 4 and 7 should be 1, but I get 0s instead.
What am I doing wrong?
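One possible explanation, offered as a sketch rather than a confirmed diagnosis: case_when() returns NA when none of its conditions match, so TMC.Blank would be NA in rows 4 and 7. rowSums() then returns NA for those rows, NA == 0 is NA rather than TRUE, and the final mutate_at() step turns that NA into 0. The snippet below reproduces the effect on a hypothetical two-row data frame (toy, no_default and with_default are made-up names, and the length check is a simplified stand-in for the real conditions), then adds a TRUE ~ 0 default branch as one way around it.

```r
library(dplyr)

# Hypothetical toy data: the two comments mirror rows 3 and 4 of sample_data.
toy <- data.frame(all_comment = c("xxxxx.", "None of products"),
                  stringsAsFactors = FALSE)

# Without a default branch, case_when() yields NA for rows that match nothing.
no_default <- toy %>%
  mutate(TMC.Blank = case_when(nchar(all_comment) < 10 ~ 1),
         TMC.Special = 0)

# rowSums() propagates the NA, so the comparison with 0 is NA for row 2.
na_check <- rowSums(no_default[c("TMC.Blank", "TMC.Special")]) == 0

# With TRUE ~ 0 as the last branch, unmatched rows get 0 and the sum is defined.
with_default <- toy %>%
  mutate(TMC.Blank = case_when(nchar(all_comment) < 10 ~ 1,
                               TRUE ~ 0),
         TMC.Special = 0)

# Flag rows where none of the TMC columns fired.
with_default$TMC.Other <- as.numeric(
  rowSums(with_default[c("TMC.Blank", "TMC.Special")]) == 0)
```

If this is indeed the cause, an alternative would be to keep the NAs and pass na.rm = TRUE to rowSums(), which treats missing values as 0 when summing.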
Also, the sample file is only an example. In my real data, the number of variables varies. Is it possible to change this part of the code:
(rowSums(x = .[-(1:3)]) == 0)
to something that works in reverse, i.e. summing variables from the right? In this case I created 3 new variables, so it should sum just those.
Or even better (more universal): sum the values of all variables whose names start with TMC to create TMC.Other.
Can you help?
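For the "more universal" variant, one approach that might fit is selecting columns by name prefix instead of by position. This is a minimal sketch on a hypothetical data frame (toy and result are made-up names), assuming every indicator column starts with "TMC" and that TMC.Other does not exist yet when the sum is taken.

```r
library(dplyr)

# Hypothetical toy data; the number of leading non-TMC columns does not matter.
toy <- data.frame(InterviewID = c(1, 2, 3),
                  ModelLong   = c("A", "B", "c"),
                  TMC.Blank   = c(1, 0, 0),
                  TMC.Special = c(0, 0, 1),
                  stringsAsFactors = FALSE)

# Sum only the columns whose names start with "TMC", wherever they sit.
# na.rm = TRUE treats missing indicators as 0 instead of propagating NA.
result <- toy %>%
  mutate(TMC.Other = as.numeric(
    rowSums(select(., starts_with("TMC")), na.rm = TRUE) == 0))
```

Because the selection is by name, adding or removing columns in front of the TMC block should not change the result, which is not true of a positional index such as .[-(1:3)].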