Hi,
I have this simple df with comments:
source <- data.frame(
stringsAsFactors = FALSE,
URN = c("aaa","bbb","ccc",
"ddd","eee","fff","ggg","hhh"),
Name = c("xxx","xxx","yyy",
"yyy","yyy","zzzz","abcde","zzzz"),
AComm1 = c("None.",NA,
"No comments related to this exercise","Na",
"N/A","Interesting comment","abc", "whatever is fine"),
AComm2 = c("Nothing",
"I have nothing in common","NA",NA,
"Another comment","....?","xxxx", "All fine"),
BComm1 = c("Service","All good",
"aa","I don't know",
"The final comment about that","Nothing.","na","Everything"),
BComm2 = c("aaa","Nothing",
"None","My final comments are ok", "I don't know",
"Nothing.","Another comment","really"),
Q4 = c(2019,2020,2020,2019,
2020,2021,2021,2019)
)
I managed to achieve something like this (with your help):
library(dplyr)
library(stringr)
library(tidyr)
blank_statements <- regex("^(None.?|No\\scomments?.?|N.?A|Nothing.?)$", ignore_case = TRUE)
merged.comments <- source %>%
mutate_if(~is.character(.) & any(nchar(.) > 15, na.rm = TRUE),
~str_remove_all(.x, blank_statements))%>%
mutate_if(~is.character(.) & any(nchar(.) > 15, na.rm = TRUE),
~str_remove_all(.x, "^.{1,5}$"))%>%
unite("all_comments", where(~is.character(.x) & any(nchar(.x) > 15, na.rm = TRUE)), sep = "/", remove = FALSE, na.rm = FALSE) %>% # adjust na.rm argument as needed
mutate(all_comments = str_remove_all(all_comments, "\\bNA\\b"), # Removes "NA" text that unite() pastes in for missing values
all_comments = str_remove_all(all_comments, "[[:cntrl:]]"), # Removes control characters like \n and \r
all_comments = str_replace_all(all_comments, "\\s\\s+", " "), # Collapses repeated whitespace into one space
all_comments = str_replace_all(all_comments, "//+", "/"), # Collapses duplicated /
all_comments = str_remove(all_comments, "/$"), # Removes / at the end
all_comments = str_remove(all_comments, "^/")) # Removes / at the beginning
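Since the same clean-up steps would be needed for every merged column, one option is to wrap them in a function instead of repeating them. A minimal sketch; `tidy_merged()` is a hypothetical name, and the sample string is made up to mimic what `unite()` produces with `na.rm = FALSE`.

```r
library(stringr)

# Hypothetical helper: the clean-up steps from the pipeline above, applied to
# any character vector of merged comments
tidy_merged <- function(x) {
  x <- str_remove_all(x, "\\bNA\\b")        # "NA" text pasted in for missing values
  x <- str_remove_all(x, "[[:cntrl:]]")     # control characters such as \n and \r
  x <- str_replace_all(x, "\\s\\s+", " ")   # runs of whitespace -> one space
  x <- str_replace_all(x, "//+", "/")       # runs of separators -> one "/"
  x <- str_remove(x, "/$")                  # trailing separator
  str_remove(x, "^/")                       # leading separator
}

# Made-up example of a merged string before clean-up
merged <- "NA//Interesting comment/\nAll fine/"
tidy_merged(merged)
```

In the pipeline, the six `all_comments = ...` lines could then become something like `mutate(across(c(all_comments, A_comments, B_comments), tidy_merged))`, assuming those three columns have already been created with `unite()`.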
What I need is similar, but with three variables of merged comments instead of just one:
- all_comments: the same as above, but I think the code could be simplified by selecting "any variable with Comm in its name"
- A_comments: the same logic, but merging only variables whose names include AComm
- B_comments: the same logic, but merging only variables whose names include BComm
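A sketch of one way to get all three columns without repeating the pipeline, assuming dplyr 1.0 or later (for `across()`) and tidyr's `unite()`. It takes a slightly different route from the code above: blank and very short answers are turned into `NA` first, so `unite(..., na.rm = TRUE)` simply drops them and no "NA" text or doubled "/" needs cleaning afterwards. `merge_comments()` is a hypothetical helper, and the three rows are taken from the question's data frame.

```r
library(dplyr)
library(stringr)
library(tidyr)

blank_statements <- regex("^(None.?|No\\scomments?.?|N.?A|Nothing.?)$",
                          ignore_case = TRUE)

# Hypothetical helper: paste the columns chosen by `cols` into `new_col`.
# Blank answers are assumed to be NA already, so na.rm = TRUE drops them.
merge_comments <- function(df, new_col, cols) {
  merged <- df %>%
    select({{ cols }}) %>%
    unite("united", everything(), sep = "/", na.rm = TRUE) %>%
    pull(united)
  df[[new_col]] <- str_squish(merged)  # collapse stray whitespace/newlines
  df
}

# Three rows taken from the question's data frame
source <- data.frame(
  stringsAsFactors = FALSE,
  URN = c("bbb", "eee", "hhh"),
  Name = c("xxx", "yyy", "zzzz"),
  AComm1 = c(NA, "N/A", "whatever is fine"),
  AComm2 = c("I have nothing in common", "Another comment", "All fine"),
  BComm1 = c("All good", "The final comment about that", "Everything"),
  BComm2 = c("Nothing", "I don't know", "really")
)

result <- source %>%
  # Turn "blank" answers and answers of 5 characters or fewer into NA
  mutate(across(contains("Comm", ignore.case = FALSE),
                ~ if_else(str_detect(.x, blank_statements) | str_length(.x) <= 5,
                          NA_character_, .x))) %>%
  merge_comments("all_comments", contains("Comm", ignore.case = FALSE)) %>%
  merge_comments("A_comments", starts_with("AComm", ignore.case = FALSE)) %>%
  merge_comments("B_comments", starts_with("BComm", ignore.case = FALSE))

result %>% select(URN, all_comments, A_comments, B_comments)
```

`ignore.case = FALSE` is there because `contains()` and `starts_with()` ignore case by default, so `contains("Comm")` would also match new columns such as `all_comments` if the selection were run again after they exist.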
I think I can replace this:
~is.character(.) & any(nchar(.) > 15, na.rm = TRUE)
with something stating that the variable name should include Comm, but do I need to repeat the last few clean-up lines of the code for A_comments and B_comments?
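Selecting by name is done with tidyselect helpers such as `contains()`, and in current dplyr `mutate_if()` is superseded by `mutate(across(...))`, which accepts those helpers directly. A minimal sketch using two rows of the question's data; the commented line shows how the name test could be combined with a type test.

```r
library(dplyr)
library(stringr)

blank_statements <- regex("^(None.?|No\\scomments?.?|N.?A|Nothing.?)$",
                          ignore_case = TRUE)

# Two rows taken from the question's data frame
source <- data.frame(
  stringsAsFactors = FALSE,
  URN = c("aaa", "fff"),
  AComm1 = c("None.", "Interesting comment"),
  BComm1 = c("Service", "Nothing."),
  Q4 = c(2019, 2021)
)

# Clean every column whose name contains "Comm"
cleaned <- source %>%
  mutate(across(contains("Comm"), ~ str_remove_all(.x, blank_statements)))
# Name test combined with a type test:
# mutate(across(where(is.character) & contains("Comm"), ...))

cleaned
```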
Could you help me rewrite and simplify the entire code above so that it creates the three new variables?
Once the code is ready, I would like to keep it general enough to use on other datasets.
Let's imagine the comment variables are not clearly identified by their names: instead of AComm1, AComm2, BComm1 and BComm2, we have just A1, A2, B1 and B2.
source <- data.frame(
stringsAsFactors = FALSE,
URN = c("aaa","bbb","ccc",
"ddd","eee","fff","ggg","hhh"),
Name = c("xxx","xxx","yyy",
"yyy","yyy","zzzz","abcde","zzzz"),
A1 = c("None.",NA,
"No comments related to this exercise","Na",
"N/A","Interesting comment","abc", "whatever is fine"),
A2 = c("Nothing",
"I have nothing in common","NA",NA,
"Another comment","....?","xxxx", "All fine"),
B1 = c("Service","All good",
"aa","I don't know",
"The final comment about that","Nothing.","na","Everything"),
B2 = c("aaa","Nothing",
"None","My final comments are ok", "I don't know",
"Nothing.","Another comment","really"),
Q4 = c(2019,2020,2020,2019,
2020,2021,2021,2019)
)
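With generic names like A1, A2, B1 and B2, renaming may not be strictly necessary: one option is to find the comment columns with the same character/length test as above and then group them by name prefix. The sketch assumes that a group is whatever remains after stripping the trailing digits (A1 becomes A); `is_comment()` and `merge_row()` are hypothetical helpers, the four rows come from the question's data, and blank-answer removal is left out for brevity.

```r
library(stringr)

# Four rows taken from the question's second data frame
source <- data.frame(
  stringsAsFactors = FALSE,
  URN = c("bbb", "ddd", "eee", "hhh"),
  Name = c("xxx", "yyy", "yyy", "zzzz"),
  A1 = c(NA, "Na", "N/A", "whatever is fine"),
  A2 = c("I have nothing in common", NA, "Another comment", "All fine"),
  B1 = c("All good", "I don't know", "The final comment about that", "Everything"),
  B2 = c("Nothing", "My final comments are ok", "I don't know", "really"),
  Q4 = c(2020, 2019, 2020, 2019)
)

# Hypothetical predicate: character column with at least one answer > 15 chars
is_comment <- function(x) is.character(x) && any(nchar(x) > 15, na.rm = TRUE)

comment_cols <- names(source)[vapply(source, is_comment, logical(1))]
# Assumption: the group is the column name without trailing digits (A1 -> "A")
groups <- split(comment_cols, str_remove(comment_cols, "\\d+$"))

# Hypothetical helper: join the non-missing, non-empty answers of one row
merge_row <- function(values) {
  values <- values[!is.na(values) & values != ""]
  paste(values, collapse = "/")
}

# One merged column per group (A_comments, B_comments) plus all_comments
for (g in names(groups)) {
  source[[paste0(g, "_comments")]] <- unname(apply(source[groups[[g]]], 1, merge_row))
}
source$all_comments <- unname(apply(source[comment_cols], 1, merge_row))
```

Because the groups are derived from the data, the same code would work for any number of prefixes, as long as the naming assumption holds.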
Can we add an initial step that adds "Comm" at the beginning or end of the name of every character variable with responses longer than 15 characters (so that URN and Name would be excluded from merging)?
How would I rename just the variables selected by this?
where(~is.character(.x) & any(nchar(.x) > 15))
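dplyr's `rename_with()` applies a function to the names of the columns picked by its `.cols` argument, and `.cols` accepts the same `where()` predicate. A sketch on three rows of the question's data; `&&` and `na.rm = TRUE` are added so that numeric columns are skipped and missing answers cannot turn the test into `NA` (which `where()` rejects). To add a prefix instead, the function would be `~ paste0("Comm", .x)`.

```r
library(dplyr)

# Three rows taken from the question's second data frame
source <- data.frame(
  stringsAsFactors = FALSE,
  URN = c("bbb", "eee", "hhh"),
  Name = c("xxx", "yyy", "zzzz"),
  A1 = c(NA, "N/A", "whatever is fine"),
  B1 = c("All good", "The final comment about that", "Everything"),
  Q4 = c(2020, 2020, 2019)
)

# Append "Comm" to every character column with an answer longer than 15 chars
renamed <- source %>%
  rename_with(~ paste0(.x, "Comm"),
              where(~ is.character(.x) && any(nchar(.x) > 15, na.rm = TRUE)))

names(renamed)  # "URN" "Name" "A1Comm" "B1Comm" "Q4"
```

Once renamed, the comment columns can be picked by name with `contains("Comm")`, as in the first part of the question.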
Is this task challenging?
Thank you for your help.