Hi,
I have the following data frame:
LAB <- data.frame("ResOrg" = c("Positive", "Detected", "ESCHERICHIA COLI-CRE", "Ebola Disease", "Legionella Finding", " ", "Ecoli Detected", "Salmonella organism", " ", " "),
"TestResult" = c("ECOCRE", "Acinetobacter baumannii", " ", "Ecoli", "MSN", "CARBAPENEM RESISTANT KLEBSIELLA PNEUMONIAE", "CRKP", "Ebola","Candida auris"," "),
"Result" = c("CITDCR", "ETACRE", "Test", "KPS", " ", "Ecoli", "KLEPNE", "Ebola"," ","ENTEROBACTER (KLEBSIELLA) AERO"),
"Test" = c("Ebola ", "Ecoli KPS", "Organism", " ", "Klebsiella", "MSN", " ", "NDM", "KLEPNE", " "),
"Notes" = c(" ", " ", "SPECIMEN NOTES:258531008 WOUND SWAB L 20130731 V1: Disease RESISTANT EcoliTesting", "SPECIMEN:COLLECTION METHOD ID: URINE - E COLIGREATER THAN 100,000 COLONY FORMING UNITS PER MLOTHER", " ", "Positive for disease Producing", " ", "Tested for NDM, KPS and MSN", "KPS", "Carbapenem ResistantKlebsiella"))
I want to create a new column based on the existing columns as follows:
LAB_Org<-LAB %>% #order matters, the overriding text string should be placed at the top (e.g., code Klebsiella aerogenes before Enterobacter aerogenes)
mutate(Organism = (case_when(
is.na(ResOrg) ~ "No Organism",
grepl("Ecoli Detected", ResOrg, ignore.case = TRUE)~"Escherichia coli",
grepl("Candida auris", ResOrg, ignore.case = TRUE)~"Candida auris",
grepl("ENTEROBACTER (KLEBSIELLA) AERO", ResOrg, ignore.case = TRUE)~"Klebsiella aerogenes",
grepl("CARBAPENEM RESISTANT KLEBSIELLA PNEUMONIAE", ResOrg, ignore.case = TRUE)~"Klebsiella pneumoniae",
grepl("Legionella Finding", ResOrg, ignore.case = TRUE)~"Legionella",
grepl("Ecoli", ResOrg, ignore.case = TRUE)~"Escherichia coli",
grepl("Ebola Disease", ResOrg, ignore.case = TRUE)~"Ebola",
grepl("CARBAPENEM RESISTANT ENTEROBACTER (KLEBSIELLA) AEROGENES", ResOrg, ignore.case = TRUE)~"Klebsiella aerogenes",
grepl("ESCHERICHIA COLI-CRE", ResOrg, ignore.case = TRUE)~"Escherichia coli",
grepl("Salmonella organism", ResOrg, ignore.case = TRUE)~"Salmonella",
TRUE~"No Organism")))
LAB_Org2<-LAB_Org %>%
mutate(Organism= (case_when(
grepl("ECOCRE", TestResult)~"Escherichia coli",
grepl("CITDCR", TestResult)~"Citrobacter Koseri",
grepl("CRKP", TestResult)~"Klebsiella pneumoniae",
grepl("KLEPNE", TestResult)~"Klebsiella pneumoniae",
grepl("ETACRE", TestResult)~"Klebsiella aerogenes",
TRUE~Organism)))
LAB_Org3<-LAB_Org2 %>%
mutate(Organism= (case_when(
grepl("ECOCRE", Result)~"Escherichia coli",
grepl("CITDCR", Result)~"Citrobacter Koseri",
grepl("CRKP", Result)~"Klebsiella pneumoniae",
grepl("KLEPNE", Result)~"Klebsiella pneumoniae",
grepl("ETACRE", Result)~"Klebsiella aerogenes",
TRUE~Organism)))
LAB_Org4<-LAB_Org %>%
mutate(Organism= (case_when(
grepl("Disease RESISTANT Ecoli", Notes)~"Escherichia coli",
grepl("E COLIGREATER", Notes)~"Escherichia coli",
grepl("Carbapenem ResistantKlebsiella", Notes)~"Klebsiella",
TRUE~Organism)))
There are issues with this way:
-
RestOrg is the column that has the most values that need to be extracted into the new column. Values (organisms) need to be extracted from other columns only when the RestOrg value is missing.
-
I need a more efficient code since I have a very extensive list of values to be extracted into the new column.
-
The values that were extracted in the second step overwrote the values in the first step.
-
I need to add a code that takes care of positive and/or detected in "ResOrg" and to extract values from the "Test" column.
Thank you