I need to create a new column, "Gene_Mechanism", using multiple conditions:
- For 'Disease' = Candida Auris, Gene_Mechanism = NA.
- For all other diseases, search the columns "Local_Test_Description", "Local_Organism_Description", "Local_Organism_Code", "Notes", "Result_Text", and "Result" for the patterns in the gene_patterns list in the code below, and return the name of the matching gene (KPC, NDM, VIM, or OXA).
- For values that match the Ignore pattern, the result should be 'No Gene' (these phrases appear only in the "Notes" column).
- Everything else should be 'Unknown'.
library(dplyr)
library(stringr)
gene_patterns <- list(
  KPC = "KPC Resistant Marker|KPC|KPC \\(CRE RESISTANCE GENE\\).DNA.XXX.ORD.NAAT\\(BEAKER\\)|KPC \\(CARBAPENEM RESISTNC\\)|KPC \\(CRE RESISTANCE GENE\\), DNA \\(BEAKER\\)|KPCGene by PCR|KPC Gene|KPC Resistance Gene|KPC",
  NDM = "NDM Gene Detection|NDM \\(CRE RESISTANCE GENE\\), DNA \\(BEAKER\\)|NDM RESISTANCE GENE|NDM|NDM RESISTANCE MARKER|NDM GENE BY PCR|NDM GENE|NDM",
  VIM = "VIM P. PUTIDA|VIM \\(CRE RESISTANCE GENE\\), DNA \\(BEAKER\\)|VIM RESISTANCE MARKER|VIM RESISTANCE GENE",
  OXA = "OXA-48-LIKE \\(CRE RESISTANCE GENE\\).DNA.XXX.ORD \\(BEAKER\\)|OXA RESISTANCE MARKER",
  Ignore = "Tested for KPC, OXA, NDM, IMP, and VIM Carbapenemase resistance markers| Negative for common carbapenamase|MECHANISMS OTHER THAN A CARBAPENEMASE|Targeted carbapenemase producing genes not detected|Targeted carbapenemase enzymes not detected"
)
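Before running the whole pipeline, it can help to test individual patterns against a few hand-written strings. The sample strings below are made up for illustration (not taken from the real data). They show three things: a literal parenthesis needs a doubled backslash (`\\(`) inside an R string; the Ignore alternative `" Negative for common carbapenamase"` starts with a space, so it will not match a note that begins with "Negative"; and `str_detect()` is case-sensitive unless the pattern is wrapped in `regex(..., ignore_case = TRUE)`.

```r
library(stringr)

# Made-up sample strings, for illustration only
samples <- c(
  "KPC (CRE RESISTANCE GENE), DNA (BEAKER)",
  "Negative for common carbapenamase",     # no leading space
  "Targeted carbapenemase producing genes not detected",
  "Mechanisms other than a carbapenemase"  # different case from the pattern
)

# In an R string literal, the regex escape \( is written as "\\("
kpc_pattern <- "KPC \\(CRE RESISTANCE GENE\\), DNA \\(BEAKER\\)"

# Copied from the Ignore entry above; note the space before "Negative"
ignore_pattern <- "Tested for KPC, OXA, NDM, IMP, and VIM Carbapenemase resistance markers| Negative for common carbapenamase|MECHANISMS OTHER THAN A CARBAPENEMASE|Targeted carbapenemase producing genes not detected|Targeted carbapenemase enzymes not detected"

kpc_hits <- str_detect(samples, kpc_pattern)
ignore_hits <- str_detect(samples, ignore_pattern)
# Same pattern, matched case-insensitively
ignore_hits_ci <- str_detect(samples, regex(ignore_pattern, ignore_case = TRUE))
```

Here only the fourth sample changes between the case-sensitive and case-insensitive checks, and the second sample matches neither way because of the leading space.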
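# Function to apply the gene patterns and ignore phrases
# Returns a gene name, or "No Gene" for a missing value, an Ignore phrase, or no match
check_gene_mechanism <- function(description, patterns) {
  # Missing values and Ignore phrases both return "No Gene"
  if (is.na(description) || str_detect(description, patterns$Ignore)) {
    return("No Gene")
  }
  # Return the first gene (in list order) whose pattern matches
  for (gene in names(patterns)) {
    if (gene != "Ignore" && str_detect(description, patterns[[gene]])) {
      return(gene)
    }
  }
  # Nothing matched: this is also "No Gene"
  return("No Gene")
}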
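As written, `check_gene_mechanism()` returns "No Gene" for three different situations (a missing value, an Ignore phrase, and no match at all), so nothing downstream can tell an Ignore phrase apart from an empty or unrelated cell. A minimal sketch of a variant that keeps these cases apart, assuming NA is acceptable as the "nothing matched" marker. `check_gene_mechanism2()` and the shortened pattern list are hypothetical, for illustration only.

```r
library(stringr)

# Abbreviated stand-in for gene_patterns, for illustration only
patterns <- list(
  KPC = "KPC Resistance Gene|KPC",
  NDM = "NDM GENE|NDM",
  Ignore = "Tested for KPC, OXA, NDM, IMP, and VIM Carbapenemase resistance markers|Targeted carbapenemase enzymes not detected"
)

# Hypothetical variant of check_gene_mechanism():
#   gene name -> a gene pattern matched
#   "No Gene" -> an Ignore phrase matched
#   NA        -> missing value, or nothing matched
check_gene_mechanism2 <- function(description, patterns) {
  if (is.na(description)) return(NA_character_)
  # Check Ignore first: one ignore phrase itself mentions KPC, NDM, ...
  if (str_detect(description, patterns$Ignore)) return("No Gene")
  for (gene in setdiff(names(patterns), "Ignore")) {
    if (str_detect(description, patterns[[gene]])) return(gene)
  }
  NA_character_
}

results <- vapply(
  c("KPC Gene",
    "Tested for KPC, OXA, NDM, IMP, and VIM Carbapenemase resistance markers",
    "E. coli",
    NA),
  check_gene_mechanism2, character(1),
  patterns = patterns, USE.NAMES = FALSE
)
```

The Ignore check stays first on purpose: "Tested for KPC, OXA, NDM, IMP, and VIM ..." contains "KPC", so testing the gene patterns first would return KPC for that note.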
# Columns to check
columns_to_check <- c("Local_Test_Description", "Local_Organism_Description", "Local_Organism_Code", "Notes", "Result_Text", "Result")
# Apply the check to each column, storing each result in its own intermediate column
LAB_Org <- LAB_Org %>%
  rowwise() %>%
  mutate(Gene_Mechanism_Local_Test_Description = check_gene_mechanism(Local_Test_Description, gene_patterns),
         Gene_Mechanism_Local_Organism_Description = check_gene_mechanism(Local_Organism_Description, gene_patterns),
         Gene_Mechanism_Local_Organism_Code = check_gene_mechanism(Local_Organism_Code, gene_patterns),
         Gene_Mechanism_Notes = check_gene_mechanism(Notes, gene_patterns),
         Gene_Mechanism_Result_Text = check_gene_mechanism(Result_Text, gene_patterns),
         Gene_Mechanism_Result = check_gene_mechanism(Result, gene_patterns)) %>%
  ungroup()
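The six near-identical `mutate()` lines could also be generated from the `columns_to_check` vector (which the code defines but never uses) with `across()`; wrapping the scalar checker in `vapply()` makes `rowwise()` unnecessary. A sketch on a made-up two-column tibble; `check_one()` is a hypothetical, simplified stand-in for `check_gene_mechanism()`.

```r
library(dplyr)

# Made-up data with two of the six columns, for illustration only
lab <- tibble(
  Notes = c("Targeted carbapenemase enzymes not detected", "KPC Gene"),
  Result = c(NA, "NDM")
)

# Hypothetical, simplified stand-in for check_gene_mechanism()
check_one <- function(x) {
  if (is.na(x)) return("No Gene")
  if (grepl("KPC", x)) return("KPC")
  if (grepl("NDM", x)) return("NDM")
  "No Gene"
}

cols <- c("Notes", "Result")  # plays the role of columns_to_check

# One Gene_Mechanism_<column> result per listed column, no rowwise() needed
lab <- lab %>%
  mutate(across(all_of(cols),
                ~ vapply(.x, check_one, character(1), USE.NAMES = FALSE),
                .names = "Gene_Mechanism_{.col}"))
```

With the real data, `cols` would be `columns_to_check` and `check_one` would be replaced by a call to the real checker with `gene_patterns`.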
# Determine the final Gene_Mechanism based on the collected results
LAB_Org <- LAB_Org %>%
  mutate(Gene_Mechanism = case_when(
    Disease == "Candida Auris" ~ NA_character_,
    Gene_Mechanism_Local_Test_Description != "No Gene" ~ Gene_Mechanism_Local_Test_Description,
    Gene_Mechanism_Local_Organism_Description != "No Gene" ~ Gene_Mechanism_Local_Organism_Description,
    Gene_Mechanism_Local_Organism_Code != "No Gene" ~ Gene_Mechanism_Local_Organism_Code,
    Gene_Mechanism_Notes != "No Gene" ~ Gene_Mechanism_Notes,
    Gene_Mechanism_Result_Text != "No Gene" ~ Gene_Mechanism_Result_Text,
    Gene_Mechanism_Result != "No Gene" ~ Gene_Mechanism_Result,
    TRUE ~ "Unknown"
  )) %>%
  select(-starts_with("Gene_Mechanism_")) # Clean up intermediate columns
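One way to get "No Gene" into the final column is an explicit branch for it, placed after the gene branches and before the `TRUE ~ "Unknown"` fallback. A sketch of that final step, assuming each per-column result is a gene name, "No Gene" (an Ignore phrase matched), or NA (nothing matched). The tibble is made up, has only two of the six intermediate columns, and "CRE" is a placeholder disease name.

```r
library(dplyr)

# Made-up intermediate results, for illustration only
lab <- tibble(
  Disease = c("Candida Auris", "CRE", "CRE", "CRE"),
  Gene_Mechanism_Notes = c(NA, "No Gene", NA, "No Gene"),
  Gene_Mechanism_Result = c(NA, NA, NA, "KPC")
)

lab <- lab %>%
  mutate(Gene_Mechanism = case_when(
    Disease == "Candida Auris" ~ NA_character_,
    # A real gene in any column wins, in the same column order as before
    !is.na(Gene_Mechanism_Notes) & Gene_Mechanism_Notes != "No Gene" ~ Gene_Mechanism_Notes,
    !is.na(Gene_Mechanism_Result) & Gene_Mechanism_Result != "No Gene" ~ Gene_Mechanism_Result,
    # Otherwise an Ignore phrase in Notes gives "No Gene"
    Gene_Mechanism_Notes %in% "No Gene" ~ "No Gene",
    TRUE ~ "Unknown"
  )) %>%
  select(-starts_with("Gene_Mechanism_"))  # drops the intermediates, keeps Gene_Mechanism
```

The four rows end up as NA, "No Gene", "Unknown", and "KPC": the last row shows that a gene found in another column still takes priority over an Ignore phrase in Notes.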
When I run the code, I get only the specified genes (KPC, NDM, OXA, and VIM), NA, and Unknown; 'No Gene' never appears. I'm not sure what I am doing wrong!
Thank you
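A one-row reproduction suggests where 'No Gene' gets lost: `check_gene_mechanism()` does return "No Gene" for an Ignore phrase in Notes, but the final `case_when()` only keeps values that are `!= "No Gene"`, so the row falls through to `TRUE ~ "Unknown"`. The sketch trims the patterns and columns to the minimum; the data row and the "CRE" disease value are made up, and `check()` copies the logic of `check_gene_mechanism()`.

```r
library(dplyr)
library(stringr)

# Tiny pattern list, for illustration only
patterns <- list(KPC = "KPC", Ignore = "Targeted carbapenemase enzymes not detected")

# Same logic as check_gene_mechanism() in the question
check <- function(description, patterns) {
  if (is.na(description) || str_detect(description, patterns$Ignore)) return("No Gene")
  for (gene in names(patterns)) {
    if (gene != "Ignore" && str_detect(description, patterns[[gene]])) return(gene)
  }
  "No Gene"
}

repro <- tibble(Disease = "CRE", Notes = "Targeted carbapenemase enzymes not detected") %>%
  rowwise() %>%
  mutate(Gene_Mechanism_Notes = check(Notes, patterns)) %>%
  ungroup() %>%
  mutate(Gene_Mechanism = case_when(
    Disease == "Candida Auris" ~ NA_character_,
    Gene_Mechanism_Notes != "No Gene" ~ Gene_Mechanism_Notes,
    TRUE ~ "Unknown"  # the "No Gene" from Notes ends up here
  ))
```

The intermediate column holds "No Gene", but the final Gene_Mechanism is "Unknown".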