Create new column with multiple conditions

I need to create a new column, "Gene_Mechanism", using multiple conditions:
For 'Disease' = Candida Auris, Gene_Mechanism = NA.
For all other diseases, extract the values from the columns "Local_Test_Description", "Local_Organism_Description", "Local_Organism_Code", "Notes", "Result_Text", and "Result", as in the gene_patterns list in the code below.
For the values in the Ignore pattern, I need to get a 'No Gene' result. (These values are available in the "Notes" column only.)
Everything else should be 'Unknown'.

library(dplyr)
library(stringr)

# Literal parentheses need a doubled backslash inside an R string
gene_patterns <- list(
  KPC = "KPC Resistant Marker|KPC|KPC \\(CRE RESISTANCE GENE\\).DNA.XXX.ORD.NAAT\\(BEAKER\\)|KPC \\(CARBAPENEM RESISTNC\\)|KPC \\(CRE RESISTANCE GENE\\), DNA \\(BEAKER\\)|KPCGene by PCR|KPC Gene|KPC Resistance Gene|KPC",
  NDM = "NDM Gene Detection|NDM \\(CRE RESISTANCE GENE\\), DNA \\(BEAKER\\)|NDM RESISTANCE GENE|NDM|NDM RESISTANCE MARKER|NDM GENE BY PCR|NDM GENE|NDM",
  VIM = "VIM P. PUTIDA|VIM \\(CRE RESISTANCE GENE\\), DNA \\(BEAKER\\)|VIM RESISTANCE MARKER|VIM RESISTANCE GENE",
  OXA = "OXA-48-LIKE \\(CRE RESISTANCE GENE\\).DNA.XXX.ORD \\(BEAKER\\)|OXA RESISTANCE MARKER",
  Ignore = "Tested for KPC, OXA, NDM, IMP, and VIM Carbapenemase resistance markers| Negative for common carbapenamase|MECHANISMS OTHER THAN A CARBAPENEMASE|Targeted carbapenemase producing genes not detected|Targeted carbapenemase enzymes not detected"
)

# Function to apply the gene patterns and ignore phrases

check_gene_mechanism <- function(description, patterns) {
  if (is.na(description) || str_detect(description, patterns$Ignore)) {
    return("No Gene")
  }

  for (gene in names(patterns)) {
    if (gene != "Ignore" && str_detect(description, patterns[[gene]])) {
      return(gene)
    }
  }

  return("No Gene")
}

# Columns to check

columns_to_check <- c("Local_Test_Description", "Local_Organism_Description", "Local_Organism_Code", "Notes", "Result_Text", "Result")

# Apply the logic to each column and create a list of results

LAB_Org <- LAB_Org %>%
  rowwise() %>%
  mutate(
    Gene_Mechanism_Local_Test_Description = check_gene_mechanism(Local_Test_Description, gene_patterns),
    Gene_Mechanism_Local_Organism_Description = check_gene_mechanism(Local_Organism_Description, gene_patterns),
    Gene_Mechanism_Local_Organism_Code = check_gene_mechanism(Local_Organism_Code, gene_patterns),
    Gene_Mechanism_Notes = check_gene_mechanism(Notes, gene_patterns),
    Gene_Mechanism_Result_Text = check_gene_mechanism(Result_Text, gene_patterns),
    Gene_Mechanism_Result = check_gene_mechanism(Result, gene_patterns)
  ) %>%
  ungroup()

# Determine the final Gene_Mechanism based on the collected results

LAB_Org <- LAB_Org %>%
  mutate(Gene_Mechanism = case_when(
    Disease == "Candida Auris" ~ NA_character_,
    Gene_Mechanism_Local_Test_Description != "No Gene" ~ Gene_Mechanism_Local_Test_Description,
    Gene_Mechanism_Local_Organism_Description != "No Gene" ~ Gene_Mechanism_Local_Organism_Description,
    Gene_Mechanism_Local_Organism_Code != "No Gene" ~ Gene_Mechanism_Local_Organism_Code,
    Gene_Mechanism_Notes != "No Gene" ~ Gene_Mechanism_Notes,
    Gene_Mechanism_Result_Text != "No Gene" ~ Gene_Mechanism_Result_Text,
    Gene_Mechanism_Result != "No Gene" ~ Gene_Mechanism_Result,
    TRUE ~ "Unknown"
  )) %>%
  select(-starts_with("Gene_Mechanism_")) # Clean up intermediate columns

When I run the code, I get only the specified genes (KPC, NDM, OXA, and VIM), NA, and Unknown; 'No Gene' never appears. I'm not sure what I am doing wrong!
Thank you

This can't properly be checked without a reprex (see the FAQ). There's no data to test with. (Also, RStudio is giving me conniptions over the long strings.)
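
That said, one likely cause can be read off the code itself. check_gene_mechanism() returns "No Gene" in three different situations (missing text, an ignore phrase, and no match at all), and case_when() then treats "No Gene" as "keep looking" before falling through to TRUE ~ "Unknown". So "No Gene" can never reach the final column. Below is a minimal sketch of one way to separate those cases. It is untested against the real data: the toy rows and the shortened patterns are invented for illustration, find_gene() is a hypothetical helper name, and it assumes a gene found in any column should win over an ignore phrase in Notes. It is written in base R so it runs without extra packages; the same logic ports to str_detect() and case_when().

```r
# Shortened, hypothetical patterns; substitute the full gene_patterns list.
gene_patterns <- list(
  KPC = "KPC",
  NDM = "NDM",
  VIM = "VIM RESISTANCE",
  OXA = "OXA",
  Ignore = "Tested for KPC, OXA, NDM, IMP, and VIM Carbapenemase resistance markers|Targeted carbapenemase producing genes not detected"
)

columns_to_check <- c("Local_Test_Description", "Local_Organism_Description",
                      "Local_Organism_Code", "Notes", "Result_Text", "Result")

# Toy data, invented only to exercise each branch.
LAB_Org <- data.frame(
  Disease = c("Candida Auris", "CRE", "CRE", "CRE"),
  Local_Test_Description = c("KPC Gene", "NDM GENE BY PCR", NA, "Culture"),
  Local_Organism_Description = NA_character_,
  Local_Organism_Code = NA_character_,
  Notes = c(NA, NA,
            "Tested for KPC, OXA, NDM, IMP, and VIM Carbapenemase resistance markers",
            "no comment"),
  Result_Text = NA_character_,
  Result = NA_character_,
  stringsAsFactors = FALSE
)

# Return the first matching gene name, or NA when there is nothing to report.
# NA (rather than "No Gene") keeps "no match" distinct from "ignore phrase".
find_gene <- function(description, patterns) {
  # Missing text, or text containing an ignore phrase, yields no gene;
  # this stops "Tested for KPC, ..." from being labelled KPC.
  if (is.na(description) || grepl(patterns$Ignore, description)) {
    return(NA_character_)
  }
  for (gene in setdiff(names(patterns), "Ignore")) {
    if (grepl(patterns[[gene]], description)) {
      return(gene)
    }
  }
  NA_character_
}

# Walk the columns in priority order, keeping the first gene found per row.
first_gene <- rep(NA_character_, nrow(LAB_Org))
for (col in columns_to_check) {
  found <- vapply(LAB_Org[[col]], find_gene, character(1),
                  patterns = gene_patterns, USE.NAMES = FALSE)
  fill <- is.na(first_gene)
  first_gene[fill] <- found[fill]
}

# The ignore phrases are only expected in Notes, so test that column alone.
notes_ignored <- !is.na(LAB_Org$Notes) &
  grepl(gene_patterns$Ignore, LAB_Org$Notes)

# Order of precedence: Candida Auris, then a gene, then "No Gene", then "Unknown".
LAB_Org$Gene_Mechanism <- ifelse(
  LAB_Org$Disease == "Candida Auris", NA_character_,
  ifelse(!is.na(first_gene), first_gene,
         ifelse(notes_ignored, "No Gene", "Unknown"))
)

print(LAB_Org[, c("Disease", "Gene_Mechanism")])
```

On the four toy rows this should give NA, "NDM", "No Gene", and "Unknown". If an ignore phrase in Notes ought to override a gene found in another column, swap the two inner ifelse() tests. A reprex with a few real rows would show which order is the right one.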
