This post was suggested as a possible solution, but I couldn't understand what the OP was talking about, so here goes, starting with a reprex. I also tried watching text mining videos (all 6, which were quite good), but they didn't help me with this task. I basically copied StatSteph's solution to a similar issue from a previous post of mine.
The issue is that the phrases in my wordlist aren't being picked up, so some rows should say "yes" in `us_abnormal` but currently say "no".
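A minimal illustration of why nothing matches, assuming stringr is installed (`sentence` and `words` are made-up names for this sketch only): `%in%` tests whether each whole element on the left is exactly equal to some element on the right, so a word sitting inside a longer sentence is never found. `stringr::str_detect()` instead looks for a pattern anywhere inside the string.

```r
library(stringr)

# Hypothetical example values, not from the original data
sentence <- "Globally poor contractility"
words <- c("poor", "enlarged")

# %in% compares whole elements for exact equality, so "poor" inside
# a longer sentence does not count as a match
sentence %in% words
#> [1] FALSE

# str_detect() searches for a pattern anywhere in the string; the word
# list is collapsed into a single alternation pattern "poor|enlarged"
str_detect(sentence, paste(words, collapse = "|"))
#> [1] TRUE
```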
library(tidyverse)
#> Warning: package 'tidyverse' was built under R version 3.6.2
#> Warning: package 'tibble' was built under R version 3.6.2
#> Warning: package 'tidyr' was built under R version 3.6.2
#> Warning: package 'dplyr' was built under R version 3.6.2
#> Warning: package 'stringr' was built under R version 3.6.2
reprex_mrn <- c("2449754", "2001748", "9023298", "2452107", "4174678", "6310355", "9338915", "2459626", "0423163")
reprex_us <- c("no significant pericardial effusion, hyperdynamic LV systolic function", "no pericardial effusion, enlarged akinetic RV, no organized cardiac function", "hyperdynamic with poor LV filling and large RV", "No Pericardial Effusion, no clot visualized", "very dilated LV with very poor function and inferoseptal WMA", "no cardiac activity", "Globally poor contractility", "Mildly enlarged left atria, Good Global Function", "poor global function progressing to cardiac arrest and then return of poor cardiac function, no pericardial effusion")
tee_reprex <- tibble(
  MRN = sample(reprex_mrn, 200, replace = TRUE),
  us_interpretation = sample(reprex_us, 200, replace = TRUE)
)
tee_reprex
#> # A tibble: 200 x 2
#> MRN us_interpretation
#> <chr> <chr>
#> 1 2459626 Globally poor contractility
#> 2 2449754 no cardiac activity
#> 3 4174678 no cardiac activity
#> 4 2452107 no significant pericardial effusion, hyperdynamic LV systolic functi…
#> 5 2449754 No Pericardial Effusion, no clot visualized
#> 6 6310355 No Pericardial Effusion, no clot visualized
#> 7 2452107 Mildly enlarged left atria, Good Global Function
#> 8 6310355 hyperdynamic with poor LV filling and large RV
#> 9 2449754 very dilated LV with very poor function and inferoseptal WMA
#> 10 2001748 Globally poor contractility
#> # … with 190 more rows
wordlist <- c("poor", "agonal", "cardiac arrest", "rearrested", "ventricular fibrillation", "enlarged")
tee_reprex1 <- tee_reprex %>%
  mutate(us_abnormal = if_else("us_interpretation" %in% wordlist, "yes", "no"))
tee_reprex1
#> # A tibble: 200 x 3
#> MRN us_interpretation us_abnormal
#> <chr> <chr> <chr>
#> 1 2459626 Globally poor contractility no
#> 2 2449754 no cardiac activity no
#> 3 4174678 no cardiac activity no
#> 4 2452107 no significant pericardial effusion, hyperdynamic LV sys… no
#> 5 2449754 No Pericardial Effusion, no clot visualized no
#> 6 6310355 No Pericardial Effusion, no clot visualized no
#> 7 2452107 Mildly enlarged left atria, Good Global Function no
#> 8 6310355 hyperdynamic with poor LV filling and large RV no
#> 9 2449754 very dilated LV with very poor function and inferoseptal… no
#> 10 2001748 Globally poor contractility no
#> # … with 190 more rows
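A possible fix, sketched under a few assumptions. In the `mutate()` call above, `"us_interpretation"` is quoted, so `%in%` compares the literal text "us_interpretation" (not the column) against `wordlist`. That gives a single `FALSE`, so every row gets "no". Dropping the quotes alone would still not work, because `%in%` only finds exact whole-string matches. The sketch below instead collapses `wordlist` into one regular expression and tests each row with `str_detect()`. `tee_small` is a hypothetical stand-in for `tee_reprex` so the block runs on its own.

```r
library(dplyr)
library(stringr)
library(tibble)

# Hypothetical small stand-in for tee_reprex, using sentences from the question
tee_small <- tibble(
  us_interpretation = c(
    "Globally poor contractility",
    "no cardiac activity",
    "Mildly enlarged left atria, Good Global Function",
    "poor global function progressing to cardiac arrest"
  )
)

wordlist <- c("poor", "agonal", "cardiac arrest", "rearrested",
              "ventricular fibrillation", "enlarged")

# Build one pattern: \b(poor|agonal|...|enlarged)\b
# \b requires each entry to match as whole word(s); ignore_case handles capitals
pattern <- regex(paste0("\\b(", paste(wordlist, collapse = "|"), ")\\b"),
                 ignore_case = TRUE)

# str_detect() is evaluated row by row on the column itself (no quotes)
tee_small1 <- tee_small %>%
  mutate(us_abnormal = if_else(str_detect(us_interpretation, pattern), "yes", "no"))

tee_small1$us_abnormal
#> [1] "yes" "no"  "yes" "yes"
```

To use it on the question's data, swap `tee_small` for `tee_reprex`. The `\\b` word boundaries stop "poor" from matching inside a longer word such as "poorly"; drop them if partial matches are wanted. If a future wordlist entry contains regex metacharacters (for example `.` or `(`), it would need escaping before being pasted into the pattern.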