First time caller! Reprex attempted below, let me know if improperly created.
My goal is to mutate a data frame "saint" using a long list of case_when patterns. I can manually build the "cases_site" list of patterns and use it in a case_when mutate to create the "saint_mutated" data frame, but I want to populate that list from a much longer data frame, "df_patterns".
library("tidyverse")
#not used yet, want to populate cases_site list from this data
df_patterns <- tibble(
  match_string = c(".*site1.com", ".*site2.com", ".*site3.com"),
  site = c("villas", "brands", "club")
)
saint <- tibble(
  Key = c("site1.com", "a.site1.com", "site2.com", "site2.com/b", "site3.com")
)
#manually built, works fine
#each element is a two-sided formula: logical condition ~ replacement value
cases_site <- list(
  str_detect(saint$Key, ".*site1.com") ~ "villas",
  str_detect(saint$Key, ".*site2.com") ~ "brands",
  str_detect(saint$Key, ".*site3.com") ~ "club"
)
#!!! splices the list into case_when() as separate arguments
saint_mutated <- saint %>%
  mutate(Site = case_when(!!! cases_site))
Thanks for the quick reply! Good solution, I have used patterns with str_detect before, but stringi is new to me.
Works great except for row 4, where the Key "site2.com/b" became "brands/b". That is likely due to my regex patterns: ".*site2.com" only matches up to ".com", so the trailing "/b" is left in place. Ideally, the regex would cover any text before or after the pattern, so that a Key like "site2.com/anything-long-here" results in a Site of just "brands". I edited the pattern to ".*site2.com.*" to match the whole string, and it seems to work. Any feedback on that change?
I didn't notice this while posting, and now can't figure out a better solution. I deleted my earlier post because of this issue.
If modifying the patterns is alright with your use case, then it should be OK. Instead of adding .* both before and after each pattern, you can use paste0 inside the function call as follows:
library(dplyr)
library(stringi)
df_patterns <- tibble(match_string = c(".*site1.com", ".*site2.com", ".*site3.com"),
                      site = c("villas", "brands", "club"))
saint <- tibble(Key = c("site1.com", "a.site1.com", "site2.com", "site2.com/b", "site3.com"))
saint %>%
  mutate(Site = stri_replace_all_regex(str = Key,
                                       pattern = paste0(".*", df_patterns$match_string, ".*"),
                                       replacement = df_patterns$site,
                                       vectorize_all = FALSE))
#> # A tibble: 5 x 2
#>   Key         Site
#>   <chr>       <chr>
#> 1 site1.com   villas
#> 2 a.site1.com villas
#> 3 site2.com   brands
#> 4 site2.com/b brands
#> 5 site3.com   club
Thanks again Anirban. Modifying the patterns worked out.
One issue I still have is how the two approaches differ for non-matches. With case_when you can specify what a non-match returns (NA in my preferred case), but stri_replace_all returns the original string unchanged when none of the patterns match, which is problematic inside a mutate.
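To make the non-match behaviour concrete, here is a minimal sketch of one possible workaround: check whether any pattern matches first, and only keep the replacement when one does. The extra Key "other.org" and the column name "matched" are made up for illustration and are not part of the original data.

```r
library(dplyr)
library(stringi)

df_patterns <- tibble(match_string = c(".*site1.com", ".*site2.com", ".*site3.com"),
                      site = c("villas", "brands", "club"))
# "other.org" is a hypothetical Key that matches none of the patterns
saint <- tibble(Key = c("site1.com", "a.site1.com", "site2.com",
                        "site2.com/b", "site3.com", "other.org"))

# one alternation regex that is TRUE when any single pattern matches
any_pattern <- paste(df_patterns$match_string, collapse = "|")

result <- saint %>%
  mutate(
    matched = stri_detect_regex(Key, any_pattern),
    # replace as before, but turn non-matching rows into NA
    Site = if_else(
      matched,
      stri_replace_all_regex(str = Key,
                             pattern = paste0(df_patterns$match_string, ".*"),
                             replacement = df_patterns$site,
                             vectorize_all = FALSE),
      NA_character_
    )
  ) %>%
  select(-matched)
```

Without the if_else guard, the "other.org" row would come back as "other.org" rather than NA.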
Anyone have suggestions on how to read in the df_patterns data frame into the cases_site list?
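One possible way to do this, as a sketch rather than a definitive answer: build one unevaluated `condition ~ value` expression per row of df_patterns with rlang's expr() and purrr's map2(), then splice the resulting list into case_when() with !!!. The "other.org" Key is a made-up row added only to show the NA result for non-matches.

```r
library(dplyr)
library(stringr)
library(purrr)
library(rlang)

df_patterns <- tibble(
  match_string = c(".*site1.com", ".*site2.com", ".*site3.com"),
  site = c("villas", "brands", "club")
)
# "other.org" is a hypothetical non-matching Key
saint <- tibble(
  Key = c("site1.com", "a.site1.com", "site2.com", "site2.com/b", "site3.com", "other.org")
)

# one expression per row, e.g. str_detect(Key, ".*site1.com") ~ "villas"
# !! inserts the current pattern and site values into the expression
cases_site <- map2(
  df_patterns$match_string, df_patterns$site,
  function(pattern, site) expr(str_detect(Key, !!pattern) ~ !!site)
)

# !!! splices the expressions in as separate case_when() arguments;
# they are evaluated inside mutate(), so Key refers to the column
saint_mutated <- saint %>%
  mutate(Site = case_when(!!!cases_site))
```

Because the conditions are evaluated inside mutate(), saint$Key is not needed, and rows that match no pattern get NA, which is case_when's default. Order matters as usual: the first matching row of df_patterns wins.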