Create patterns list from df values for use in case_when

First time caller! Reprex attempted below, let me know if improperly created.

My goal is to mutate a data frame "saint" using a long list of case_when patterns. I am able to manually create the "cases_site" list of patterns to be used in the case_when mutate to successfully create the "saint_mutated" dataframe, but I want to use a much longer dataframe in the form of "df_patterns" to populate the list of patterns from.


library(dplyr)
library(stringr)

#not used yet, want to populate cases_site list from this data
df_patterns <- tibble(
  match_string = c(".*", ".*", ".*"),
  site = c("villas", "brands", "club")
)

saint <- tibble(
  Key = c("", "", "", "", "")
)

#manually built, works fine
cases_site <- list(
  !! str_detect(saint$Key, ".*") ~ "villas",
  !! str_detect(saint$Key, ".*") ~ "brands",
  !! str_detect(saint$Key, ".*") ~ "club"
)

saint_mutated <- saint %>%
  mutate(Site = case_when(!!! cases_site))

Created on 2019-08-25 by the reprex package (v0.3.0)

Thanks for the quick reply! Good solution, I have used patterns with str_detect before, but stringi is new to me.

Works great except in the case of row 4, where the "" pattern was replaced with "brands/b" likely due to my regex patterns. Ideally, my regex would work inclusive for any strings before or after the pattern, so the "" Key would result in a Site replacement of just "brands". I edited the pattern to be ".**" to better match the whole string, and it seemed to work. Any feedback on that change?

I didn't notice this while posting, and now can't figure out a better solution. I deleted my earlier post because of this issue.

If modifying the patterns is alright with your use case, then it should be OK. Instead of adding .* both before and after each pattern, you can use paste0 inside the function call as follows:

library(dplyr)
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>     filter, lag
#> The following objects are masked from 'package:base':
#>     intersect, setdiff, setequal, union
library(stringi)

df_patterns <- tibble(match_string = c(".*", ".*", ".*"),
                      site = c("villas", "brands", "club"))

saint <- tibble(Key = c("", "", "", "", ""))

saint %>%
  mutate(Site = stri_replace_all_regex(str = Key,
                                       pattern = paste0(".*", df_patterns$match_string, ".*"),
                                       replacement = df_patterns$site,
                                       vectorize_all = FALSE))
#> # A tibble: 5 x 2
#>   Key         Site  
#>   <chr>       <chr> 
#> 1   villas
#> 2 villas
#> 3   brands
#> 4 brands
#> 5   club

Thanks again Anirban. Modifying the patterns worked out.

One issue I still have concerns non-matches, where the case_when approach and the pattern/regex approach behave differently. With case_when you can specify what the non-match result will be (NA in my preferred case), but stri_replace_all_regex leaves the current value unchanged when none of the patterns match, which is problematic inside a mutate.
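
For anyone hitting the same non-match problem, one possible workaround is to check separately whether any pattern matched and only keep the replacement in that case. This is just a sketch: the match_string and Key values below are made up for illustration (they are not the ones from this thread), and `any_pattern`, `matched` and `saint_na` are hypothetical names.

```r
library(dplyr)
library(stringi)

# Hypothetical patterns and keys, made up for illustration only
df_patterns <- tibble(match_string = c("villa", "brand", "club"),
                      site = c("villas", "brands", "club"))

saint <- tibble(Key = c("abc/villa/1", "x-brand-y", "club99", "none"))

# One alternation of all patterns, used only to ask "did anything match?"
any_pattern <- paste0("(?:", df_patterns$match_string, ")", collapse = "|")

saint_na <- saint %>%
  mutate(
    matched = stri_detect_regex(Key, any_pattern),
    Site = if_else(
      matched,
      # same sequential replacement as in the reply above
      stri_replace_all_regex(Key,
                             paste0(".*", df_patterns$match_string, ".*"),
                             df_patterns$site,
                             vectorize_all = FALSE),
      NA_character_  # non-matches become NA instead of keeping Key
    )
  ) %>%
  select(-matched)
```

With these made-up inputs, the last row ("none") matches no pattern and so gets NA in Site rather than its original Key.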

Anyone have suggestions on how to read in the df_patterns data frame into the cases_site list?
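
A possible approach to this, sketched under some assumptions: build one unevaluated formula per row of df_patterns with purrr::map2() and rlang::expr(), then splice the resulting list into case_when with !!!. The match_string and Key values here are invented for the example, and this refers to the bare column name Key instead of saint$Key so the same list works inside any mutate on a data frame with a Key column.

```r
library(dplyr)
library(stringr)
library(purrr)
library(rlang)

# Hypothetical patterns and keys, made up for illustration only
df_patterns <- tibble(
  match_string = c("villa", "brand", "club"),
  site = c("villas", "brands", "club")
)

saint <- tibble(Key = c("abc/villa/1", "x-brand-y", "club99", "none"))

# One formula per row: str_detect(Key, "<pattern>") ~ "<site>"
# !! inlines the current pattern and site into each expression
cases_site <- map2(
  df_patterns$match_string,
  df_patterns$site,
  ~ expr(str_detect(Key, !!.x) ~ !!.y)
)

# Splice the list into case_when, as with the manually built list
saint_mutated <- saint %>%
  mutate(Site = case_when(!!!cases_site))
```

Because this goes through case_when, rows that match none of the patterns come out as NA, and the first matching row of df_patterns wins when several patterns match.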
