case_when using tibble columns or vectors

I ran one fuzzy match pass on my data (obtaining names_match) but there are some mismatches that I arranged, subsetted out and corrected with another iteration of fuzzy matching (correct).

Now the trouble I have is trying to append the correctly matched names back to my overall dataset, by comparing the name and their group. I am unsure how to get case_when to work by passing entire columns/vectors into it, instead of having to name each case.

Here is some reprex data:

# Output obtained from first fuzzy match, with some mistakes
names_match <- tibble(group = c(rep(1, 3), rep(2, 3), rep(3, 4), rep(4, 3)), 
                      match_label = c("Andy", "Angelica", "Aaron", 
                                      "Betty", "Ben", "Barack", 
                                      "Chrissy", "Celeste", "Cameron", "Candace", 
                                      "Danny", "Dane", "Devline"), 
                      matched_label = c("Angelica P.", "Andy A.", "Aaron J.", 
                                        "Barack O.", "Betty F.", "Ben T.", 
                                        "Celeste J.", "Chrissy L.", "Cameron W.", "Candace O.",
                                        "Danny S.", "Dane D.", "Devline J.")
)

names_match # mismatched rows: 1, 2, 4, 5, 6, 7, 8
# A tibble: 13 x 3
   group to_match matched_label
   <dbl> <chr>    <chr>        
 1     1 Andy     Angelica P.  
 2     1 Angelica Andy A.      
 3     1 Aaron    Aaron J.     
 4     2 Betty    Barack O.    
 5     2 Ben      Betty F.     
 6     2 Barack   Ben T.       
 7     3 Chrissy  Celeste J.   
 8     3 Celeste  Chrissy L.   
 9     3 Cameron  Cameron W.   
10     3 Candace  Candace O.   
11     4 Danny    Danny S.     
12     4 Dane     Dane D.      
13     4 Devline  Devline J. 
# Subset top_n rows with mismatches 
# then rematched correctly 
correct <- tibble(group = c(rep(1, 3), rep(2, 3), rep(3, 2)), 
                  match_label = c("Andy", "Angelica", "Aaron", 
                                  "Betty", "Ben", "Barack", 
                                  "Chrissy", "Celeste"), 
                  correct_label = c("Andy A.", "Angelica P.", "Aaron J.", 
                                    "Betty F.", "Ben T.", "Barack O.", 
                                    "Chrissy L.", "Celeste J.")
)

correct
# A tibble: 8 x 3
  group to_match correct_label
  <dbl> <chr>    <chr>        
1     1 Andy     Andy A.      
2     1 Angelica Angelica P.  
3     1 Aaron    Aaron J.     
4     2 Betty    Betty F.     
5     2 Ben      Ben T.       
6     2 Barack   Barack O.    
7     3 Chrissy  Chrissy L.   
8     3 Celeste  Celeste J. 

A 'dumb' method I have only works when the row order is exactly the same, which kinda defeats the purpose of a case_when.

# This only works if the order is exactly the same 
named_vec <- deframe(tibble(correct$to_match, correct$correct_label))
named_vec

names_match %>% 
  slice(1:8) %>% 
  mutate(new_match = case_when(to_match == named_vec ~ named_vec, 
                         TRUE ~ named_vec)
         ) %>% 
  # Append the rest of the df back 
  add_row(names_match %>% slice(9:n())) %>% 
  # Coalesce the two columns, keeping new name if not NA 
  mutate(matched_label = ifelse(is.na(new_match) == FALSE, 
                                new_match, 
                                matched_label)) %>% 
  select(-new_match)
# A tibble: 13 x 3
   group to_match matched_label
   <dbl> <chr>    <chr>        
 1     1 Andy     Andy A.      
 2     1 Angelica Angelica P.  
 3     1 Aaron    Aaron J.     
 4     2 Betty    Betty F.     
 5     2 Ben      Ben T.       
 6     2 Barack   Barack O.    
 7     3 Chrissy  Chrissy L.   
 8     3 Celeste  Celeste J.   
 9     3 Cameron  Cameron W.   
10     3 Candace  Candace O.   
11     4 Danny    Danny S.     
12     4 Dane     Dane D.      
13     4 Devline  Devline J.

I am aware this can also be done with a join, but i'm trying to find out how to work with case_when so i'll stick to this rather contrived example :smile:

I really think this is not a use case for case_when(), this is more suited for rows_update()

library(dplyr)

names_match <- tibble(group = c(rep(1, 3), rep(2, 3), rep(3, 4), rep(4, 3)), 
                      match_label = c("Andy", "Angelica", "Aaron", 
                                      "Betty", "Ben", "Barack", 
                                      "Chrissy", "Celeste", "Cameron", "Candace", 
                                      "Danny", "Dane", "Devline"), 
                      matched_label = c("Angelica P.", "Andy A.", "Aaron J.", 
                                        "Barack O.", "Betty F.", "Ben T.", 
                                        "Celeste J.", "Chrissy L.", "Cameron W.", "Candace O.",
                                        "Danny S.", "Dane D.", "Devline J.")
)

correct <- tibble(group = c(rep(1, 3), rep(2, 3), rep(3, 2)), 
                  match_label = c("Andy", "Angelica", "Aaron", 
                                  "Betty", "Ben", "Barack", 
                                  "Chrissy", "Celeste"), 
                  matched_label = c("Andy A.", "Angelica P.", "Aaron J.", 
                                    "Betty F.", "Ben T.", "Barack O.", 
                                    "Chrissy L.", "Celeste J.")
)

names_match %>% 
    rows_update(correct, by = "match_label")
#> # A tibble: 13 × 3
#>    group match_label matched_label
#>    <dbl> <chr>       <chr>        
#>  1     1 Andy        Andy A.      
#>  2     1 Angelica    Angelica P.  
#>  3     1 Aaron       Aaron J.     
#>  4     2 Betty       Betty F.     
#>  5     2 Ben         Ben T.       
#>  6     2 Barack      Barack O.    
#>  7     3 Chrissy     Chrissy L.   
#>  8     3 Celeste     Celeste J.   
#>  9     3 Cameron     Cameron W.   
#> 10     3 Candace     Candace O.   
#> 11     4 Danny       Danny S.     
#> 12     4 Dane        Dane D.      
#> 13     4 Devline     Devline J.

Created on 2021-08-08 by the reprex package (v2.0.0)

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.