Filter across multiple columns to return only rows that contain BOTH names

turkjr19 · January 1, 2023, 3:29pm

# Sample data frame data that I have
df <- tibble(
  g = c("Bob", "Clark", "Greg", "Bill"),
  a1 = c("Clark", "Greg", "Jim", "Barry"),
  a2 = c("Sam", "Bob", "Noah", "Clark")
)

# Want to filter sample data frame to return only rows that contain "Bob" and "Clark"
df_want <- tibble(
  g = c("Bob", "Clark"),
  a1 = c("Clark", "Greg"),
  a2 = c("Sam", "Bob")
)

^{Created on 2023-01-01 with reprex v2.0.2}

scottyd22 · January 1, 2023, 4:04pm

Here is one way to get the desired output.

library(tidyverse)

df <- tibble(
  g = c("Bob", "Clark", "Greg", "Bill"),
  a1 = c("Clark", "Greg", "Jim", "Barry"),
  a2 = c("Sam", "Bob", "Noah", "Clark")
)

df %>%
  mutate(combined = str_c(g, a1, a2, sep = ' ')) %>%
  filter(str_detect(combined, 'Bob') & str_detect(combined, 'Clark')) %>%
  select(-combined)
#> # A tibble: 2 × 3
#>   g     a1    a2   
#>   <chr> <chr> <chr>
#> 1 Bob   Clark Sam  
#> 2 Clark Greg  Bob

Created on 2023-01-01 with reprex v2.0.2.9000

jkatam · January 1, 2023, 4:21pm

Alternatively, please check the below code

library(tidyverse)

# Sample data frame data that I have
df <- tibble(
  g = c("Bob", "Clark", "Greg", "Bill"),
  a1 = c("Clark", "Greg", "Jim", "Barry"),
  a2 = c("Sam", "Bob", "Noah", "Clark")
)

#solution 
df2 <- df %>% rowwise() %>% 
  mutate(across(everything(), ~ifelse(tolower(.x)=='bob' | tolower(.x)=='clark', 1, 0), .names = '{.col}_n'),
                                   new=sum(across(ends_with('_n')))) %>% filter(new>=2) %>% select(-ends_with('_n'), -new)

turkjr19 · January 2, 2023, 3:52am

Thank you! This worked perfectly.

system · January 9, 2023, 3:52am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.