I have a data set (200,000 x 200) that I subset with a str_detect() filter, using multiple criteria across multiple variables. How can I check my work to see which values ended up in the filtered data? My goal is to do this without copying and pasting the same code for each variable.
library(tidyverse)
# criteria to include in list
target <- c("f11", "t40") %>% # either/or
  paste(collapse = "|")
target
#> [1] "f11|t40"
# data
my_data <- tribble(
  ~p_dx, ~dx_1,  ~dx_2,  ~dx_3,
  "f11", "t401", NA,     NA,
  "f11", "t402", "f12",  "t41",
  "f01", "t01",  "f111", "t401",
  "f02", "t402", NA,     NA,
  "t40", "f111", NA,     NA
)
my_data
#> # A tibble: 5 x 4
#> p_dx dx_1 dx_2 dx_3
#> <chr> <chr> <chr> <chr>
#> 1 f11 t401 <NA> <NA>
#> 2 f11 t402 f12 t41
#> 3 f01 t01 f111 t401
#> 4 f02 t402 <NA> <NA>
#> 5 t40 f111 <NA> <NA>
# if I want to see which target value is present one variable at a time
p_dx_list <- my_data %>%
  filter(str_detect(p_dx, target)) %>%
  distinct(p_dx)
p_dx_list
#> # A tibble: 2 x 1
#> p_dx
#> <chr>
#> 1 f11
#> 2 t40
dx_1_list <- my_data %>%
  filter(str_detect(dx_1, target)) %>%
  distinct(dx_1)
dx_2_list <- my_data %>%
  filter(str_detect(dx_2, target)) %>%
  distinct(dx_2)
# and so on
# then bind the rows together
my_list <- bind_rows(p_dx_list,
                     dx_1_list,
                     dx_2_list) %>%
  pivot_longer(everything()) %>% # make long
  drop_na(value) %>%
  distinct(value) %>%
  arrange(value)
# I want to see (as a list) every value my str_detect filter matched
my_list
#> # A tibble: 5 x 1
#> value
#> <chr>
#> 1 f11
#> 2 f111
#> 3 t40
#> 4 t401
#> 5 t402
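One way to avoid repeating the filter for every variable is to reshape first and filter once. Below is a minimal sketch using the same `target` and `my_data` (redefined so the block runs on its own); the names `matches_long`, `matched_values`, and `matches_by_variable` are illustrative, not from the original code. Unlike the hand-written version, it automatically covers dx_3 and any further columns.

```r
library(tidyverse)

# Same pattern and example data as in the question
target <- c("f11", "t40") %>%
  paste(collapse = "|")

my_data <- tribble(
  ~p_dx, ~dx_1,  ~dx_2,  ~dx_3,
  "f11", "t401", NA,     NA,
  "f11", "t402", "f12",  "t41",
  "f01", "t01",  "f111", "t401",
  "f02", "t402", NA,     NA,
  "t40", "f111", NA,     NA
)

# Stack every column into one long table (one row per cell), then filter once.
# str_detect() returns NA for NA cells, and filter() drops those rows.
matches_long <- my_data %>%
  pivot_longer(everything(), names_to = "variable", values_to = "value") %>%
  filter(str_detect(value, target))

# Distinct matched values across all columns, sorted
matched_values <- matches_long %>%
  distinct(value) %>%
  arrange(value)

# Which value matched in which column
matches_by_variable <- matches_long %>%
  distinct(variable, value) %>%
  arrange(variable, value)
```

Keeping the `variable` column shows where each match came from; with this data, for example, t401 turns up in both dx_1 and dx_3. str_detect() matches substrings, which is why f111, t401, and t402 appear; an anchored pattern such as "^(f11|t40)" would restrict matches to codes that begin with f11 or t40 (with this data it gives the same result). On the full 200,000 x 200 data, pivot_longer() produces up to 40 million rows, so a column-wise approach may be lighter on memory.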
It looks like map() applies the custom function (filter, then unique) to each variable and then combines the results into a single list. I'm not clear on how that works.
Yes, map() iterates along a list, applying the given function to each element and returning a list. Since a data frame is basically a list of columns, map() can be used to iterate over all of the columns of a data frame.
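To make this concrete, here is a minimal sketch on the example data. The per-column function is an assumption standing in for the {filter, unique} function mentioned above: it keeps the non-missing values that match `target`, then takes unique(). The names `matches_by_column` and `all_matches` are illustrative.

```r
library(tidyverse)

target <- c("f11", "t40") %>%
  paste(collapse = "|")

my_data <- tribble(
  ~p_dx, ~dx_1,  ~dx_2,  ~dx_3,
  "f11", "t401", NA,     NA,
  "f11", "t402", "f12",  "t41",
  "f01", "t01",  "f111", "t401",
  "f02", "t402", NA,     NA,
  "t40", "f111", NA,     NA
)

# map() treats the data frame as a list of columns, calls the function once
# per column, and returns a named list with one element per column
matches_by_column <- map(my_data, function(x) {
  # str_detect() gives NA for NA cells; FALSE & NA is FALSE, so NAs are dropped
  keep <- !is.na(x) & str_detect(x, target)
  unique(x[keep])
})

# Collapse the per-column results into one sorted vector of distinct values
all_matches <- sort(unique(unlist(matches_by_column, use.names = FALSE)))
```

Printing `matches_by_column` shows the matches separately for each variable, while `all_matches` gives the same combined list as the copy-and-paste version, without writing one filter per column.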