Using filter_at()
with a database is powerful since one call to this function can generate a lot of SQL code particularly if you need to filter on many variables. For example:
filter_at(flights_db, vars(contains("time")), all_vars(. > 0))
generates
SELECT *
FROM "flights"
WHERE (((((("dep_time" > 0.0) AND ("sched_dep_time" > 0.0)) AND ("arr_time" > 0.0)) AND ("sched_arr_time" > 0.0)) AND ("air_time" > 0.0)) AND ("time_hour" > 0.0))
If I have two groups of variables that I want to filter on I can combine them using AND by filtering twice.
flights_db %>%
filter_at(vars(contains("delay")), all_vars(. < 0)) %>%
filter_at(vars(contains("time")), all_vars(. > 0))
However I do not know how to combine these filters using OR logic. If the data are not in a database then I could get the desired result using this code:
bind_rows(
filter_at(flights, vars(contains("delay")), all_vars(. < 0)),
filter_at(flights, vars(contains("time")), all_vars(. > 0))
) %>%
distinct()
I'm wondering if there is a way to get this same result using a database and some dplyr syntax I don't know about.
I'm imagining some kind of syntax like this
flights_db %>%
filter_at(all_vars(vars(contains("delay")), . < 0) | all_vars(vars(contains("time")), . > 0) )
Is there any way to get this result from a database using filter_at()
and without creating two separate queries and combining the results?
Note: I'm using the nycflights13::flights data in a postgres database.