Hi Posit Community.
Yet another filtering question for you if I may.
Lets say I have two dataframes/tables. One that contains a very large amount of data I need to filter based on two or more criteria, and another containing the unique pairings of rows upon which said criteria is based.
As a simple example, filtering dataframe df
:
Participant | Category | Group | Rating |
---|---|---|---|
Greg | Int | A | 21 |
Greg | Int | B | 20 |
Donna | Post | B | 10 |
Donna | Post | A | 11 |
Johnathan | Fac | A | 20 |
Johnathan | Fac | B | 10 |
Lewis | Int | B | 11 |
Lewis | Int | A | 20 |
df <- data.frame(
Participant = c('Greg', 'Greg', 'Donna', 'Donna','Johnathan','Johnathan','Lewis','Lewis'),
Category = c('Int', 'Int', 'Post', 'Post','Fac','Fac','Int','Int'),
Group = c("A","B","B","A","A","B","B","A"),
Rating = c(21, 20, 10, 11, 20, 10, 11, 20)
)
Using the criteria in dataframe filtercrit
:
Group_ID | Category | Group |
---|---|---|
Criteria_1 | Int | A |
Criteria_2 | Post | B |
To get an output like the following where the only entries retained are those that meet the paired criteria from Category and Group like so:
Participant | Category | Group | Rating |
---|---|---|---|
Greg | Int | A | 21 |
Donna | Post | B | 10 |
Lewis | Int | A | 20 |
My typical strategy of using %in%
per column would be too inclusive, and my usual workaround of mutate()
and pasting together the strings across all columns of interest into a unique string followed by using %in% would take an unrealistic amount of time to generate give then size of the real data I have (e.g. 2k filtering criteria pairs and several hundred thousand entries).
Is there a cleaner way to do this in dplyr
or tidyr
that I'm missing?
Thank you in advance!