How to exclude (or remove) CSV files based on certain conditions in 2 columns

Hello all,

Apologies for the vague title; I'm not sure how to summarise concisely what I'm trying to do.

I want to analyse some pilot data from an online experiment. Each participant’s data file is imported into R in a loop and appended to one long data frame, which can then be grouped by participant. There is a column called ‘catch’ that takes the value 0 or 1. When it is 1, I want the value in another column (which ranges from 1 to 5) to be greater than 3. For context, there are 100 trials, 10 of which are catch trials (where catch is 1). If a participant has a value of 3 or less on any catch trial, I want to exclude all of that participant's data. How do I write the code for this?

Thank you in advance.

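Here is an example using the anti_join() function from dplyr. It first collects the failed catch trials (catch is 1 and the other column is 3 or less), then drops every row belonging to a participant who appears among them.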

library(dplyr)

Df <- data.frame(Participant = c("A", "A", "B", "B", "C", "C"),
                 catch = c(1, 0, 0, 1, 0, 1),
                 Other = c(4,2,4,1,3,5))
Df
#>   Participant catch Other
#> 1           A     1     4
#> 2           A     0     2
#> 3           B     0     4
#> 4           B     1     1
#> 5           C     0     3
#> 6           C     1     5
# Catch trials that were failed: catch == 1 with a value of 3 or less
BadRows <- Df |> filter(catch == 1, Other <= 3)
BadRows
#>   Participant catch Other
#> 1           B     1     1
# Keep only rows whose Participant has no match in BadRows
DfCln <- anti_join(Df, BadRows, by = "Participant")
DfCln
#>   Participant catch Other
#> 1           A     1     4
#> 2           A     0     2
#> 3           C     0     3
#> 4           C     1     5

Created on 2022-06-01 by the reprex package (v2.0.1)
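
For comparison, here is a minimal sketch of another way to get the same result without a join, using a grouped filter. It is not part of the reply above; it reuses the same toy data frame, and the name DfCln2 is just an illustrative choice. It assumes there are no missing values in catch or Other.

```r
library(dplyr)

# Same toy data as in the example above
Df <- data.frame(Participant = c("A", "A", "B", "B", "C", "C"),
                 catch = c(1, 0, 0, 1, 0, 1),
                 Other = c(4, 2, 4, 1, 3, 5))

# Within each participant, check whether any catch trial has a value of 3 or less;
# keep the participant's rows only if no such trial exists
DfCln2 <- Df |>
  group_by(Participant) |>
  filter(!any(catch == 1 & Other <= 3)) |>
  ungroup()

DfCln2
```

With this data, participant B fails a catch trial and is dropped, so only the rows for A and C should remain, matching DfCln from the anti_join() version. The grouped filter keeps the rule in one step, while anti_join() leaves you with BadRows to inspect which trials caused an exclusion.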


Thank you very much! This works perfectly. 🙂
