I have data I have collected from my university dissertation experiment online and I have loaded the data into R. I am having trouble trying to filter out participants who have scored less than 80% in the first stage of the experiment and participants who have scored less than 100% in the second stage. I was wondering what coding to use to filter out this data.
Here is the coding I have used so far:
library(tidyverse)
dat <- read_csv("Learningtask.csv")
tidydat <- dat %>% select(participant, response_age_a, response_age_b, response_gender, stage, trial_type, cue, correct_response, response, correct, rt, block_s1, count_s1, count_s2)
dat %>% head()
dat %>% glimpse()
dat_clean <-
  dat %>% select(participant,
                 trial_type,
                 stage,
                 cue,
                 correct_response,
                 response,
                 correct,
                 rt,
                 block_s1,
                 count_s1,
                 count_s2)
Here I have removed the columns I do not need, but I am unsure what code to put in the line below:
To help us help you, could you please prepare a reproducible example (reprex) illustrating your issue? Please have a look at this guide, to see how to create one:
It's hard to tell without a reprex, but one thing I see is that you probably want to use one of the "not in" variations (see SO thread linked below), as opposed to !subject if you're doing something with exclude (though that wouldn't be my recommended approach).
I'd filter on whatever variable is the score of first stage filter(foo >= 80), and do the same for whatever the second-stage variable is.
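If a per-participant score already exists for each stage, the filter might look something like the sketch below. The `scores` table and the column names `score_s1` and `score_s2` are made up for illustration; they are not columns from the original data.

```r
library(dplyr)
library(tibble)

# Hypothetical per-participant scores (percent correct in each stage).
# score_s1 and score_s2 are illustrative names, not from the original data.
scores <- tibble(
  participant = c("p1", "p2", "p3"),
  score_s1    = c(95, 70, 85),
  score_s2    = c(100, 100, 90)
)

# Keep participants with at least 80% in stage 1 and 100% in stage 2
passed <- scores %>%
  filter(score_s1 >= 80, score_s2 >= 100)

passed
```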
'I'd filter on whatever variable is the score of first stage filter(foo >= 80), and do the same for whatever the second-stage variable is.'
So would I.
But I suspect that's the problem. He doesn't have a score column? Note the "response", "correct" and "correct_response" columns. So I think he has the data in long form.
So he needs to group_by(participant, stage) and then summarise(score = 100 * sum(correct == 1) / n()), perhaps.
Yeah, for sure! I was just answering about the filtering issue, and assumed the data was being summarized somewhere. You'd definitely want to get the participant scores at each stage, etc.
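Putting the two suggestions together, a rough sketch of the whole pipeline could look like this. It uses toy data, and it assumes that `correct` is coded 1/0 and `stage` is coded 1/2, which has not been confirmed by the original poster.

```r
library(dplyr)
library(tibble)

# Toy long-format data: one row per trial. Column names follow the question;
# coding `correct` as 1/0 and `stage` as 1/2 is an assumption.
dat_clean <- tibble(
  participant = rep(c("p1", "p2"), each = 7),
  stage       = rep(c(1, 1, 1, 1, 1, 2, 2), times = 2),
  correct     = c(1, 1, 1, 1, 1, 1, 1,   # p1: all correct in both stages
                  1, 1, 1, 0, 0, 1, 1)   # p2: 3 of 5 correct in stage 1
)

# Percentage correct per participant per stage
scores <- dat_clean %>%
  group_by(participant, stage) %>%
  summarise(score = 100 * mean(correct), .groups = "drop")

# Participants who reach 80% in stage 1 and 100% in stage 2
keep <- scores %>%
  group_by(participant) %>%
  summarise(
    pass = any(stage == 1 & score >= 80) && any(stage == 2 & score >= 100)
  ) %>%
  filter(pass)

# Trial-level data for the participants who passed both criteria
dat_passed <- dat_clean %>% semi_join(keep, by = "participant")
```

Using semi_join() keeps the trial-level rows intact, so later analyses can still work on individual trials rather than on the summary table.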
At this point we're basically imaginary coding without a reprex.
You do have an extraneous parenthesis in there. You could accomplish the same thing with a not-in operator, but that would be the same thing (ht @martin.R).
e.g.
How is that against dplyr syntax? dat_clean %>% filter(!subj %in% exclude) is surely identical to dat_clean %>% filter(subj %nin% exclude), but avoids an unnecessary function. I would wager that the majority of people use the former version.
You're right. I guess I've just never done it that way…I thought there was something about operator precedence that made that funky. I've edited my earlier post to reflect that. Here's a reprex demonstrating your point (n.b. I've just added the as_tibble() at the end to shorten the printing):
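The equivalence can be checked with a small sketch along these lines. The data, the `exclude` vector, and the locally defined `%nin%` operator are assumptions for illustration, not anyone's actual code from this thread.

```r
library(dplyr)
library(tibble)

# Hypothetical data: `subj` and `exclude` mirror the names used in the thread
dat_clean <- tibble(
  subj = c("s1", "s2", "s3", "s4"),
  rt   = c(510, 640, 480, 720)
)
exclude <- c("s2", "s4")

# A not-in operator defined locally, so no extra package is needed
`%nin%` <- Negate(`%in%`)

# %in% binds more tightly than !, so this parses as !(subj %in% exclude)
a <- dat_clean %>% filter(!subj %in% exclude)
b <- dat_clean %>% filter(subj %nin% exclude)

# Compare the two results
identical(a, b)
```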