Having issues with coding my R for filtering out data

cmjackson · April 6, 2021, 3:47pm

I have data I have collected from my university dissertation experiment online and I have loaded the data into R. I am having trouble trying to filter out participants who have scored less than 80% in the first stage of the experiment and participants who have scored less than 100% in the second stage. I was wondering what coding to use to filter out this data.
Here is the coding I have used so far:

library(tidyverse)
dat <- read_csv("Learningtask.csv")
tidydat <- dat %>%
  select(participant, response_age_a, response_age_b, response_gender, stage,
         trial_type, cue, correct_response, response, correct, rt,
         block_s1, count_s1, count_s2)
dat %>% head()
dat %>% glimpse()
dat_clean <-
  dat %>% select(participant, 
                 trial_type, 
                 stage,
                 cue, 
                 correct_response, 
                 response, 
                 correct, 
                 rt, 
                 block_s1, 
                 count_s1, 
                 count_s2)

Here I have dropped the columns I do not need, but I am unsure what code to put in the lines below:

 exclude<- c()
 dat_clean <- dat_clean %>%filter(!subject %in% exclude))

If anyone could help with this it would be much appreciated :).

andresrcs · April 6, 2021, 4:15pm

Hi!

To help us help you, could you please prepare a reproducible example (reprex) illustrating your issue? Please have a look at this guide, to see how to create one:

FAQ: How to do a minimal reproducible example (reprex) for beginners (Guides & FAQs)

A minimal reproducible example consists of the following items:
A minimal dataset, necessary to reproduce the issue
The minimal runnable code necessary to reproduce the issue, which can be run on the given dataset, and including the necessary information on the used packages.

Let's quickly go over each one of these with examples:

Minimal Dataset (Sample Data)
You need to provide a data frame that is small enough to be (reasonably) pasted on a post, but big enough to reproduce your issue. Let's say, as an example, that you are working with the iris data frame

head(iris)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.…

mara · April 6, 2021, 4:40pm

It's hard to tell without a reprex, but one thing I see is that you probably want to use one of the "not in" variations (see SO thread linked below), as opposed to !subject if you're doing something with exclude (though that wouldn't be my recommended approach).

I'd filter on whatever variable is the score of first stage filter(foo >= 80), and do the same for whatever the second-stage variable is.

CALUM_POLWART · April 6, 2021, 5:17pm

'I'd filter on whatever variable is the score of first stage filter(foo >= 80), and do the same for whatever the second-stage variable is.'

So would I.

But I suspect that's the problem. He doesn't have a score column? Note the "response", "correct" and "correct_response" columns. So I think he has the data in long form

So he needs to group_by(participant, stage) and then summarise(score = 100 * sum(correct == 1) / n()), perhaps

Then he can choose what to do with that...?
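
The grouped summary described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the toy data frame is made up, `correct` is assumed to be coded 1/0 with one row per trial, `stage` is assumed to take the values 1 and 2, and the 80% / 100% thresholds come from the original question.

```r
library(dplyr)

# Hypothetical toy data: one row per trial, `correct` coded 1 (right) / 0 (wrong)
dat_clean <- data.frame(
  participant = rep(c(101, 102, 103), each = 10),
  stage       = rep(rep(c(1, 2), each = 5), times = 3),
  correct     = c(1, 1, 1, 1, 1,  1, 1, 1, 1, 1,   # participant 101
                  1, 1, 1, 0, 0,  1, 1, 1, 1, 1,   # participant 102
                  1, 1, 1, 1, 0,  1, 1, 1, 1, 0)   # participant 103
)

# Percentage correct for each participant within each stage
scores <- dat_clean %>%
  group_by(participant, stage) %>%
  summarise(score = 100 * sum(correct == 1) / n(), .groups = "drop")

# Participants below 80% in stage 1 or below 100% in stage 2
exclude <- scores %>%
  filter((stage == 1 & score < 80) | (stage == 2 & score < 100)) %>%
  distinct(participant) %>%
  pull(participant)

# Drop every trial belonging to an excluded participant
dat_filtered <- dat_clean %>%
  filter(!participant %in% exclude)
```

Building `exclude` from the summary rather than typing IDs by hand avoids transcription mistakes, but the column names and coding here would need to be checked against the real data.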

mara · April 6, 2021, 5:23pm

Yeah, for sure! I was just answering about the filtering issue, and assumed the data was being summarized somewhere. You'd definitely want to get the participant scores at each stage, etc.

At this point we're basically imaginary coding without a reprex.

cmjackson · April 6, 2021, 9:01pm

Thank you!! I will try this tomorrow and see if it works

cmjackson · April 6, 2021, 10:30pm

Hello everyone, I have managed to work out which participants to remove, but when I type this code into R:

exclude<- c(22617, 22638, 22666, 22701, 22714, 22720,22790, 22802, 22806, 2371, 23180, 23273,23300, 23469)
datclean <- dat_clean %>%filter(!subj %in% exclude))

it comes up with this message:

Error: unexpected ')' in "datclean <- dat_clean %>%filter(!subj %in% exclude))"

I have followed my uni R code and it still comes up with this message :/.

Not sure if this code is of any help :(.

Dobrokhotov1989 · April 7, 2021, 12:34am

You have the error because there is an unmatched ) at the end of the line.
I think your code should look like this:

exclude<- c(22617, 22638, 22666, 22701, 22714, 22720,22790, 22802, 22806, 2371, 23180, 23273,23300, 23469)
datclean <- dat_clean %>%filter(!subj %in% exclude)
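
One more thing that may be worth checking once the parenthesis is fixed: the select() call in the first post keeps a column named participant, not subj, so filter() can only find subj if that column really exists in dat_clean. A minimal sketch, assuming the ID column is participant and using made-up toy data (the IDs 30001 and 30002 are hypothetical):

```r
library(dplyr)

# Hypothetical toy data standing in for dat_clean
dat_clean <- data.frame(
  participant = c(22617, 22617, 30001, 30002),
  correct     = c(1, 0, 1, 1)
)

exclude <- c(22617, 22638)

# Keep only the rows whose participant ID is NOT in `exclude`
datclean <- dat_clean %>%
  filter(!participant %in% exclude)
```

Running names(dat_clean) on the real data shows which column name to use in the filter.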

mara · April 7, 2021, 1:26pm

You do have an extraneous parenthesis in there. You could also use a not-in operator, but the result would be identical (ht @martin.R).
e.g.

`%nin%` <- Negate(`%in%`)
datclean <- dat_clean %>% filter(subj %nin% exclude)

edit: reflect the equivalence of !foo %in% bar and foo %nin% bar

martin.R · April 7, 2021, 1:39pm

How is that against dplyr syntax?
dat_clean %>% filter(!subj %in% exclude) is surely identical to dat_clean %>% filter(subj %nin% exclude), but avoids an unnecessary function. I would wager that the majority of people use the former version.
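
A quick way to check the equivalence outside of dplyr, as a sketch with made-up vectors: in R the %in% operator binds more tightly than the unary !, so !subj %in% exclude is parsed as !(subj %in% exclude).

```r
# Hypothetical vectors, for illustration only
subj    <- c(1, 2, 3, 4)
exclude <- c(2, 4)

# User-defined "not in" operator built from %in%
`%nin%` <- Negate(`%in%`)

via_not    <- !subj %in% exclude     # parsed as !(subj %in% exclude)
via_parens <- !(subj %in% exclude)   # explicit parentheses
via_nin    <- subj %nin% exclude     # custom operator

identical(via_not, via_parens)
identical(via_not, via_nin)
```

Both identical() calls return TRUE, which is why the two filter() forms keep the same rows.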

mara · April 7, 2021, 2:24pm

You're right. I guess I've just never done it that way…I thought there was something about operator precedence that made that funky. I've edited my earlier post to reflect that. Here's a reprex demonstrating your point (n.b. I've just added the as_tibble() at the end to shorten the printing):

library(tidyverse)
exclude <- c("Mazda RX4 Wag", "Duster 360", "Hornet Sportabout", "Cadillac Fleetwood")
mtcars %>%
  rownames_to_column() %>%
  filter(!rowname %in% exclude) %>%
  as_tibble()
#> # A tibble: 28 x 12
#>    rowname       mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <chr>       <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1 Mazda RX4    21       6  160    110  3.9   2.62  16.5     0     1     4     4
#>  2 Datsun 710   22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
#>  3 Hornet 4 D…  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
#>  4 Valiant      18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
#>  5 Merc 240D    24.4     4  147.    62  3.69  3.19  20       1     0     4     2
#>  6 Merc 230     22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
#>  7 Merc 280     19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
#>  8 Merc 280C    17.8     6  168.   123  3.92  3.44  18.9     1     0     4     4
#>  9 Merc 450SE   16.4     8  276.   180  3.07  4.07  17.4     0     0     3     3
#> 10 Merc 450SL   17.3     8  276.   180  3.07  3.73  17.6     0     0     3     3
#> # … with 18 more rows

`%nin%` <- Negate(`%in%`)

mtcars %>%
  rownames_to_column() %>%
  filter(rowname %nin% exclude) %>%
  as_tibble()
#> # A tibble: 28 x 12
#>    rowname       mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <chr>       <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1 Mazda RX4    21       6  160    110  3.9   2.62  16.5     0     1     4     4
#>  2 Datsun 710   22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
#>  3 Hornet 4 D…  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
#>  4 Valiant      18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
#>  5 Merc 240D    24.4     4  147.    62  3.69  3.19  20       1     0     4     2
#>  6 Merc 230     22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
#>  7 Merc 280     19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4
#>  8 Merc 280C    17.8     6  168.   123  3.92  3.44  18.9     1     0     4     4
#>  9 Merc 450SE   16.4     8  276.   180  3.07  4.07  17.4     0     0     3     3
#> 10 Merc 450SL   17.3     8  276.   180  3.07  3.73  17.6     0     0     3     3
#> # … with 18 more rows

Created on 2021-04-07 by the reprex package (v1.0.0)
