How do I use the filter function from the dplyr package in RStudio to remove messy data? My CSV file has a column that is supposed to be binary, but it contains non-binary values greater than 1 (values of 2). How do I filter these out of my dataset?
Welcome to the R community.
You may want to start by reading how to post a question here, which will help you get faster responses:
However, since this is your first post, we will help you straight away.
Your problem is straightforward.
library(tidyverse)
# Assume you have this data frame, where only some values of y are 1 or 0, and you want to keep only those rows.
df <- tribble(
  ~x,  ~y,
  "A",  1,
  "B",  2,
  "C",  0,
  "D", -1,
  "E", NA
)
df
#> # A tibble: 5 × 2
#>   x         y
#>   <chr> <dbl>
#> 1 A         1
#> 2 B         2
#> 3 C         0
#> 4 D        -1
#> 5 E        NA
# Keep only the rows where y is in the set of allowed values (0 or 1)
df %>% filter(y %in% c(0,1))
#> # A tibble: 2 × 2
#>   x         y
#>   <chr> <dbl>
#> 1 A         1
#> 2 C         0
Created on 2022-05-07 by the reprex package (v2.0.1)
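Since your data comes from a CSV file, here is a minimal sketch of the same approach applied to data read from disk. It assumes the readr package (loaded as part of the tidyverse). The column name `flag` and the small CSV written to a temporary file are made up for illustration; substitute your own file path and column name.

```r
library(dplyr)
library(readr)

# Hypothetical data: a tiny CSV written to a temporary file so this sketch runs
# on its own. Replace `path` with the path to your own CSV file and `flag` with
# the name of your binary column.
path <- tempfile(fileext = ".csv")
writeLines(c("id,flag", "1,1", "2,0", "3,2", "4,", "5,1"), path)

# Empty cells are read as NA by read_csv() by default
raw <- read_csv(path, show_col_types = FALSE)

# Check which values actually occur in the column before filtering
raw %>% count(flag)

# Keep only rows where flag is exactly 0 or 1 (drops the 2 and the missing value)
clean <- raw %>% filter(flag %in% c(0, 1))

# Variant: keep 0/1 but also keep rows where flag is missing
clean_keep_na <- raw %>% filter(flag %in% c(0, 1) | is.na(flag))

clean
clean_keep_na
```

Note that `filter()` drops any row where the condition evaluates to `NA`. Because `NA %in% c(0, 1)` is `FALSE`, missing values are removed by the first filter; adding `| is.na(flag)` keeps them. A condition such as `filter(flag != 2)` would remove only the 2s: it would still keep other non-binary values (like the -1 in the example above) and would drop missing values, because `NA != 2` evaluates to `NA`.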
Happy learning R!