Unfortunately, with that statement, rows where animal != "cat" OR rows where !is.na(animal_numbers) are being removed. I only want to remove a row if it meets both conditions. Does this make sense? Thank you!!
#> animal animal_numbers
#> <chr> <dbl>
#> 1 dog 4
#> 2 dog 3
#> 3 mouse 21
#> 4 cat 32
#> 5 cat NA
#> 6 dog NA
#> 7 dog NA
In the example above, I only want row 5 gone. Sorry I have not been clear.
I want a row with "cat" in animal gone if it also has NA in animal_numbers.
What I want:
#> animal animal_numbers
#> <chr> <dbl>
#> 1 dog 4
#> 2 dog 3
#> 3 mouse 21
#> 4 cat 32
#> 5 dog NA
#> 6 dog NA
The logic of "and" and "or" when filtering is not easy. With "or" instead of "and", a row needs just one TRUE to be kept. Every animal other than cat gets a TRUE from the first condition (animal != "cat") and is kept. Cats get a FALSE from the first condition, and those with an NA get another FALSE from the second (!is.na(animal_numbers)), so only those rows are dropped.
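To make the "one TRUE is enough" reasoning concrete, here is a small sketch that writes each condition out as its own column. The tibble `animals` is just the seven-row example from the question retyped by hand, and the column names `not_cat`, `has_number` and `keep` are made up for illustration.

```r
library(tidyverse)

# The seven-row example from the question, retyped by hand
animals <- tibble(
  animal = c("dog", "dog", "mouse", "cat", "cat", "dog", "dog"),
  animal_numbers = c(4, 3, 21, 32, NA, NA, NA)
)

# Spell out each condition so the "or" can be read row by row
truth <- animals |>
  mutate(
    not_cat    = animal != "cat",          # first condition
    has_number = !is.na(animal_numbers),   # second condition
    keep       = not_cat | has_number      # one TRUE is enough to keep the row
  )

truth
```

Only row 5 (cat with NA) ends up with keep = FALSE, because it is the only row where both conditions are FALSE.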
See section 5.2.2 of R for Data Science for logical operators and filtering:
Of course, in hindsight the clearest solution would have been to just drop any row where animal == "cat" AND animal_numbers is NA. From R for Data Science: by De Morgan's law, !(x & y) is the same as (!x | !y). A useful thing to remember.
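As a quick check of De Morgan's law on this problem, the sketch below runs both forms of the filter and compares the results. The tibble `animals` is again the question's example retyped by hand, and `negated_and` and `or_of_negations` are names picked just for this illustration.

```r
library(tidyverse)

# The seven-row example from the question, retyped by hand
animals <- tibble(
  animal = c("dog", "dog", "mouse", "cat", "cat", "dog", "dog"),
  animal_numbers = c(4, 3, 21, 32, NA, NA, NA)
)

# Form 1: say what to drop, then negate it -> !(x & y)
negated_and <- animals |>
  filter(!(animal == "cat" & is.na(animal_numbers)))

# Form 2: the De Morgan equivalent -> (!x | !y)
or_of_negations <- animals |>
  filter(animal != "cat" | !is.na(animal_numbers))

# Both forms should keep the same six rows
identical(negated_and, or_of_negations)
nrow(negated_and)
```

The two filters should give the same tibble. The first form tends to be easier to read because it states the rows to remove directly.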
library(tidyverse)
# example data: six animals, one dog with a missing count
DF <- tibble(
  animal = c("dog", "dog", "mouse", "cat", "dog", "cat"),
  animal_numbers = c(4, 3, 21, 32, NA, 21)
)
# introduce a case in which cat is NA
DF[6, 2] <- NA
# drop only the rows that are both "cat" and NA
DF |> filter(!(animal == "cat" & is.na(animal_numbers)))
#> # A tibble: 5 × 2
#> animal animal_numbers
#> <chr> <dbl>
#> 1 dog 4
#> 2 dog 3
#> 3 mouse 21
#> 4 cat 32
#> 5 dog NA