Irakli
September 1, 2022, 6:09pm
1
Hello,
I have a simple task: I have a large dataset, and I want to filter out rows where ALL of a subset of columns contain NA (not rows where just any one of them contains NA).
data <- data.frame(x1 = c(4, 1, NA, 7, 8, 1), # Create example data
                   x2 = c("A", NA, NA, "XX", "YO", "YA"),
                   x3 = c(1, 0, NA, 1, 1, NA),
                   x4 = c("A", "B", "C", "XX", "YO", "YA"))
I want to remove rows where x1, x2, and x3 are all NA, ignoring x4 while filtering (it is not a variable of interest in terms of NAs).
This is easy to do when you want to drop rows where any of the columns contains NA, but I couldn't find a decent solution for this case.
You can do the following, which eliminates the third row where x1, x2, and x3 are NA.
library(dplyr)
data %>%
  filter(!(is.na(x1) & is.na(x2) & is.na(x3)))
#>   x1   x2 x3 x4
#> 1  4    A  1  A
#> 2  1 <NA>  0  B
#> 3  7   XX  1 XX
#> 4  8   YO  1 YO
#> 5  1   YA NA YA
Created on 2022-09-01 with reprex v2.0.2.9000
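If the real dataset has many columns to check, spelling out one is.na() call per column gets tedious. The following is a sketch of two alternatives that should give the same result, not a tested solution for your actual data. It assumes dplyr 1.0.4 or later (for if_all()), and the names na_cols, kept_dplyr, and kept_base are made up for illustration.

```r
library(dplyr)

# Example data from the question
data <- data.frame(
  x1 = c(4, 1, NA, 7, 8, 1),
  x2 = c("A", NA, NA, "XX", "YO", "YA"),
  x3 = c(1, 0, NA, 1, 1, NA),
  x4 = c("A", "B", "C", "XX", "YO", "YA")
)

# Columns to check for NA (x4 is deliberately left out); hypothetical name
na_cols <- c("x1", "x2", "x3")

# dplyr: if_all() is TRUE for rows where every checked column is NA,
# so negating it keeps all other rows
kept_dplyr <- data %>%
  filter(!if_all(all_of(na_cols), is.na))

# base R: count NAs per row across the checked columns and keep rows
# that have fewer NAs than there are checked columns
kept_base <- data[rowSums(is.na(data[na_cols])) < length(na_cols), ]
```

Both versions take the column names from a character vector, so extending the check to more columns only means changing na_cols.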
Irakli
September 1, 2022, 7:04pm
4
Thank you for the response. I wrote similar code but put '!' before each condition, and it removed rows with ANY NA, so this makes more sense.
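For anyone who hits the same thing, the difference comes down to where the negation sits. This small sketch uses the example data from the question; the variable names na1, na2, na3, keep_unless_all_na, and keep_only_complete are made up for illustration.

```r
# Example data from the question
data <- data.frame(
  x1 = c(4, 1, NA, 7, 8, 1),
  x2 = c("A", NA, NA, "XX", "YO", "YA"),
  x3 = c(1, 0, NA, 1, 1, NA),
  x4 = c("A", "B", "C", "XX", "YO", "YA")
)

# Logical vectors: TRUE where the value is missing
na1 <- is.na(data$x1)
na2 <- is.na(data$x2)
na3 <- is.na(data$x3)

# Negating the whole conjunction: a row is dropped only if all three are NA
keep_unless_all_na <- !(na1 & na2 & na3)

# Negating each term: a row is kept only if none of the three is NA,
# so any single NA is enough to drop it
keep_only_complete <- !na1 & !na2 & !na3

# By De Morgan's law, !(a & b & c) is equivalent to !a | !b | !c
same_as_or <- identical(keep_unless_all_na, !na1 | !na2 | !na3)
```

So the per-condition '!' version would need | instead of & between the terms to match the accepted answer.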