Drop Rows with missing values based on several columns

I have a simple dataframe like this;
missing values

I want to drop only the rows where the values in columns "c2", "c3" ,"c4" are missing.
Tried using 'complete.cases' but it removes the rows based on "OR" logic. What i want to do is the AND logic. That is, remove the row only if C2 AND C3 AND C4 are missing.
Any help on this would be highly appreciated.

Hi, try this.

library(tidyverse)

# fake data
df <- tibble(c1 = sample(1:100, 7),
             c2 = c(1, 2, NA, 2, NA, 5, NA),
             c3 = c(1, 2, NA, 2, NA, 6, 4),
             c4 = c(1, 2, NA, 2, NA, 7, NA),
             c5 = sample(1:100, 7)) 


df %>% 
  filter(if_any(c2:c4, ~ !is.na(.)))


# # A tibble: 5 x 5
# c1    c2    c3    c4    c5
# <int> <dbl> <dbl> <dbl> <int>
# 1    57     1     1     1    80
# 2    86     2     2     2    95
# 3    26     2     2     2    43
# 4    95     5     6     7    71
# 5    58    NA     4    NA    59

Also, next time please provide a reproducible example instead of a screenshot.

1 Like

Screenshots are disfavored, and in the future, please use a reprex. See the FAQ.

But for simple cases, as a welcome, newcomers get a break.

make_fakes <- function() sample(c(1:100,NA),7)
set.seed(42)
(dat <- data.frame(c1 = make_fakes(),
                  c2 = make_fakes(),
                  c3 = make_fakes(),
                  c4 = make_fakes(),
                  c5 = make_fakes()))
#>   c1 c2 c3 c4 c5
#> 1 49 24 41 34 30
#> 2 65 71 89 92 43
#> 3 25 89 27  3 15
#> 4 74 37 36 58 22
#> 5 18 20 95 97 58
#> 6 NA 26  5 42  8
#> 7 47  3 84 24 36

dat[c(3,5),2:4] <- NA
dat[6,1] <- 12
dat
#>   c1 c2 c3 c4 c5
#> 1 49 24 41 34 30
#> 2 65 71 89 92 43
#> 3 25 NA NA NA 15
#> 4 74 37 36 58 22
#> 5 18 NA NA NA 58
#> 6 12 26  5 42  8
#> 7 47  3 84 24 36
dat |> rowSums(dat,na.rm = TRUE)
#> Warning in dims < 1L || dims > length(dn) - 1L: 'length(x) = 35 > 1' in coercion
#> to 'logical(1)'

#> Warning in dims < 1L || dims > length(dn) - 1L: 'length(x) = 35 > 1' in coercion
#> to 'logical(1)'
#> Error in rowSums(dat, dat, na.rm = TRUE): invalid 'dims'
dat[2:4]
#>   c2 c3 c4
#> 1 24 41 34
#> 2 71 89 92
#> 3 NA NA NA
#> 4 37 36 58
#> 5 NA NA NA
#> 6 26  5 42
#> 7  3 84 24
# remove rows with any of 2:4 containing NA
dat[-which(is.na(dat[2]) | is.na(dat[3]) | is.na(dat[4])),]
#>   c1 c2 c3 c4 c5
#> 1 49 24 41 34 30
#> 2 65 71 89 92 43
#> 4 74 37 36 58 22
#> 6 12 26  5 42  8
#> 7 47  3 84 24 36
# all of 2:4 containing NA
dat[-which(is.na(dat[2]) & is.na(dat[3]) & is.na(dat[4])),]
#>   c1 c2 c3 c4 c5
#> 1 49 24 41 34 30
#> 2 65 71 89 92 43
#> 4 74 37 36 58 22
#> 6 12 26  5 42  8
#> 7 47  3 84 24 36

Created on 2023-01-23 with reprex v2.0.2

2 Likes

ha! I went with if_all, the cousin of if_any

library(tibble)
library(dplyr)

dat <- tibble(
 a = c(1, NA, 3, NA, 5, 6, NA),
 b = c(8, 9, 10, 11, 12, 13, 14),
 c = c(15, NA, 17, NA, NA, 20, 21),
 d = c(22, NA, 23, 24, 25, 26, 26)
)

# drop rows only if values in cols a, c, and d are NA
dat |> 
 filter(!if_all(.cols = c(a,c,d), .fns = is.na))
3 Likes

@jeremy & @williaml

I'm not in the {base} is always best Church of the One True Faith, but I found once I got over the punctuation-intensive aspect, syntax is easier with fewer and more explicit parameters to stuff into my addled head.

2 Likes

indeed, your approach with base R is much more clear about how the conditions are tested, and it's a good reminder of how base functions and operators can accomplish so much

1 Like

Your solution works like a charm! Thanks a lot for that.
And next time when I am posting, will keep in mind about the good practices such as producing a reproducible example without just putting the screenshot.

1 Like

Thanks for your reply.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.