How to find missing values for both columns?

Tung · September 30, 2020, 6:15pm

I know how to identify missing values for one specific column.

sum(is.na(sleep$Dream))

But suppose I have another variable called Dream2. I want to know how many rows that have both missing values for these Dream and Dream2. How can I do?

GreyMerchant · September 30, 2020, 6:28pm

Hello there,

There are a couple of ways to perform what you require. Let looks at this example:

library(tidyverse)

sleep <- data.frame(
  Dream = c(1,2,3,4,5,NA),
  Dream2 = c(1,2,3,NA,5,NA))


sum(is.na(sleep))
#> [1] 3

output <- sapply(sleep,is.na)

output
#>      Dream Dream2
#> [1,] FALSE  FALSE
#> [2,] FALSE  FALSE
#> [3,] FALSE  FALSE
#> [4,] FALSE   TRUE
#> [5,] FALSE  FALSE
#> [6,]  TRUE   TRUE

sum(output)
#> [1] 3

^{Created on 2020-09-30 by the reprex package (v0.3.0)}

You will see that given how you called your data I also have a dataframe called sleep with both Dream and Dream2. You can simply pass in the dataframe or if you want to apply it for each column you have you can simply run it with an sapply which does it for all columns in sleep. As you see it leads to the same result Let me know if this is a sufficient solution to your problem.

Tung · October 2, 2020, 3:25pm

I think there's a problem in your code. The output should be 4: rows 1,2,3,5. But your output is 3.

I think your code gives the answer of logical or. Either 2 columns has missing value will be counted. I want logical and. Only those who have both missing values will be counted.

GreyMerchant · October 2, 2020, 3:59pm

The solution I provided above shows missing values for each row (we only have 3 NA's as you can see in the example so 3 is in fact the total missing).

The language is a bit of a barrier. I wouldn't say 4 is the answer in the example as that would be all rows which had false but yes we can check that rowwise to see if at least this condition is met once to assign 0 and then not "count" it. Would you like me to create this?

nirgrahamuk · October 2, 2020, 4:11pm

to count the rows for which both are NA you can do something like


mutate_all(sleep,is.na) %>% rowwise() %>%
  mutate(both_na = reduce(c(Dream,
                            Dream2),`&`)) %>% 
  group_by(both_na) %>% 
  summarise(n=n())

GreyMerchant · October 2, 2020, 6:21pm

library(tidyverse)

sleep <- data.frame(
  Dream = c(1,2,3,4,5,NA),
  Dream2 = c(1,2,3,NA,5,NA))


sum(is.na(sleep))
#> [1] 3

output <- sapply(sleep,is.na) 

output
#>      Dream Dream2
#> [1,] FALSE  FALSE
#> [2,] FALSE  FALSE
#> [3,] FALSE  FALSE
#> [4,] FALSE   TRUE
#> [5,] FALSE  FALSE
#> [6,]  TRUE   TRUE


sum(output)
#> [1] 3


output %>% rowSums()
#> [1] 0 0 0 1 0 2

^{Created on 2020-10-02 by the reprex package (v0.3.0)}

You can also simply take output and make use of rowSums here to get your answer and see exactly how many violations you have across. So since we have 4 x 0's you know 4 qualified where 2 had missing with 1 one having 2 violations.

Tung · October 25, 2020, 3:29am

sum(is.na(sleep)) gives me 3. But I want 1 because only the 5th observation has both NA.

fcas80 · September 1, 2021, 7:48pm

sleep <- data.frame(
Dream = c(1,2,3,4,5,NA),
Dream2 = c(1,2,3,NA,5,NA))
which(is.na(sleep$Dream)&is.na(sleep$Dream2))