I know how to identify missing values for one specific column.
sum(is.na(sleep$Dream))
But suppose I have another variable called Dream2. I want to know how many rows that have both missing values for these Dream and Dream2. How can I do?
I know how to identify missing values for one specific column.
sum(is.na(sleep$Dream))
But suppose I have another variable called Dream2. I want to know how many rows that have both missing values for these Dream and Dream2. How can I do?
Hello there,
There are a couple of ways to perform what you require. Let looks at this example:
library(tidyverse)
sleep <- data.frame(
Dream = c(1,2,3,4,5,NA),
Dream2 = c(1,2,3,NA,5,NA))
sum(is.na(sleep))
#> [1] 3
output <- sapply(sleep,is.na)
output
#> Dream Dream2
#> [1,] FALSE FALSE
#> [2,] FALSE FALSE
#> [3,] FALSE FALSE
#> [4,] FALSE TRUE
#> [5,] FALSE FALSE
#> [6,] TRUE TRUE
sum(output)
#> [1] 3
Created on 2020-09-30 by the reprex package (v0.3.0)
You will see that given how you called your data I also have a dataframe called sleep
with both Dream
and Dream2
. You can simply pass in the dataframe or if you want to apply it for each column you have you can simply run it with an sapply which does it for all columns in sleep
. As you see it leads to the same result Let me know if this is a sufficient solution to your problem.
I think there's a problem in your code. The output should be 4: rows 1,2,3,5. But your output is 3.
I think your code gives the answer of logical or. Either 2 columns has missing value will be counted. I want logical and. Only those who have both missing values will be counted.
The solution I provided above shows missing values for each row (we only have 3 NA's as you can see in the example so 3 is in fact the total missing).
The language is a bit of a barrier. I wouldn't say 4 is the answer in the example as that would be all rows which had false but yes we can check that rowwise to see if at least this condition is met once to assign 0 and then not "count" it. Would you like me to create this?
to count the rows for which both are NA you can do something like
mutate_all(sleep,is.na) %>% rowwise() %>%
mutate(both_na = reduce(c(Dream,
Dream2),`&`)) %>%
group_by(both_na) %>%
summarise(n=n())
library(tidyverse)
sleep <- data.frame(
Dream = c(1,2,3,4,5,NA),
Dream2 = c(1,2,3,NA,5,NA))
sum(is.na(sleep))
#> [1] 3
output <- sapply(sleep,is.na)
output
#> Dream Dream2
#> [1,] FALSE FALSE
#> [2,] FALSE FALSE
#> [3,] FALSE FALSE
#> [4,] FALSE TRUE
#> [5,] FALSE FALSE
#> [6,] TRUE TRUE
sum(output)
#> [1] 3
output %>% rowSums()
#> [1] 0 0 0 1 0 2
Created on 2020-10-02 by the reprex package (v0.3.0)
You can also simply take output
and make use of rowSums
here to get your answer and see exactly how many violations you have across. So since we have 4 x 0's you know 4 qualified where 2 had missing with 1 one having 2 violations.
sum(is.na(sleep)) gives me 3. But I want 1 because only the 5th observation has both NA.
sleep <- data.frame(
Dream = c(1,2,3,4,5,NA),
Dream2 = c(1,2,3,NA,5,NA))
which(is.na(sleep$Dream)&is.na(sleep$Dream2))