How to filter by missing data

aosmith · November 26, 2019, 11:59pm

How did you read your data in? You may be able to address this at that point.

Since you can't post your actual data, the next best thing is to post a small dataset that shows the same problem.

For example, here is a very small dataset that has missing values in two variables. The third row is missing for the categorical "b" variable. However, by default the blank is left as a blank for categorical variables (unlike numeric variables).

dat = read.csv(text = "a, b, c
1, b, 1
, c, 2
2,, 3")

dat
#>    a  b c
#> 1  1  b 1
#> 2 NA  c 2
#> 3  2    3

When using functions from the read.table() family, na.strings can be useful for defining what should be interpreted as NA. In this example, blanks are NA so I could use na.strings = "".

     dat = read.csv(text = "a, b, c
1, b, 1
, c, 2
2,, 3", na.strings = "")
     
     dat
#>    a    b c
#> 1  1    b 1
#> 2 NA    c 2
#> 3  2 <NA> 3

If the blanks-as-NA can't be addressed when reading in the dataset, another option is to manually change blanks to NA in R. For example, dplyr::na_if() can be used to replace blanks with NA in a variable.

library(dplyr)

dat = read.csv(text = "a, b, c
1, b, 1
, c, 2
2,, 3")

dat
#>    a  b c
#> 1  1  b 1
#> 2 NA  c 2
#> 3  2    3

# No NA in b so returns no rows
filter(dat, is.na(b) )
#> [1] a b c
#> <0 rows> (or 0-length row.names)

# Manually replace blanks with NA
dat %>%
     mutate(b = na_if(b, "") ) %>%
     filter( is.na(b) )
#>   a    b c
#> 1 2 <NA> 3

^{Created on 2019-11-26 by the reprex package (v0.3.0)}