I just came across a behavior of `str_detect()` that was new to me, and I was wondering how others deal with such cases:
```r
library(tidyverse)

df_test <- data.frame(
  stringsAsFactors = FALSE,
  names = c("tom", "max", "ella", "franz"),
  family = c("huber", "huber", "bauer", NA),
  age = c(10L, 4L, 7L, NA)
)

df_test %>%
  filter(str_detect(family, "huber"))
#>   names family age
#> 1   tom  huber  10
#> 2   max  huber   4
```
The result below surprises me. Why doesn't the negated version of `str_detect()` also return `franz`, whose family name is `NA`? My (apparently wrong) understanding was that `NA != "huber"`, so the row with `NA` should be returned.
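For reference, here is a minimal sketch of how R evaluates the pieces involved for the `franz` row, assuming only stringr is loaded. Under R's three-valued logic, comparing `NA` with anything gives `NA` (R cannot know whether the missing value would have been `"huber"`), `str_detect()` propagates that `NA`, and negating `NA` is still `NA`. Because `dplyr::filter()` keeps only rows where the condition is `TRUE`, the row is dropped both with and without `!`.

```r
library(stringr)

family <- c("huber", "huber", "bauer", NA)

# Comparing NA with a value gives NA, not TRUE
NA != "huber"
#> [1] NA

# str_detect() returns NA for a missing input string
detected <- str_detect(family, "huber")
detected
#> [1]  TRUE  TRUE FALSE    NA

# Negation leaves NA as NA, so the fourth element is never TRUE
!detected
#> [1] FALSE FALSE  TRUE    NA

# dplyr::filter() keeps only rows whose condition is TRUE,
# so a row whose condition is NA is dropped in both cases
```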
```r
df_test %>%
  filter(!str_detect(family, "huber"))
#>   names family age
#> 1  ella  bauer   7

df_test %>%
  filter(str_detect(family, "huber", negate = TRUE))
#>   names family age
#> 1  ella  bauer   7
```
Since such cases are quite common, does this mean that every negated `str_detect()` call has to account for NAs explicitly, as below?
```r
df_test %>%
  filter(str_detect(family, "huber", negate = TRUE) | is.na(family))
#>   names family age
#> 1  ella  bauer   7
#> 2 franz   <NA>  NA
```
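Two other ways to express the same filter are sketched below, assuming dplyr and stringr are loaded. Both turn the `NA` returned by `str_detect()` into "no match" before negating: `dplyr::coalesce()` replaces it with `FALSE`, while `%in% TRUE` treats anything that is not exactly `TRUE` (including `NA`) as `FALSE`. The variable names `not_huber` and `not_huber2` are illustrative only.

```r
library(dplyr)
library(stringr)

df_test <- data.frame(
  stringsAsFactors = FALSE,
  names = c("tom", "max", "ella", "franz"),
  family = c("huber", "huber", "bauer", NA),
  age = c(10L, 4L, 7L, NA)
)

# coalesce() maps the NA from str_detect() to FALSE ("no match"),
# so the negation yields TRUE and filter() keeps the NA row
not_huber <- df_test %>%
  filter(!coalesce(str_detect(family, "huber"), FALSE))

# `%in% TRUE` is FALSE for both FALSE and NA, with the same effect
not_huber2 <- df_test %>%
  filter(!(str_detect(family, "huber") %in% TRUE))

not_huber
```

Whichever form is used, the choice of what a missing family name should mean (match or no match) stays explicit in the code.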
I find this behavior surprising. Personally, I would prefer an option in `str_detect()` to also match NAs, but I assume there is a good explanation for it. Many thanks.
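stringr does not offer such an option as far as I know, but a small wrapper can approximate it. The sketch below is a hypothetical helper: the name `str_detect_na()` and its `na_match` argument are made up for illustration and are not part of stringr. It first answers "does it match?", replaces the answer for missing strings with `na_match`, and only then applies `negate`, so a missing value counted as "no match" is kept by the negated filter.

```r
library(stringr)

# Hypothetical helper, NOT part of stringr: like str_detect(), but the
# caller decides whether a missing string counts as a match.
str_detect_na <- function(string, pattern, negate = FALSE, na_match = FALSE) {
  # Match first, without negation
  hit <- str_detect(string, pattern)
  # Replace the NA result for missing strings with the chosen value
  hit[is.na(string)] <- na_match
  # Negate last, so NA strings follow na_match consistently
  if (negate) !hit else hit
}

family <- c("huber", "huber", "bauer", NA)

# NA treated as "no match": the negated call keeps it
str_detect_na(family, "huber", negate = TRUE)

# NA treated as "match": the negated call drops it
str_detect_na(family, "huber", negate = TRUE, na_match = TRUE)
```

In a pipeline it would be used like `str_detect()`, e.g. `filter(str_detect_na(family, "huber", negate = TRUE))`.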
Created on 2021-12-31 by the reprex package (v2.0.1)