Am I crazy? Having issues with `dplyr::filter`

I've often used data %>% filter(is.na(col)) as a way to inspect the data where a missing value is located--there's often a lot of context that needs investigation before I decide to remove missing data and I'm always scared of things like na.omit() or complete.cases().

Today something happened that seemed weird, which is why I'm asking, "[a]m I crazy?"

It seems like dplyr::filter is behaving differently; at least, some older code no longer works the way it used to. I often use the Interval class from lubridate in my work, and an interval column in a tbl_df doesn't usually throw filter off like this. Take a look:

library(tidyverse)
#> ── Attaching packages ─────────────────────────────────────── tidyverse 1.2.1 ──
#> ✔ ggplot2 2.2.1.9000     ✔ purrr   0.2.4     
#> ✔ tibble  1.4.2          ✔ dplyr   0.7.5     
#> ✔ tidyr   0.8.0          ✔ stringr 1.3.0     
#> ✔ readr   1.1.1          ✔ forcats 0.3.0
#> ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag()    masks stats::lag()
#> ✖ dplyr::vars()   masks ggplot2::vars()
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#> 
#>     date

data <- tribble(
  ~id, ~move_in, ~move_out,
  147, "20110115", "20130521",
  148, "20170222", NA,
  149, NA, NA,
  150, NA, "20170101",
  151, "20160506", "20180125") %>%
  mutate(
    move_in = parse_date(move_in, "%Y%m%d"),
    move_out = parse_date(move_out, "%Y%m%d"),
    length_of_stay = move_in %--% move_out
  )

glimpse(data)
#> Observations: 5
#> Variables: 4
#> $ id             <dbl> 147, 148, 149, 150, 151
#> $ move_in        <date> 2011-01-15, 2017-02-22, NA, NA, 2016-05-06
#> $ move_out       <date> 2013-05-21, NA, NA, 2017-01-01, 2018-01-25
#> $ length_of_stay <S4: Interval> 2011-01-15 UTC--2013-05-21 UTC, 2017-0...

data %>% filter(is.na(move_in))
#> Error in filter_impl(.data, quo): Column `length_of_stay` classes Period and Interval from lubridate are currently not supported.

data %>% filter(!is.na(move_in))
#> Error in filter_impl(.data, quo): Column `length_of_stay` classes Period and Interval from lubridate are currently not supported.

data %>% 
  select(-length_of_stay) %>%
  filter(is.na(move_in))
#> # A tibble: 2 x 3
#>      id move_in    move_out  
#>   <dbl> <date>     <date>    
#> 1  149. NA         NA        
#> 2  150. NA         2017-01-01

data %>%
  tally(is.na(move_in))
#> # A tibble: 1 x 1
#>       n
#>   <int>
#> 1     2

data %>%
  count(year(move_in))
#> # A tibble: 4 x 2
#>   `year(move_in)`     n
#>             <dbl> <int>
#> 1           2011.     1
#> 2           2016.     1
#> 3           2017.     1
#> 4             NA      2

It seems like I can't use filter so long as this Interval column is in the data, but I can remove it and things will work. The strange thing though is that functions like count() and tally() don't seem to be thrown off in this way.

Maybe I'm not doing something correctly.

Looks like that support was removed a month ago:

I guess as part of the response to


Ahh, I see--thanks! For whatever reason I didn't see this issue. I'll have to use a workaround for now.

If anyone else runs into this issue, this is my workaround (if someone has a better one, I'd be interested in learning about it):

# is.interval() is safer than comparing class(), which can return more than one value
data %>% select_if(~ !is.interval(.x))
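For anyone who needs the interval back after filtering, here is a minimal sketch of one way to extend that workaround: drop the Interval column, filter, then rebuild the interval on the rows that remain. The tibble `stays` is a hypothetical stand-in for the `data` object in the original post, and `is.interval()` from lubridate is used for the class check; this is just one possible approach, not an official fix.

```r
library(dplyr)
library(lubridate)

# Hypothetical stand-in for the `data` tibble in the original post
stays <- tibble(
  id       = c(147, 148, 149, 150, 151),
  move_in  = as.Date(c("2011-01-15", "2017-02-22", NA, NA, "2016-05-06")),
  move_out = as.Date(c("2013-05-21", NA, NA, "2017-01-01", "2018-01-25"))
) %>%
  mutate(length_of_stay = move_in %--% move_out)

missing_move_in <- stays %>%
  select_if(~ !is.interval(.x)) %>%               # drop Interval columns before filtering
  filter(is.na(move_in)) %>%                      # inspect rows with a missing move_in
  mutate(length_of_stay = move_in %--% move_out)  # rebuild the interval on the kept rows
```

Rebuilding the column only works when the interval can be derived from columns that survive the filter, as it can here.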


I discovered this issue recently when updating some code that I was working on. All of a sudden dplyr::filter couldn't use %within%. I switched over to base subsetting instead: my_df[my_date %within% all_dates, ].
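A rough, self-contained sketch of that base-subsetting approach might look like the following. The names `my_df`, `my_date`, and `all_dates` follow the post above, but the values are invented for illustration, and `my_date` is assumed here to be a column of `my_df`.

```r
library(lubridate)

# Hypothetical data: names follow the post above, values are made up
my_df <- data.frame(
  id      = 1:4,
  my_date = as.Date(c("2016-12-31", "2017-03-15", "2017-11-02", "2018-01-01"))
)
all_dates <- interval(ymd("2017-01-01"), ymd("2017-12-31"))

# Build the logical mask outside dplyr, then index rows with base `[`
keep     <- my_df$my_date %within% all_dates
in_range <- my_df[keep, ]
```

Because the mask is computed before any dplyr verb is involved, this sidesteps whatever column-class checks filter performs.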