dplyr::filter() issue with intervals from lubridate

This is an issue I found while going through DataCamp’s Working With
Dates and Times in R course (excellent, by the way!). In the lesson
Comparing intervals and datetimes, dplyr::filter() is called on a data
frame that contains an interval column and throws an error.

Reproducible example

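# Packages used below: %>%, mutate(), filter(), select() and the interval helpers
library(tidyverse)
library(lubridate)
source("https://goo.gl/SSvqZF")
source("https://goo.gl/2aojPM")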
# Create an interval for reign
monarchs <- monarchs %>%
  mutate(reign = from %--% to) 
# Find the length of reign, and arrange
monarchs <- monarchs %>%
  mutate(length = int_length(reign))
# We've put `halleys` a data set describing appearances of Halley's comet 
# in your workspace.
# Print `halleys` to examine the date. `perihelion_date` is the date the 
# Comet is closest to the Sun. `start_date` and `end_date` are the range of
# dates the comet is visible from Earth.
# Create a new column, `visible`, that is an interval from `start_date` to
#  `end_date`.
# New column for interval from start to end date
halleys <- halleys %>% 
  mutate(visible = interval(start = start_date,
                            end = end_date))


# The visitation of 1066
# You'll work with one appearance, extract the 14th row of `halleys`.
halleys_1066 <- halleys[14, ] 

# Monarchs in power on perihelion date
# Filter `monarchs` to those where `halleys_1066$perihelion_date` is 
# within `reign`.
monarchs %>% 
  dplyr::filter(halleys_1066$perihelion_date %within% reign) %>%
  dplyr::select(name, from, to, dominion)
## Error in filter_impl(.data, quo): Column `reign` classes Period and Interval from lubridate are currently not supported.
# Monarchs whose reign overlaps visible time
# Filter `monarchs` to those where `halleys_1066$visible` overlaps `reign`
monarchs %>% 
  dplyr::filter(int_overlaps(halleys_1066$visible, reign)) %>%
  dplyr::select(name, from, to, dominion)
## Error in filter_impl(.data, quo): Column `reign` classes Period and Interval from lubridate are currently not supported.

It looks like the Kings of England Edward the Confessor and Harold II
would have been able to see the comet. It may have been a bad omen:
neither was in power by 1067.

Question

I learned this is a known issue with dplyr (see here), but I’m
wondering whether it will be fixed in the next dplyr release, or
whether there is a quick workaround in the meantime?

I can use regular bracket filtering with the lubridate intervals.
Below I used which() and the %within% operator:

monarchs[which(halleys_1066$perihelion_date %within% monarchs$reign), 
                                        c("name", "from", "to", "dominion")]
## # A tibble: 2 x 4
##   name        from                to                  dominion
##   <chr>       <dttm>              <dttm>              <chr>   
## 1 Harold II   1066-01-05 00:00:00 1066-10-14 00:00:00 England 
## 2 Malcolm III 1058-03-17 00:00:00 1093-11-13 00:00:00 Scotland

And instead of using dplyr::filter(), I used base::which() and
lubridate::int_overlaps():

monarchs[which(int_overlaps(halleys_1066$visible, monarchs$reign)), 
                                        c("name", "from", "to", "dominion")]
## # A tibble: 3 x 4
##   name                 from                to                  dominion
##   <chr>                <dttm>              <dttm>              <chr>   
## 1 Edward the Confessor 1042-06-08 00:00:00 1066-01-05 00:00:00 England 
## 2 Harold II            1066-01-05 00:00:00 1066-10-14 00:00:00 England 
## 3 Malcolm III          1058-03-17 00:00:00 1093-11-13 00:00:00 Scotland
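
Another possible workaround that stays inside a dplyr pipeline is to skip the Interval column and compare against the endpoints directly. Below is a minimal sketch, not tested against the course data. It assumes the data frame holds no Interval column (drop `reign` first if it has already been added). `monarchs_toy` is a hypothetical stand-in built from the reigns printed above, and the three comet dates are placeholders, not the values in `halleys`.

```r
library(dplyr)

# Toy stand-in for `monarchs`, using the reigns shown in the output above.
monarchs_toy <- tibble(
  name     = c("Edward the Confessor", "Harold II", "Malcolm III"),
  from     = as.POSIXct(c("1042-06-08", "1066-01-05", "1058-03-17"), tz = "UTC"),
  to       = as.POSIXct(c("1066-01-05", "1066-10-14", "1093-11-13"), tz = "UTC"),
  dominion = c("England", "England", "Scotland")
)

# Placeholder dates (assumptions for illustration, NOT taken from `halleys`).
perihelion    <- as.POSIXct("1066-03-01", tz = "UTC")
visible_start <- as.POSIXct("1066-01-01", tz = "UTC")
visible_end   <- as.POSIXct("1066-06-01", tz = "UTC")

# Equivalent of `perihelion %within% reign`: the date lies between the endpoints.
in_power <- monarchs_toy %>%
  filter(from <= perihelion, perihelion <= to) %>%
  select(name, from, to, dominion)

# Equivalent of `int_overlaps(visible, reign)`: each range starts before the other ends.
overlapping <- monarchs_toy %>%
  filter(from <= visible_end, visible_start <= to) %>%
  select(name, from, to, dominion)

in_power
overlapping
```

Both comparisons include the endpoints, which matches how %within% and int_overlaps() treat interval boundaries.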

link to gist

It should work if you install the latest version of dplyr from GitHub:

suppressPackageStartupMessages(library(tidyverse))
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#> 
#>     date
source("https://goo.gl/SSvqZF")
source("https://goo.gl/2aojPM")
# Create an interval for reign
monarchs <- monarchs %>%
  mutate(reign = from %--% to) 
# Find the length of reign, and arrange
monarchs <- monarchs %>%
  mutate(length = int_length(reign))
# We've put `halleys` a data set describing appearances of Halley's comet 
# in your workspace.
# Print `halleys` to examine the date. `perihelion_date` is the date the 
# Comet is closest to the Sun. `start_date` and `end_date` are the range of
# dates the comet is visible from Earth.
# Create a new column, `visible`, that is an interval from `start_date` to
#  `end_date`.
# New column for interval from start to end date
halleys <- halleys %>% 
  mutate(visible = interval(start = start_date,
                            end = end_date))


# The visitation of 1066
# You'll work with one appearance, extract the 14th row of `halleys`.
halleys_1066 <- halleys[14, ] 

# Monarchs in power on perihelion date
# Filter `monarchs` to those where `halleys_1066$perihelion_date` is 
# within `reign`.
monarchs %>% 
  dplyr::filter(halleys_1066$perihelion_date %within% reign) %>%
  dplyr::select(name, from, to, dominion)
#> # A tibble: 2 x 4
#>   name        from                to                  dominion
#>   <chr>       <dttm>              <dttm>              <chr>   
#> 1 Harold II   1066-01-05 00:00:00 1066-10-14 00:00:00 England 
#> 2 Malcolm III 1058-03-17 00:00:00 1093-11-13 00:00:00 Scotland

Created on 2018-06-07 by the reprex package (v0.2.0).
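
In case it helps, one common way to get the development version is through the remotes package. This is a minimal sketch, assuming remotes is installed and that dplyr's development repository is tidyverse/dplyr on GitHub; the wrapper name `install_dev_dplyr` is only illustrative.

```r
# Hypothetical helper: wraps the install so nothing happens until you call it.
install_dev_dplyr <- function() {
  if (!requireNamespace("remotes", quietly = TRUE)) {
    stop("Install the 'remotes' package first: install.packages(\"remotes\")")
  }
  # Builds and installs dplyr from its GitHub repository.
  remotes::install_github("tidyverse/dplyr")
}

# install_dev_dplyr()      # run once, then restart R
# packageVersion("dplyr")  # confirm which version is now installed
```

Restarting R after the install makes sure the new version is the one that gets loaded.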

You’re saving the day here, too???

Thank you! This might be a silly question, but tidyverse::tidyverse_update() just checks for the updated version on CRAN, not for development versions, right?

  • Martin

I believe so. Personally, I use hrbrmstr's dtupdate a lot to check for updates to packages installed from GitHub. If you do:

dtupdate::github_update(auto.install = TRUE)

you'll get a list of packages that you've installed from GH that can be updated and you select them by entering space-separated numbers (e.g. 1 2 5).
