dplyr::filter() issue with intervals from lubridate
This is an issue I found while going through DataCamp’s Working With
Dates and Times in R course (excellent, by the way!). In the lesson
Comparing intervals and datetimes, dplyr::filter() is called on a data
frame that contains a lubridate interval column, and it throws an error.
Reproducible example
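library(dplyr)      # %>%, mutate(), filter(), select()
library(lubridate)  # %--%, interval(), int_length(), %within%, int_overlaps()
# Load the course data (`monarchs`, `halleys`)
source("https://goo.gl/SSvqZF")
source("https://goo.gl/2aojPM")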
# Create an interval for reign
monarchs <- monarchs %>%
  mutate(reign = from %--% to)
# Find the length of each reign (int_length() returns seconds)
monarchs <- monarchs %>%
  mutate(length = int_length(reign))
# We've put `halleys`, a data set describing appearances of Halley's comet,
# in your workspace.
# Print `halleys` to examine the data. `perihelion_date` is the date the
# comet is closest to the Sun. `start_date` and `end_date` are the range of
# dates the comet is visible from Earth.
# Create a new column, `visible`, that is an interval from `start_date` to
# `end_date`.
# New column for interval from start to end date
halleys <- halleys %>%
  mutate(visible = interval(start = start_date,
                            end = end_date))
# The visitation of 1066
# You'll work with one appearance, extract the 14th row of `halleys`.
halleys_1066 <- halleys[14, ]
# Monarchs in power on perihelion date
# Filter `monarchs` to those where `halleys_1066$perihelion_date` is
# within `reign`.
monarchs %>%
  dplyr::filter(halleys_1066$perihelion_date %within% reign) %>%
  dplyr::select(name, from, to, dominion)
## Error in filter_impl(.data, quo): Column `reign` classes Period and Interval from lubridate are currently not supported.
# Monarchs whose reign overlaps visible time
# Filter `monarchs` to those where `halleys_1066$visible` overlaps `reign`
monarchs %>%
  dplyr::filter(int_overlaps(halleys_1066$visible, reign)) %>%
  dplyr::select(name, from, to, dominion)
## Error in filter_impl(.data, quo): Column `reign` classes Period and Interval from lubridate are currently not supported.
It looks like the English kings Edward the Confessor and Harold II
would have been able to see the comet. It may have been a bad omen:
neither was still in power by 1067.
Question
I learned that this is a known issue with dplyr (see here). Will it be
addressed in the next dplyr update, or is there a quick workaround?
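One workaround I would try (a sketch, not tested against the dplyr 0.7 release that produced the error above): the error names the stored column `reign`, so keep the Interval out of the data frame and build it inside filter() from the plain `from` and `to` columns. `monarchs_toy` copies three rows from the outputs in this post, and `perihelion` is a hypothetical date, not the course's value.

```r
library(dplyr)
library(lubridate)

# Toy stand-in for `monarchs`: three rows copied from the output in this post.
# There is deliberately no stored `reign` Interval column.
monarchs_toy <- tibble(
  name     = c("Edward the Confessor", "Harold II", "Malcolm III"),
  from     = as.POSIXct(c("1042-06-08", "1066-01-05", "1058-03-17"), tz = "UTC"),
  to       = as.POSIXct(c("1066-01-05", "1066-10-14", "1093-11-13"), tz = "UTC"),
  dominion = c("England", "England", "Scotland")
)

# Hypothetical stand-in for halleys_1066$perihelion_date (not the course value)
perihelion <- as.POSIXct("1066-03-20", tz = "UTC")

# Build the interval inside filter(): it only exists while the condition is
# evaluated, so the data frame itself never carries an Interval column
in_power <- monarchs_toy %>%
  filter(perihelion %within% interval(from, to)) %>%
  select(name, from, to, dominion)

in_power
```

If `reign` is already in the data, I'd expect dropping it first with select(-reign) before filter() to have the same effect.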
Regular bracket subsetting does work with lubridate intervals.
Below I used which() with the %within% operator:
monarchs[which(halleys_1066$perihelion_date %within% monarchs$reign),
         c("name", "from", "to", "dominion")]
## # A tibble: 2 x 4
## name from to dominion
## <chr> <dttm> <dttm> <chr>
## 1 Harold II 1066-01-05 00:00:00 1066-10-14 00:00:00 England
## 2 Malcolm III 1058-03-17 00:00:00 1093-11-13 00:00:00 Scotland
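As far as I can tell from lubridate's behaviour, %within% for an instant is inclusive at both ends, so the test that which() relies on here reduces to two ordinary comparisons on the start and end datetimes, which filter() can evaluate on plain columns. A small check of that equivalence, assuming lubridate is installed; the reign bounds are the three rows printed above, and the `probe` dates are hypothetical (one is the shared endpoint 1066-01-05).

```r
library(lubridate)

# Reign bounds copied from the output above (three rows only)
from  <- as.POSIXct(c("1042-06-08", "1066-01-05", "1058-03-17"), tz = "UTC")
to    <- as.POSIXct(c("1066-01-05", "1066-10-14", "1093-11-13"), tz = "UTC")
reign <- interval(from, to)

# Hypothetical instants to test, including a reign endpoint
probe <- as.POSIXct(c("1066-01-05", "1066-03-20", "1100-01-01"), tz = "UTC")

# For each probe instant, compare %within% with explicit inclusive bounds
agree <- vapply(seq_along(probe), function(i) {
  d <- probe[i]
  via_within <- d %within% reign
  via_bounds <- from <= d & d <= to
  identical(as.vector(via_within), as.vector(via_bounds))
}, logical(1))

agree
```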
And instead of using dplyr::filter(), I used base::which() with
lubridate::int_overlaps():
monarchs[which(int_overlaps(halleys_1066$visible, monarchs$reign)),
         c("name", "from", "to", "dominion")]
## # A tibble: 3 x 4
## name from to dominion
## <chr> <dttm> <dttm> <chr>
## 1 Edward the Confessor 1042-06-08 00:00:00 1066-01-05 00:00:00 England
## 2 Harold II 1066-01-05 00:00:00 1066-10-14 00:00:00 England
## 3 Malcolm III 1058-03-17 00:00:00 1093-11-13 00:00:00 Scotland
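int_overlaps() treats two intervals as overlapping when each one starts no later than the other ends (shared endpoints count), which is presumably how Edward the Confessor, whose reign ended on 1066-01-05, makes the list above. Written as comparisons, that condition fits in filter() without storing an Interval column. A sketch, assuming dplyr and lubridate are installed; `monarchs_toy` holds three rows copied from the output above, `overlapping()` is a hypothetical helper, and the windows `early_1066` and `year_1070` are hypothetical stand-ins for `halleys_1066$visible`.

```r
library(dplyr)
library(lubridate)

monarchs_toy <- tibble(
  name = c("Edward the Confessor", "Harold II", "Malcolm III"),
  from = as.POSIXct(c("1042-06-08", "1066-01-05", "1058-03-17"), tz = "UTC"),
  to   = as.POSIXct(c("1066-01-05", "1066-10-14", "1093-11-13"), tz = "UTC")
)

# Hypothetical helper: names of monarchs whose reign overlaps `win`, using the
# same inclusive start/end test as int_overlaps() on plain datetime columns
overlapping <- function(data, win) {
  data %>%
    filter(from <= int_end(win), int_start(win) <= to) %>%
    pull(name)
}

# Hypothetical visibility windows (the real halleys_1066$visible is not used)
early_1066 <- interval(as.POSIXct("1066-01-01", tz = "UTC"),
                       as.POSIXct("1066-06-30", tz = "UTC"))
year_1070  <- interval(as.POSIXct("1070-01-01", tz = "UTC"),
                       as.POSIXct("1070-12-31", tz = "UTC"))

overlapping(monarchs_toy, early_1066)
overlapping(monarchs_toy, year_1070)

# Cross-check against int_overlaps() with base subsetting, as in the post
reign <- interval(monarchs_toy$from, monarchs_toy$to)
monarchs_toy$name[int_overlaps(year_1070, reign)]
```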
link to gist