You can do it with the fuzzyjoin
package, which implements various not quite exact matching joins in dplyr
syntax.
library(tidyverse)
library(fuzzyjoin)
df1 <- tibble::tribble(
~id, ~category, ~date,
1L, "a", "7/1/2000",
2L, "b", "11/1/2000",
3L, "c", "7/1/2002"
) %>%
mutate(date = as.Date(date, format = "%m/%d/%Y"))
df2 <- tibble::tribble(
~category, ~other_info, ~start, ~end,
"a", "x", "1/1/2000", "12/31/2000",
"b", "y", "1/1/2001", "12/31/2001",
"c", "z", "1/1/2002", "12/31/2002"
) %>%
mutate_at(vars(start, end), as.Date, format = "%m/%d/%Y")
fuzzy_left_join(
df1, df2,
by = c(
"category" = "category",
"date" = "start",
"date" = "end"
),
match_fun = list(`==`, `>=`, `<=`)
) %>%
select(id, category = category.x, other_info, date, start, end)
#> # A tibble: 3 x 6
#> id category other_info date start end
#> <int> <chr> <chr> <date> <date> <date>
#> 1 1 a x 2000-07-01 2000-01-01 2000-12-31
#> 2 2 b <NA> 2000-11-01 NA NA
#> 3 3 c z 2002-07-01 2002-01-01 2002-12-31
Created on 2018-04-30 by the reprex package (v0.2.0).
The extra argument, in the fuzzy_left_join()
function, match_fun
, allows you to define the matching criterion for each pair of columns as a function. In this case, we want category == category
, date >= start
, and date <= end
.