Hi all,
I am wondering if there is a "tidy" way to join two data frames, where the joining variable will not necessarily be an exact match (I will give an example below, but look at this and this to see similar questions, and non-tidy ways of working with it.
suppressPackageStartupMessages(library(tidyverse))
df1 <- tibble::tribble(
~id, ~category, ~date,
1L, "a", "7/1/2000",
2L, "b", "11/1/2000",
3L, "c", "7/1/2002"
) %>% mutate(date = as.Date(date, format = "%m/%d/%Y"))
df2 <- tibble::tribble(
~category, ~other_info, ~start, ~end,
"a", "x", "1/1/2000", "12/31/2000",
"b", "y", "1/1/2001", "12/31/2001",
"c", "z", "1/1/2002", "12/31/2002"
) %>% mutate_at(vars(start, end), as.Date, format = "%m/%d/%Y")
df1
#> # A tibble: 3 x 3
#> id category date
#> <int> <chr> <date>
#> 1 1 a 2000-07-01
#> 2 2 b 2000-11-01
#> 3 3 c 2002-07-01
df2
#> # A tibble: 3 x 4
#> category other_info start end
#> <chr> <chr> <date> <date>
#> 1 a x 2000-01-01 2000-12-31
#> 2 b y 2001-01-01 2001-12-31
#> 3 c z 2002-01-01 2002-12-31
Since the rows for id 1 and 3 have a category that matches AND a date in between its corresponding start and end, I want the join to work. Since the row for id 2 does not have a date within the range of the corresponding category, I want it to not match. My final table should be a "LEFT JOIN" and output the following (using the non-tidy package sqldf
)
sqldf::sqldf("SELECT a.id, a.category, b.other_info, a.date, b.start, b.end
FROM df1 a
LEFT JOIN df2 b on a.category = b.category AND
a.date >= b.start AND a.date <= b.end") %>%
as_tibble()
#> # A tibble: 3 x 6
#> id category other_info date start end
#> <int> <chr> <chr> <date> <date> <date>
#> 1 1 a x 2000-07-01 2000-01-01 2000-12-31
#> 2 2 b <NA> 2000-11-01 NA NA
#> 3 3 c z 2002-07-01 2002-01-01 2002-12-31
There also seems to be a way to do this through data.table
.
Is there a way with dplyr
or one of its tidy friends? The following does not work.
left_join(df1, df2, by = c("category", "date" >= "start", "date" <= "end"))
#> Error: `by` can't contain join column `FALSE`, `TRUE` which is missing from LHS
Thanks!