I have the following two tables in R (the "Dates" appear as "Factors" in my data):
table_1 = data.frame(id = c("123", "123", "125", "125"), id2 = c("11", "12", "14", "13"),
date_1 = c("2010-01-31","2010-01-31", "2015-01-31", "2018-01-31" ))
table_1$id = as.factor(table_1$id)
table_1$id2 = as.factor(table_1$id2)
table_1$date_1 = as.factor(table_1$date_1)
table_2 = data.frame(id = c("123", "123", "125", "125"), id2 = c("111", "112", "14", "113"),
date_2 = c("2009-01-31","2010-01-31", "2010-01-31", "2010-01-31" ),
date_3 = c("2011-01-31","2010-01-31", "2020-01-31", "2020-01-31" ))
table_2$id = as.factor(table_2$id)
table_2$id2 = as.factor(table_2$id2)
table_2$date_2 = as.factor(table_2$date_2)
table_2$date_3 = as.factor(table_2$date_3)
I am trying to perform an "inner join" if either Condition 1 OR Condition 2 is true:
Condition_1
if table_1$id = table_2$id AND
if table_1$date BETWEEN(table_2$date_2,table_2$date_3)
Condition_2
if table_1$id2 = table_2$id2
Question: I tried to use the code below to achieve the above task:
library(sqldf)
sqldf("select distinct *
from table_1 a inner join table_2 b
on (a.date_1 between b.date_2 and b.date_3) and
a.id = b.id or (a.id2 = b.id2)")
Can someone please tell me if it is possible to convert this code into "dplyr"? I have heard that "dplyr" is more efficient at performing such opperations on larger datasets - is this true?
Thanks!