Re-Writing "Fuzzy Join" Functions from R to SQL

In the R programming language, I am interested in performing a "fuzzy join" and passing this through a SQL Connection:

library(fuzzyjoin)
library(dplyr)

library(RODBC)
library(sqldf)

con = odbcConnect("some name", uid = "some id", pwd = "abc")

sample_query = sqlQuery( stringdist_inner_join(table_1, table_2, by = "id2", max_dist = 2) %>%
  filter(date_1 >= date_2, date_1 <= date_3) )

view(sample_query)

However, I do not think this is possible, because the function which us being used for the "fuzzy join" (stringdist_inner_join) is not supported by Netezza.

I tried to find the source code for this "fuzzy join" function, and found it over here: fuzzyjoin source: R/stringdist_join.R

My Question: Does anyone know if it is possible to (manually) convert this "fuzzy join" function into an SQL format that will be recognized by Netezza? Are there any quick ways to re-write this function (stringdist_inner_join) such that it can be recognized by Netezza?

Right now I can only execute "sample_query" on locally - re-writing this function (stringdist_inner_join) would let perform the "sample_query" much faster.

Does anyone know if this is possible?

Thanks!

Note:

My data looks like this:

table_1 = data.frame(id1 = c("123 A", "123BB", "12 5", "12--5"), id2 = c("11", "12", "14", "13"),
date_1 = c("2010-01-31","2010-01-31", "2015-01-31", "2018-01-31" ))

table_1$id1 = as.factor(table_1$id1)
table_1$id2 = as.factor(table_1$id2)
table_1$date_1 = as.factor(table_1$date_1)

table_2 = data.frame(id1 = c("0123", "1233", "125  .", "125_"), id2 = c("111", "112", "14", "113"),
date_2 = c("2009-01-31","2010-01-31", "2010-01-31", "2010-01-31" ),
date_3 = c("2011-01-31","2010-01-31", "2020-01-31", "2020-01-31" ))


table_2$id1 = as.factor(table_2$id1)
table_2$id2 = as.factor(table_2$id2)
table_2$date_2 = as.factor(table_2$date_2)
table_2$date_3 = as.factor(table_2$date_3)

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.