Using Presto constructs with dbplyr

mishabalyasin · February 14, 2019, 8:44pm

Take a look at cross-posting FAQ:

In general, it's completely fine to cross-post, but make it clear that this question is asked in multiple places so people don't waste their time answering question that might have been answered already somewhere else.

Link to GH issue:

github.com/tidyverse/dbplyr

Presto Support

opened 05:30PM - 14 Feb 19 UTC

closed 06:05PM - 12 Mar 19 UTC

FranGoitia

I'm analyzing whether I'll be able to use dbplyr with Presto. Suppose I want to …right the following query using dbplyr: ``` SELECT evs.device_ids['google_advertising_id'] AS device_id, count(*) AS evs_count FROM hive.aleph.impressions_daily AS imps JOIN hive.aleph.events_daily AS evs ON evs.trans_id = imps.trans_id WHERE imps.is_rtb = 1 AND imps.month = '201902' AND imps.day = '20190211' AND evs.is_rtb = TRUE AND evs.app_partition = 1087 AND evs.month = '201902' AND evs.day = '20190211' AND date_diff('minute', from_iso8601_timestamp(imps.created), from_iso8601_timestamp(evs.created)) < 5 GROUP BY evs.device_ids['google_advertising_id'] ORDER BY evs_count DESC ``` So now I want to write this using dbplyr. How do I go about accessing a key in a map column and how can I call Presto's built-in functions ? ``` q <- inner_join(imps, evs, by='trans_id', suffix=c('.imps', '.evs')) %>% filter(is_rtb.imps == 1, month.imps == '201902', day.imps == '20190211') %>% filter(app_partition == 1087, month.evs == '201902', day.evs > '20190211') %>% group_by(device_ids) %>% summarise(n = n()) ``` Perhaps what I'm asking doesn't make sense, but is there a way to combine dbplyr with custom sql ? Something like this perhaps ? ``` date_filter <- function() { return("date_diff('minute', from_iso8601_timestamp(created.imps), from_iso8601_timestamp(created.evs)) < 5") } q <- inner_join(imps, evs, by='trans_id', suffix=c('.imps', '.evs')) %>% filter(is_rtb.imps == 1, month.imps == '201902', day.imps == '20190211') %>% filter(app_partition == 1087, month.evs == '201902', day.evs > '20190211') %>% date_filter() %>% group_by(device_ids) %>% summarise(n = n()) ```