I'm analyzing whether I'll be able to use dbplyr with Presto. Suppose I want to right the following query using dbplyr:
SELECT evs.device_ids['google_advertising_id'] AS device_id, count(*) AS evs_count
FROM hive.aleph.impressions_daily AS imps
JOIN hive.aleph.events_daily AS evs ON evs.trans_id = imps.trans_id
WHERE imps.is_rtb = 1 AND imps.month = '201902' AND imps.day = '20190211' AND
evs.is_rtb = TRUE AND evs.app_partition = 1087 AND evs.month = '201902' AND evs.day = '20190211' AND
date_diff('minute', from_iso8601_timestamp(imps.created), from_iso8601_timestamp(evs.created)) < 5
GROUP BY evs.device_ids['google_advertising_id']
ORDER BY evs_count DESC
So now I want to write this using dbplyr. How do I go about accessing a key in a map column and how can I call Presto's built-in functions ?
q <- inner_join(imps, evs, by='trans_id', suffix=c('.imps', '.evs')) %>%
filter(is_rtb.imps == 1, month.imps == '201902', day.imps == '20190211') %>%
filter(app_partition == 1087, month.evs == '201902', day.evs > '20190211') %>%
group_by(device_ids) %>%
summarise(n = n())
I don't know whether Presto is supported:
https://db.rstudio.com/databases
dbplyr
allows you to use native functions of the database. You can check the actual query sent to the db using show_query()
.
1 Like
Perhaps what I'm asking doesn't make sense, but is there a way to combine dbplyr with custom sql ?
Something like this perhaps ?
date_filter <- function() {
return("date_diff('minute', from_iso8601_timestamp(created.imps), from_iso8601_timestamp(created.evs)) < 5")
}
q <- inner_join(imps, evs, by='trans_id', suffix=c('.imps', '.evs')) %>%
filter(is_rtb.imps == 1, month.imps == '201902', day.imps == '20190211') %>%
filter(app_partition == 1087, month.evs == '201902', day.evs > '20190211') %>%
date_filter() %>%
group_by(device_ids) %>%
summarise(n = n())
Take a look at cross-posting FAQ:
Posting the same question both here and on other sites
Posting the same question to multiple forums at the same time is often considered impolite. We don't completely ban such cross-posting, but we ask you to think hard before you do it and to follow some rules.
Cross-post sparingly
Rather than post the same thing here and elsewhere from the get-go, post in one place at a time. Let enough time go by (think days, not hours) before you take your question somewhere else. Sometimes people at another site may suggest you post here if your question doesn't fit within the scope of the other site.
Always link to your other posts, and update everywhere with any solution…
In general, it's completely fine to cross-post, but make it clear that this question is asked in multiple places so people don't waste their time answering question that might have been answered already somewhere else.
Link to GH issue:
opened 05:30PM - 14 Feb 19 UTC
closed 06:05PM - 12 Mar 19 UTC
I'm analyzing whether I'll be able to use dbplyr with Presto. Suppose I want to … right the following query using dbplyr:
```
SELECT evs.device_ids['google_advertising_id'] AS device_id, count(*) AS evs_count
FROM hive.aleph.impressions_daily AS imps
JOIN hive.aleph.events_daily AS evs ON evs.trans_id = imps.trans_id
WHERE imps.is_rtb = 1 AND imps.month = '201902' AND imps.day = '20190211' AND
evs.is_rtb = TRUE AND evs.app_partition = 1087 AND evs.month = '201902' AND evs.day = '20190211' AND
date_diff('minute', from_iso8601_timestamp(imps.created), from_iso8601_timestamp(evs.created)) < 5
GROUP BY evs.device_ids['google_advertising_id']
ORDER BY evs_count DESC
```
So now I want to write this using dbplyr. How do I go about accessing a key in a map column and how can I call Presto's built-in functions ?
```
q <- inner_join(imps, evs, by='trans_id', suffix=c('.imps', '.evs')) %>%
filter(is_rtb.imps == 1, month.imps == '201902', day.imps == '20190211') %>%
filter(app_partition == 1087, month.evs == '201902', day.evs > '20190211') %>%
group_by(device_ids) %>%
summarise(n = n())
```
Perhaps what I'm asking doesn't make sense, but is there a way to combine dbplyr with custom sql ?
Something like this perhaps ?
```
date_filter <- function() {
return("date_diff('minute', from_iso8601_timestamp(created.imps), from_iso8601_timestamp(created.evs)) < 5")
}
q <- inner_join(imps, evs, by='trans_id', suffix=c('.imps', '.evs')) %>%
filter(is_rtb.imps == 1, month.imps == '201902', day.imps == '20190211') %>%
filter(app_partition == 1087, month.evs == '201902', day.evs > '20190211') %>%
date_filter() %>%
group_by(device_ids) %>%
summarise(n = n())
```
1 Like
system
Closed
March 7, 2019, 8:53pm
5
This topic was automatically closed 21 days after the last reply. New replies are no longer allowed. If you have a query related to it or one of the replies, start a new topic and refer back with a link.