I'm analyzing whether I'll be able to use dbplyr with Presto. Suppose I want to right the following query using dbplyr:
SELECT evs.device_ids['google_advertising_id'] AS device_id, count(*) AS evs_count
FROM hive.aleph.impressions_daily AS imps
JOIN hive.aleph.events_daily AS evs ON evs.trans_id = imps.trans_id
WHERE imps.is_rtb = 1 AND imps.month = '201902' AND imps.day = '20190211' AND
evs.is_rtb = TRUE AND evs.app_partition = 1087 AND evs.month = '201902' AND evs.day = '20190211' AND
date_diff('minute', from_iso8601_timestamp(imps.created), from_iso8601_timestamp(evs.created)) < 5
GROUP BY evs.device_ids['google_advertising_id']
ORDER BY evs_count DESC
So now I want to write this using dbplyr. How do I go about accessing a key in a map column and how can I call Presto's built-in functions ?
q <- inner_join(imps, evs, by='trans_id', suffix=c('.imps', '.evs')) %>%
filter(is_rtb.imps == 1, month.imps == '201902', day.imps == '20190211') %>%
filter(app_partition == 1087, month.evs == '201902', day.evs > '20190211') %>%
group_by(device_ids) %>%
summarise(n = n())