Convert tz in postgres with dbplyr

Is it possible to change the timezone using dplyr/dbplyr syntax in the database? All my observations are stored in UTC, but I need it in CET for my group_by/summary processing and it is not very efficient for me to collect() the data and and do it locally. My current solution is not very "pretty".

Here's some pseudo code to illustrate the problem:

library(tidyverse)

con <- DBI::dbConnect()

df <- tbl(con, "my_table")
df %>% 
  mutate(datetime = lubridate::with_tz(datetime, "CET")) %>% 
  mutate(datetime = lubridate::floor_date(datetime, "days")) %>%
  group_by(datetime) %>%
  summarise_all(funs(mean(.))

If you have additional hints for working with datetime objects and timezones in postgres with dplyr it will be much appreciated.

1 Like

UPDATE2: I used the timezone function that @jlacko suggested, but in a mutate statement and now it works as expected. Thanks, @jlacko.

The code below is the best solution I could come up with. The code works with my postgres DB. It was the mutate statement with the dbplyr::sql() function that was the hardest part, because I didn't know how to get dplyr to translate it:

library(tidyverse)

con <- DBI::dbConnect()

df <- tbl(con, "my_table")
df %>%
  mutate(datetime = timezone('CET', datetime)) %>%
  mutate(datetime = date_trunc("day", datetime)) %>%
  group_by(datetime) %>%
  summarise_all(funs(mean(.))

4 Likes

This is an interesting dbplyr trick, thanks for sharing.

In your place I would use the Postgres function timezone() to push the calculation to backend, but in doing that I would lose the ability to generate the code in summarise_all() and would have to either write the avg() for all columns by hand, or pull all the rows locally; neither of which is particularly efficient.

asdf <- dbGetQuery(con, "select timezone('CET', time) as modtime, * 
                         from my_table") %>%
  group_by(modtime) %>%
  summarise_all(funs(mean(.)))
1 Like

Thanks, @jlacko. I didn't know about the timezone function. I changed my solution as the first one didn't work as expected on closer inspection.

Glad to be of service!

timezone() is a Postgres specific function, not part of the ANSI SQL standard. It is OK not to be familiar with it, but it can be helpful at times.

As an observation: the use of SQL functions in R code, and relying on lazy evaluation, freaks me out at times. In your case it worked out brilliantly, but for some reason I prefer seeing my SQL before it gets executed.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.