change the timezone in a postgres database before filtering on the date and collecting

I'm struggling to find a way to change the timezone in a postgres database before filtering on the date and collecting. The accepted solution here does not seem to work for me, but I might be implementing incorrectly.

I'm not sure how to make a reproducible example with a database, but I'd be happy to try if someone can share a few tips or point me to an example.

dat <- con %>%
    tbl("myTable") %>%
    #mutate(myDate = timezone('Africa/Nairobi', myDate)) # does not work
    #mutate_if(is.POSIXct, funs(force_tz(., tzone='Africa/Nairobi'))) # does not work
    filter(myDate >= "2019-04-05") %>%
    collect()

What happens if you run sql_render on the resulting query? Does it look reasonable? Can you run the SQL it generated directly on DB and check if it works? How would you do it with raw SQL, without dplyr?

Thanks @mishabalyasin. I don't have much experience with sql, but here's what someone else recommended.

dbGetQuery(con, "select timezone('CET', time) as modtime, *
from my_table")

I get an error "Failed to prepare query: ERROR" when I plug in my parameters.

I need to investigate a bit more to understand how to use sql_render.

Yes, you can try doing that. Below are couple of examples to illustrate usage of sql_render and how it can help you understand what actually will happen in DB:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(dbplyr)
#> 
#> Attaching package: 'dbplyr'
#> The following objects are masked from 'package:dplyr':
#> 
#>     ident, sql

res <- dbplyr::tbl_memdb(iris)

res %>%
  dplyr::group_by(Species) %>%
  dplyr::summarise(n = dplyr::n()) %>%
  dbplyr::sql_render()
#> <SQL> SELECT `Species`, COUNT() AS `n`
#> FROM `iris`
#> GROUP BY `Species`

today <- as.character(Sys.time())

res_with_dtm <- res %>%
  dplyr::mutate(dtm = today)

# dtm is character, not datetime
res_with_dtm
#> # Source:   lazy query [?? x 6]
#> # Database: sqlite 3.29.0 [:memory:]
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species dtm           
#>           <dbl>       <dbl>        <dbl>       <dbl> <chr>   <chr>         
#>  1          5.1         3.5          1.4         0.2 setosa  2019-08-03 16…
#>  2          4.9         3            1.4         0.2 setosa  2019-08-03 16…
#>  3          4.7         3.2          1.3         0.2 setosa  2019-08-03 16…
#>  4          4.6         3.1          1.5         0.2 setosa  2019-08-03 16…
#>  5          5           3.6          1.4         0.2 setosa  2019-08-03 16…
#>  6          5.4         3.9          1.7         0.4 setosa  2019-08-03 16…
#>  7          4.6         3.4          1.4         0.3 setosa  2019-08-03 16…
#>  8          5           3.4          1.5         0.2 setosa  2019-08-03 16…
#>  9          4.4         2.9          1.4         0.2 setosa  2019-08-03 16…
#> 10          4.9         3.1          1.5         0.1 setosa  2019-08-03 16…
#> # … with more rows

# STRFTIME is from SQLite, not R
res_with_dtm %>%
  dplyr::mutate(dtm = STRFTIME('%d', dtm))
#> # Source:   lazy query [?? x 6]
#> # Database: sqlite 3.29.0 [:memory:]
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species dtm  
#>           <dbl>       <dbl>        <dbl>       <dbl> <chr>   <chr>
#>  1          5.1         3.5          1.4         0.2 setosa  03   
#>  2          4.9         3            1.4         0.2 setosa  03   
#>  3          4.7         3.2          1.3         0.2 setosa  03   
#>  4          4.6         3.1          1.5         0.2 setosa  03   
#>  5          5           3.6          1.4         0.2 setosa  03   
#>  6          5.4         3.9          1.7         0.4 setosa  03   
#>  7          4.6         3.4          1.4         0.3 setosa  03   
#>  8          5           3.4          1.5         0.2 setosa  03   
#>  9          4.4         2.9          1.4         0.2 setosa  03   
#> 10          4.9         3.1          1.5         0.1 setosa  03   
#> # … with more rows

res_with_dtm %>%
  dplyr::mutate(dtm = STRFTIME('%d', dtm)) %>%
  dbplyr::sql_render()
#> <SQL> SELECT `Sepal.Length`, `Sepal.Width`, `Petal.Length`, `Petal.Width`, `Species`, STRFTIME('%d', `dtm`) AS `dtm`
#> FROM (SELECT `Sepal.Length`, `Sepal.Width`, `Petal.Length`, `Petal.Width`, `Species`, '2019-08-03 16:14:35' AS `dtm`
#> FROM `iris`)

Created on 2019-08-03 by the reprex package (v0.3.0)

OK, great. Thanks for the example.

When I try mutate(myDate = timezone('Africa/Nairobi', myDate)):

SELECT "id", timezone('Africa/Nairobi', "myDate") AS "myDate"
FROM "myTable"

Using this with collect() seems to work now...

Investigating.

I believe the issue here was not related to time zones, but to the way Postgres handles column names. Your column seems to be named myDate with capital D, which is kinda wrong in SQL speak (it is a good practice to have your column names lowercase). Postgres allows case sensitive column names (which is IMHO wrong, but please excuse my rant).

In such cases it however requires you to quote your column with double quotes. This creates a difference from the column named just time in the old question you refer to - with regards to the timezone() function the behavior is the same.

Thanks, all. Confirming that the following DOES work:

mutate(myDate = timezone('Africa/Nairobi', myDate)) %>%
collect()

25

3 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.