Unix and DBplyr

Hi all,

I am hoping to ask for some help turning a unix timestamp to a date and then extract Year and month in a DBplyr connection.

I have a postgres server connection with a large dataset with a unix timestamp as a date operator. Since the dataset is massive, I want to group data by year and month, but neither of these variables exist in the dataset. Locally, I can pull the data and create these variables after converting the timestamp using the Lubridate package.

However, doing these within a DBplyr code line does not work. When I try this,

database_connection and joins%>%
mutate(date = as_date(as_datetime(timestamp)))%>%
show_query()
"TRunctated results excluding other stuff that works":
CAST(CAST("timestamp" AS TIMESTAMP) AS DATE) AS "date"

Any advise on using DBplyr to convert unix time stamps to dates and then extracting year and month from that date?

Much appreciated, everyone.

Datetime can be tricky: provide an example of the format of timestamp?

@technocrat , thanks for your reply. It is a unix timestamp, stored as an int format with the following numbers as examples. To be clear on the problem, I would ideally want to be able to mutate in new column that is a date and then extract Year and Month from that. One work around I devised was querying the timestamp column, doing the below manipulations locally, and then copying a temporary table back to the database. I then join this table to the others and then do my filtering for the big query. Not ideal...

df$timestamp <- as_datetime(df$timestamp)
df$Year <- format(as.Date(df$timestamp, format="%Y/%m/%d"),"%Y")

TIMESTAMP EXAMPLE

1463788800
1463097600
1464480000
1462924800
1463011200
1464566400
1463875200
1463011200
1462924800
1465776000

I don't have a posgresql machine on hand right now to double check, but I believe you should be able to do something like this:

database_connection and joins%>%
   mutate(date = to_timestamp(timestamp))

Where the to_timestamp() is actually not a R but postgresql function, but dbplyr should be able to translate it to backend code. It should turn your int to a datetime field, with all that it entails. For the year / month extraction consider in turn date_trunc() or extract() to push the calculation to the backend.

@jlacko . Thanks for the help with the time stamp part. Worked like a charm! If you are willing, could you maybe help with the extract() part, too? I was reviewing the documentation, but am very unfamiliar with postgres and SQL.

Trying to run the extract a month from the newly created date variable spits and error. Trying it with the timestamp, results in this SQL query conversion :

<SQL>
SELECT to_timestamp("timestamp") AS "date", extract('MONTH', "timestamp", '2001-02-16 20:38:40') AS "Month"

Code is:

Some joins %>%
%>%
mutate(date = to_timestamp(timestamp),
Month = extract('MONTH', timestamp, '2001-02-16 20:38:40'))%>%
select(date, Month)%>%
show_query()

EDIT to the above. I worked it through to a resolution.

Interestingly, unlike with dplyr, you cannot seem to add the extract() code within the first mutate (), but adding a second mutate() with the code does the trick. The below, curtsey of @jlacko, converts the timestamp to a date and then adds a Year and a Month variable to the data.

joins %>%
mutate(date = to_timestamp(timestamp))%>%
mutate(Year = sql("EXTRACT(YEAR FROM date)"),
Month = sql("EXTRACT(MONTH FROM date)"))%>%
select(date, Month, Year)%>%
head()

RESULTS:

date Month Year

1 2019-02-25 00:00:00 2 2019
2 2019-03-04 00:00:00 3 2019
3 2019-03-03 00:00:00 3 2019
4 2019-03-04 00:00:00 3 2019
5 2019-03-06 00:00:00 3 2019
6 2019-03-07 00:00:00 3 2019

I am glad you worked that out!

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.