How to extract time fields from POSIXct database field with dplyr/dbplyr?

dplyr

#1

I’m using dplyr/dbplyr to extract and parse database fields from a large table.

I want to manipulate a POSIXct date time field, but don’t see how to extract the time fields.

Consider:

. . .
  select(LAB_COMPLETED_DT_TM)  %>%
  mutate(take2  = LAB_COMPLETED_DT_TM,
         take3  = as.Date(LAB_COMPLETED_DT_TM),
         take4  = as.character(LAB_COMPLETED_DT_TM),
         YYYY = substr(as.character(LAB_COMPLETED_DT_TM),1,4))  %>%
  head(5)

Example output:

take2 <S3:POSIXct>: 2015-11-09 03:10:00
take3 <chr>: 2015-11-09
take4 <chr>: Nov 9 2015 3:10AM
YYYY <chr>: Nov

How do I get take4 to be the exact same string as shown for take2, so I can use substr to get any part of the datetime field that I want, such as YYYY being the year?

DATE(LAB_COMPLETED_DT_TM) or as.Date(LAB_COMPLETED_DT_TM) gets the date, but how does one extract the time?

What determines the format used by as.character to convert the datetime to a string? Why not ISO8601?

Since SQL is being created, many “normal” R conversion functions do not work here (in my hands).

Relevant portion of the SQL generated when using show_query instead of head above:

<SQL>
SELECT "LAB_COMPLETED_DT_TM", "LAB_COMPLETED_DT_TM" AS "take2", CAST("LAB_COMPLETED_DT_TM" AS DATE) AS "take3", CAST("LAB_COMPLETED_DT_TM" AS VARCHAR(MAX)) AS "take4", SUBSTRING(CAST("LAB_COMPLETED_DT_TM" AS VARCHAR(MAX)), 1.0, 4.0) AS "YYYY"

Is there a way to do this with dplyr, or must I use CAST and CONVERT in TSQL to accomplish what I want?


#2

I'm afraid I'm not 100% certain of exactly how to get the same character string, but if you want to do date/time processing in R (rather that in SQL) I'd recommend the lubidate and hms packages for dates and times, respectively.

E.g.

hms::hms(lubridate::now())
#> 12:05:04.350296

You could then convert to character if you need to extract specific parts of the date/time string, or use the same packages to handle to dates/times as-is.


#3

Thanks for the suggestion for lubridate.

In this case, I'm trying to use dbplyr to do the "heavy lifting" on the database server since I'm extracting 10s of millions of rows from several billion. I need to get only the data I need and no more if at all possible.


#4

Hi, you can use the vendor specific SQL command that parses the year inside mutate. So you can use mutate(takeN = year(date_field). What dbplyr does, is that since it doesn't recognize the year command, it passes it to the SQL engine as-is.


#5

Thanks for the feedback, Edgar. I accidentally discovered I could use DAY, MONTH and YEAR function that TSQL provides, but for some reason Microsoft does not provide functions to get HOUR, MINUTE, SECOND -- at least I can't find them under "Functions That Return Date and Time Parts" on this page: https://docs.microsoft.com/en-us/sql/t-sql/functions/date-and-time-data-types-and-functions-transact-sql?view=sql-server-2017.

The section "G. Using CAST and CONVERT with datetime data" shows some promise, but I'll have to go back to SQL. https://docs.microsoft.com/en-us/sql/t-sql/functions/cast-and-convert-transact-sql?view=sql-server-2017#date-and-time-styles. I don't see how I can do those CASTs and CONVERTs via dbplyr.

In my case, I don't need date functions. I can join to a date dimension table and get all sorts of info about the date. But without using SQL I don't see how to get the time values.

dbplyr is so close to providing a solution, if only the datetime conversion to a string were ISO8601 format.


#6

Oh, and DATEPART won't do what you need? https://docs.microsoft.com/en-us/sql/t-sql/functions/datepart-transact-sql?view=sql-server-2017


#7

That does look like an excellent suggestion, and I can get it to work as expected in Microsoft SQL Server Management Studio:

SELECT DATEPART(hour,  '2015-11-09 03:10:00')
3
SELECT DATEPART(minute,'2015-11-09 03:10:00')
10
SELECT DATEPART(second,'2015-11-09 03:10:00')
0

But I don't understand where hour, minute and second are defined?

Strings enclosed in double quotes also work, but not strings in single quotes:

SELECT DATEPART("hour",  '2015-11-09 03:10:00')
SELECT DATEPART("minute",'2015-11-09 03:10:00')
SELECT DATEPART("second",'2015-11-09 03:10:00')

Either works (no quotes or double quotes) in RStudio (assuming valid TSQL connection HF):

Note: I replaced back-ticks in the code chunks below with quotes so preformatted text blocks will work.

No Quotes

' ' '{sql, connection="HF"}
SELECT DATEPART(hour,  '2015-11-09 03:10:00')
' ' '

Double quotes

' ' '{sql, connection="HF"}
SELECT DATEPART("hour",  '2015-11-09 03:10:00')
' ' '

Single Quotes

Error: 'SELECT DATEPART('hour', '2015-11-09 03:10:00')'
nanodbc/nanodbc.cpp:1587: 42000: [Microsoft][ODBC SQL Server Driver][SQL Server]Invalid parameter 1 specified for datepart.
Failed to execute SQL chunk

dplyr/SQL attempts

I can't get this to work with dplyr with or without quotes since hours always gets the single quotes in SQL that do not work:

dplyr: take4  = DATEPART(hour, LAB_COMPLETED_DT_TM)
sql:  Error in UseMethod("escape") : no applicable method for 'escape' applied to an object of class "function"
dplyr:  take4  = DATEPART("hour", LAB_COMPLETED_DT_TM) 
sql:   DATEPART('hour', "LAB_COMPLETED_DT_TM") AS "take4"
dplyr: take4  = DATEPART('hour', LAB_COMPLETED_DT_TM)
DATEPART('hour', "LAB_COMPLETED_DT_TM") AS "take4"

Is there a way to get "hour" instead of 'hour' in the SQL DATEPART call from dplyr?


#8

Not ideal, but would you mind trying:

take4 = DATEPART(sql("hour"), LAB_COMPLETED_DT_TM)

And if that doesn't work:

take4 = DATEPART(!! sql("hour"), LAB_COMPLETED_DT_TM)


#9

Forgot to mention, that for the second one, you'll need to load rlang: library(rlang) first


#10

Eureka! Both ways worked, gave the same results, and the same SQL:

dplyr

take4 = DATEPART(sql("hour"), LAB_COMPLETED_DT_TM),
take5 = DATEPART(sql("minute"), LAB_COMPLETED_DT_TM),
take6 = DATEPART(sql("second"), LAB_COMPLETED_DT_TM),

SQL

DATEPART(hour, "LAB_COMPLETED_DT_TM") AS "take4", 
DATEPART(minute, "LAB_COMPLETED_DT_TM") AS "take5", 
DATEPART(second, "LAB_COMPLETED_DT_TM") AS "take6"

THANK YOU for your help and your extremely helpful videos about best practices for working with databases!