R SQL: Inner Joins with Different Date Formats (HY000 Error: Bad External Date Representation)

I am working with the R programming language.

  • I have the two following tables (located on a server): table_1 and table_2
  • table_1 has a date variable " Date_1 " that has a variable type " character_varrying(255) "
  • table_2 has two date variables " Date_2 " and " Date_3 " that have variable types " DATE "
  • All 3 dates have the same form : 2010-01-01

I am trying to run the following join over a server:

#load libraries

library(OBDC)
library(RODBC)
library(dbi)

#establish a connection and name it as "dbhandle"

dbGetQuery(dbhandle, "create table final_table as select * from table_1 a
inner join table_2 b
On (( a.date_1 between b.date_2 and b.date_3) and a.id = b.id)
 Or (a.id1 = b.id1) ")

Problem: When I run this above code, I get the following error:

HY000 Error: Bad External Date Representation

My Question: I think this problem is happening because the date variables being used in the join have different variable types?

If the tables were locally stored, I could have manually changed the date formats:

table_1$Date_1 = as.Date(table_1$Date_1, ...)

However, since these tables are located on a server and I am interacting with these tables using commands from the "dbi" library (DBI: R Database Interface version 1.1.2 from CRAN), I am not sure how to correctly "cast" all dates into the same format so that the inner join can run.

Can someone please show me how to do this?

Thanks!

Note: "Date_1" has some missing values which appear as "empty rows" - could this also be a problem?

This is not R related, you are sending the sql query from R, but the type casting has to be done in the sql statement, how to do it depends on the specific sql dialect of your RDBMS, for example on Postgresql you can cast column types with the :: operator e.g. Date_1::date

The error code suggests an IBM Netezza database,
so research casts in that ?
Cast conversions - IBM Documentation

1 Like

There is probably a SQL workaround to this, but since we're talking R, you could always query both tables separately, adjust their variable types with code, then join them using the R inner_join function, available in the tidyverse.

table_1 <- dbGetQuery(dbhandle,"select * from table_1")
table_2 <- dbGetQuery(dbhandle,"select * from table_2")

table_1$Date_1 <- as.Date(table_1$Date_1,"Whatever format it's in goes here")

table_12 <- inner_join(table_1,table_2,by=c("id"))
table_12 <- table_12 %>%
     filter(Date_1 >= Date_2 & Date_1 <= Date_3)

There are definitely some SQL purists out there cursing my name right now, but if you need to do it in R, this is the way, or at least something similar to this. I don't know the formatting of your tables, so this will probably need to be modified.

1 Like

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.