Hive ODBC connection with dbplyr Invalid table alias or column reference

I'm connected to Hive using dbplyr and odbc.

A table I would like to connect to is called "pros_year_month":

con <- dbConnect(odbc::odbc(), "HiveProd")
prosym <- tbl(con, in_schema("my_schema_name", "pros_year_month"))

Table pros_year_month has several fields, two of which are "country" and "year_month".

This appears to work without any problem:

pros_nov <- prosym %>% filter(country == "United States") %>% collect()

However this does not:

pros_nov <- prosym %>% filter(year_month = ymd(as.character(paste0(year_month, "01")))) %>% collect()

Error in new_result(connection@ptr, statement) :
nanodbc/nanodbc.cpp:1344: 42000: [Hortonworks][Hardy] (80) Syntax or
semantic analysis error thrown in server while executing query. Error
message from server: Error while compiling statement: FAILED:
SemanticException [Error 10004]: Line 1:7 Invalid table alias or
column reference 'zzz1.year_month': (possible column names are:
year_month, country, ...

It looks like the field name year_month is somehow now zzz1.year_month? Not sure what this is or how to get around it.

How can I apply a filter for country then year_month before calling collect on a dbplyr object?

dbplyr translates your code into an SQL query which is sent to the database.

In this case I think you probably want a mutate, rather than a filter statement. Even then I'm not sure if that would translate into SQL.

You can build up your query in parts checking what the SQL looks like before calling collect via show_query().

Hi Martin. Using show_query(), it's clear right away what part of the issue is. For some reason, the sql is being translated with a prepended string on front of each field:

<SQL> SELECT zzz9.year_month, zzz9.shopper_id,,,

This 'zzz9." is the mystery and lines up with the error message in my original post "Invalid table alias or
column reference 'zzz1.year_month': (possible column names are:

Any idea why zzz9. is being appended to all my field names?

This one is probably for the developers.

Dummy table names do get generated when required, e.g. when joining the table onto itself, but this does not appear to be the case here.

I would still suggest building up the query in parts until you hit a problem so you can isolate where this issue first occurs. As I mentioned above, I don't think your statement would translate into SQL as is (but I may be wrong).

Where does this query come from? First or second example you provided?

As @martin.R said, your second statement is wrong since you, first, need to use mutate and, second, you most likely won't be able to use ymd function (unless it's a function in Hive).

To debug it, can you run show_query on both statements (without collect) and post results here?

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.