I'm connected to Hive using dbplyr and odbc.
A table I would like to connect to is called "pros_year_month":
library(odbc)
library(tidyverse)
library(dbplyr)
con <- dbConnect(odbc::odbc(), "HiveProd")
prosym <- tbl(con, in_schema("my_schema_name", "pros_year_month"))
Table pros_year_month has several fields, two of which are "country" and "year_month".
This appears to work without any problem:
pros_nov <- prosym %>% filter(country == "United States") %>% collect()
However this does not:
pros_nov <- prosym %>% filter(year_month = ymd(as.character(paste0(year_month, "01")))) %>% collect()
Error in new_result(connection@ptr, statement) :
nanodbc/nanodbc.cpp:1344: 42000: [Hortonworks][Hardy] (80) Syntax or
semantic analysis error thrown in server while executing query. Error
message from server: Error while compiling statement: FAILED:
SemanticException [Error 10004]: Line 1:7 Invalid table alias or
column reference 'zzz1.year_month': (possible column names are:
year_month, country, ...
It looks like the field name year_month is somehow now zzz1.year_month? Not sure what this is or how to get around it.
How can I apply a filter for country then year_month before calling collect on a dbplyr object?