Using ODBC, I have a connection to our company Hive instance:
library(odbc)
library(tidyverse)
library(dbplyr)
# hive connection
con <- dbConnect(odbc::odbc(), "HiveProd")
# tables
blah <- tbl(con, in_schema("gcameron", "web_pros_year_month"))
Here's a screen if I glimpse() blah:
I am able to filter based on a feature name, e.g.
blah %>% filter(country == "United States") %>% head()
# Source: lazy query [?? x 72]
# Database: Hive 1.2.2[@Hive/HIVE]
web_pros_year_m… web_pros_year_m… web_pros_year_m… web_pros_year_m… web_pros_year_m… web_pros_year_m… web_pros_year_m… web_pros_year_m… web_pros_year_m…
<int> <chr> <chr> <chr> <int> <int> <int> <int> <int>
1 201408 2390714 Unknown United States 200308 201408 0 0 0
2 201408 2694435 Unknown United States 200310 201408 0 0 0
3 201408 4184585 Unknown United States 200411 201408 0 0 0
4 201408 40568007 Unknown United States 201012 201408 0 0 0
5 201408 48500123 Unknown United States 201112 201408 0 0 0
6 201408 56980782 Unknown United States 201211 201408 0 0 0
# ... with 63 more variables: web_pros_year_month.count_sites_pro_sites <int>, web_pros_year_month.total_clients_delegated_in_month <int>,
# web_pros_year_month.active_delegations <int>, web_pros_year_month.domainer <lgl>, web_pros_year_month.premier_hosting_customer <lgl>,
# web_pros_year_month.count_reseller_accounts <int>, web_pros_year_month.count_calls <int>, web_pros_year_month.total_call_time_secs <dbl>
Everything works as expected. However, filter appears to be the only dplyr verb I can use with feature names in this way. Example, select:
blah %>% select(channel:country) %>% head()
Error in is_character(x, encoding = encoding, n = 1L) :
object 'channel' not found
Or just a single feature:
blah %>% select(country) %>% head()
Error in .f(.x[[i]], ...) : object 'country' not found
I tried adding in the schema name, since glimpse() includes the schema name int he field names:
blah %>% select(web_pros_year_month.country) %>% head()
Error in .f(.x[[i]], ...) :
object 'web_pros_year_month.country' not found
If I try to prompt Rs auto complete using $ I get what appears to be a random string prepending the features:
From a previous post I learned that this is the temp query that holds blah.
I'm aware that I could just send raw sql via odbc and pull into r studio that way, however preference would be to make use of tidyverse functions like dplyr verbs on several tables. This would allow me to store tbls as variables, perform various joins and filters before dbplyr pulls everything in using collect().
Is this a bug or am I just using the package incorrectly? It's odd that things work fine for a single dplyr verb, filter(), but not the others such as select.