Using in_schema with dbplyr, cannot refrrence feature names

dougfir · January 14, 2019, 11:08pm

Using ODBC, I have a connection to our company Hive instance:

library(odbc)
library(tidyverse)
library(dbplyr)

# hive connection
con <- dbConnect(odbc::odbc(), "HiveProd")

# tables
blah <- tbl(con, in_schema("gcameron", "web_pros_year_month"))

Here's a screen if I glimpse() blah:

I am able to filter based on a feature name, e.g.

blah %>% filter(country == "United States") %>% head()
# Source:   lazy query [?? x 72]
# Database: Hive 1.2.2[@Hive/HIVE]
  web_pros_year_m… web_pros_year_m… web_pros_year_m… web_pros_year_m… web_pros_year_m… web_pros_year_m… web_pros_year_m… web_pros_year_m… web_pros_year_m…
             <int> <chr>            <chr>            <chr>                       <int>            <int>            <int>            <int>            <int>
1           201408 2390714          Unknown          United States              200308           201408                0                0                0
2           201408 2694435          Unknown          United States              200310           201408                0                0                0
3           201408 4184585          Unknown          United States              200411           201408                0                0                0
4           201408 40568007         Unknown          United States              201012           201408                0                0                0
5           201408 48500123         Unknown          United States              201112           201408                0                0                0
6           201408 56980782         Unknown          United States              201211           201408                0                0                0
# ... with 63 more variables: web_pros_year_month.count_sites_pro_sites <int>, web_pros_year_month.total_clients_delegated_in_month <int>,
#   web_pros_year_month.active_delegations <int>, web_pros_year_month.domainer <lgl>, web_pros_year_month.premier_hosting_customer <lgl>,
#   web_pros_year_month.count_reseller_accounts <int>, web_pros_year_month.count_calls <int>, web_pros_year_month.total_call_time_secs <dbl>

Everything works as expected. However, filter appears to be the only dplyr verb I can use with feature names in this way. Example, select:

blah %>% select(channel:country) %>% head()
Error in is_character(x, encoding = encoding, n = 1L) : 
  object 'channel' not found

Or just a single feature:

blah %>% select(country) %>% head()
Error in .f(.x[[i]], ...) : object 'country' not found

I tried adding in the schema name, since glimpse() includes the schema name int he field names:

blah %>% select(web_pros_year_month.country) %>% head()
Error in .f(.x[[i]], ...) : 
  object 'web_pros_year_month.country' not found

If I try to prompt Rs auto complete using $ I get what appears to be a random string prepending the features:

From a previous post I learned that this is the temp query that holds blah.

I'm aware that I could just send raw sql via odbc and pull into r studio that way, however preference would be to make use of tidyverse functions like dplyr verbs on several tables. This would allow me to store tbls as variables, perform various joins and filters before dbplyr pulls everything in using collect().

Is this a bug or am I just using the package incorrectly? It's odd that things work fine for a single dplyr verb, filter(), but not the others such as select.

dougfir · January 18, 2019, 9:53pm

Since I'm not sure if this is a bug or if I'm just using the package incorrectly, would it be right for me to add this as an issue on Github?

system · February 8, 2019, 9:53pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.

If you have a query related to it or one of the replies, start a new topic and refer back with a link.