dbplyr truncates a character columns to a single "0" value

database
dbplyr
odbc

#1

Hello!
I am getting a weird result from running a piece of dbplyr code.
What I have is a column in a table where the values are strings of 11 characters each, with a bunch of leading zeros, something like this:

acct_no
00000001653
00000001666
00000001693
00000001707
00000001735
00000001746
00000001751
00000001557

I call this table with the following function:

tbl(my_conn, in_schema("my_schema", "my_table")) %>%
select(ACCT_NO)

(select here is for demonstration only, I don't want to carry the rest of the cols into the example)

But the result I get looks like this:

# Source:   lazy query [?? x 1]
# Database: Microsoft SQL Server 13.00.5081[credentials_credentials_credentials]
   ACCT_NO
   <chr>  
 1 0      
 2 0      
 3 0      
 4 0      
 5 0      
 6 0      
 7 0      
 8 0      
 9 0      
10 0  

Furthermore, adding filter(ACCT_NO != "0") seems to be taking no effect at all if put before collect(). But it does work after collect()


It doesn't happen on every table, but it happens on some tables.
On the tables where it doesn't happen, the top values seem to be having fewer leading zeros (not sure if it a coincidence or not)

Has anyone dealt with this before?


#2

Have you inspected the actual sql query to be sure that makes sense? sometimes dbplyr does stranger things during translation.

tbl(my_conn, in_schema("my_schema", "my_table")) %>%
  select(ACCT_NO) %>% 
  show_query()

#3

Good point, I haven't looked at it.
But it's not it. The query is as straightforward as I'd expect:

<SQL>
SELECT *
FROM my_schema.MY_TABLE

And if I add select(ACCT_NO), it still writes it as expected:

<SQL>
SELECT "ACCT_NO"
FROM my_schema.MY_TABLE

#4

Does Microsof SQL server use casting on queries like postgresql? maybe the driver is getting the class wrong and trying to convert to a numeric class (e.g. int8)

SELECT ACCT_NO::varchar
FROM my_schema.MY_TABLE

#5

I guess I could do something like cast(ACCT_NO as char) in SQL, but how do I do that in dbplyr?
I understand I can use something like build_sql() or as.sql() and paste a sting, but how do I execute it against the database, what function do I use?


#6

@edgararuiz, can this be a known issue with the package by chance?
I am cut off from GitHub at work and can't check open issues, but my problem does seem like an unexpected result that shouldn't happen.


#7

Hey, what happens if you run the SQL through DBI::GetQuery()?


#8

Same result: returns all zeros...


#9
SELECT DISTINCT 
ACCT_NO 
FROM my_schema.MY_TABLE

returns an appropriate amount of rows (hundreds thousands, as I'd expect), but all zeros. But it's not just one row of 0 though.


#10

ok, let's try this from SQL Studio, and if you get the correct result there, then it's either the driver or an issue in odbc


#11

It always worked in SQL Studio without a fail.
I am also not sure about the driver, as it is the same driver used by Tableau, and I've never had this problem in Tableau.

I guess it is an odbc issue after all.
Thank you!


#12

Just as a comment, I have had issues lately with odbc() and non English locale, after a connection is made my decimal mark gets changed to "," and that causes all my numeric values get converted to NA if I apply any dplyr command afterwards


#13

This seems to be fixing my issue:

Basically, an explicit cast(field, varchar(8000)) was needed.
Not sure how to incorporate this into dbplyr workflow, but at least I can use a SQL vanilla query and run it with dbGetQuery