dbplyr/dplyr filter worked fine in R 3.5.1 but fails in R 3.6.0 using %in% in database query

Given:

> procedureIDs$ProcedureID
[1] 8607 8608

Does anyone have any ideas about why does this filter construct worked fine in R 3.5.1 From a somewhat complicated dbplyr/dplyr statement querying an MS SQL database:

… %>%

filter(PROCEDURE_ID %in% procedureIDs$ProcedureID)        %>%  

But gives this error message in R 3.6.0?

Error in UseMethod("escape") : no applicable method for 'escape' applied to an object of class "c('tbl_df', 'tbl', 'data.frame')"
36. escape(val, con = con)
35. .f(.x[[i]], ...)
34. purrr::map_chr(enexprs(...), escape_expr, con = con)
33. build_sql(x, sql(f), y)
. . .

This is due to a breaking change in last dbplyr version. See second bullet point
https://dbplyr.tidyverse.org/news/index.html#breaking-changes

$ will be evaluated into the database now. You need to force evaluation in R here with !!.

filter(PROCEDURE_ID %in% !! procedureIDs$ProcedureID)  

There is some other examples in this issue discussing this change

2 Likes

so for my personal junk, using !! is fine. I can read and understand that. But if I'm teaching, I'm not inclined to teach !! to new users. So is a reasonable practice to put the $ delimited column in a vector and then use that vector in the filter? Something like this:

myIDs <- procedureIDs$ProcedureID
filter(PROCEDURE_ID %in% myIDs)  
2 Likes

Yes, it'll work:

library("tidyverse", warn.conflicts = FALSE)
#> Registered S3 methods overwritten by 'ggplot2':
#>   method         from 
#>   [.quosures     rlang
#>   c.quosures     rlang
#>   print.quosures rlang

iris_db <- dbplyr::tbl_memdb(iris)

species <- list()
species$species <- c("setosa")

# doesn't work
iris_db %>%
  dplyr::filter(Species %in% species$species)
#> Error in result_create(conn@ptr, statement): near "AS": syntax error

# works
iris_db %>%
  dplyr::filter(Species %in% !!species$species)
#> # Source:   lazy query [?? x 5]
#> # Database: sqlite 3.22.0 [:memory:]
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <chr>  
#>  1          5.1         3.5          1.4         0.2 setosa 
#>  2          4.9         3            1.4         0.2 setosa 
#>  3          4.7         3.2          1.3         0.2 setosa 
#>  4          4.6         3.1          1.5         0.2 setosa 
#>  5          5           3.6          1.4         0.2 setosa 
#>  6          5.4         3.9          1.7         0.4 setosa 
#>  7          4.6         3.4          1.4         0.3 setosa 
#>  8          5           3.4          1.5         0.2 setosa 
#>  9          4.4         2.9          1.4         0.2 setosa 
#> 10          4.9         3.1          1.5         0.1 setosa 
#> # … with more rows

# works
just_vector <- species$species
iris_db %>%
  dplyr::filter(Species %in% just_vector)
#> # Source:   lazy query [?? x 5]
#> # Database: sqlite 3.22.0 [:memory:]
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <chr>  
#>  1          5.1         3.5          1.4         0.2 setosa 
#>  2          4.9         3            1.4         0.2 setosa 
#>  3          4.7         3.2          1.3         0.2 setosa 
#>  4          4.6         3.1          1.5         0.2 setosa 
#>  5          5           3.6          1.4         0.2 setosa 
#>  6          5.4         3.9          1.7         0.4 setosa 
#>  7          4.6         3.4          1.4         0.3 setosa 
#>  8          5           3.4          1.5         0.2 setosa 
#>  9          4.4         2.9          1.4         0.2 setosa 
#> 10          4.9         3.1          1.5         0.1 setosa 
#> # … with more rows

Created on 2019-05-25 by the reprex package (v0.3.0)

1 Like

Yes it will work fine and should be Advised I think.

What is to understand here is that ˋdbplyrwon’t guess now what is R code and what is not. So&ˋ and [ won’t be evaluate in R.

Using a variable for those expressions is the right way to do it I think.

1 Like

THANKS @cderv for the great and timely fix and @jdlong for the suggestion. I verified the solution works on RStudio locally on Windows and on a RStudio server on Linux. I'll need to fix a dozen or two scripts, but that's easy now that I know the cause.

In my case, I thought the expression procedureIDs$ProcedureID was a local vector as part of a local tibble -- it's not in the database. The suggestion by @jdlong to make it a separate local vector probably makes more readable code, so I'll use that most of the time.

2 Likes

worth noting here we can also use local() which might be more intuitive to folks not used to NSE and !!!:


library(tidyverse)
#> Registered S3 methods overwritten by 'ggplot2':
#>   method         from 
#>   [.quosures     rlang
#>   c.quosures     rlang
#>   print.quosures rlang
iris_db <- dbplyr::tbl_memdb(iris)

species <- list()
species$species <- c("setosa")

iris_db %>%
  filter(Species %in% local(species$species))
#> # Source:   lazy query [?? x 5]
#> # Database: sqlite 3.22.0 [:memory:]
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <chr>  
#>  1          5.1         3.5          1.4         0.2 setosa 
#>  2          4.9         3            1.4         0.2 setosa 
#>  3          4.7         3.2          1.3         0.2 setosa 
#>  4          4.6         3.1          1.5         0.2 setosa 
#>  5          5           3.6          1.4         0.2 setosa 
#>  6          5.4         3.9          1.7         0.4 setosa 
#>  7          4.6         3.4          1.4         0.3 setosa 
#>  8          5           3.4          1.5         0.2 setosa 
#>  9          4.4         2.9          1.4         0.2 setosa 
#> 10          4.9         3.1          1.5         0.1 setosa 
#> # … with more rows

Created on 2019-06-13 by the reprex package (v0.3.0)

4 Likes