Using multiple R variables in SQL chunk


#1

Is it possible to pass multiple R variables into R Notebook SQL chunk? I can do so with one R variable by prefacing it with a ? sign (http://rmarkdown.rstudio.com/authoring_knitr_engines.html#sql). But I can’t figure out, how to do it with multiple variables.


#2

Hi @toomase , the variable should be a single character instead of a multiple value vector. So, it should work with multiple <- "'one', 'two'" . A better choice would be to use build_sql() which takes care of the quoting of the values in the vector for you:

multiple <- c("one", "two")
multiple <- build_sql(multiple)

returns:

multiple
<SQL> ('one', 'two')

#3

Hi @edgararuiz. Thank you for the answer. I tried both of the suggested workarounds, but unfortunately with no luck. Do you have any ideas what am I doing wrong? Also how could I pass multiple numeric variables to the SQL where clause?


#4

My bad, I forgot to mention that the values that in evaluates should be inside a parenthesis, so the SQL statement should read something like: select * from df where name in (?multiple2)


#5

Thanks @edgararuiz
I tried also with parenthesis, but the SQL chunk results 0 rows. When I explicitly write into the SQL where name IN ('one', 'two') , then the result is correct with two rows.

32


#6

So a few things here. This may differ depending on the DBMS you are using. (I.e. in Postgres, it might be possible to use string manipulation or arrays to work around this). You can also obviously build a query string yourself and use it in an R chunk (while exposing yourself to SQL injection :slight_smile:).

But to your question, it seems to me that the generated SQL is being interpreted as a character, as shown here:

And differs little with parentheses:
image

That would explain why you get no rows back. Because name = “(‘one’, ‘two’)” is not true for any record.

Further, the error you are getting with multiple_1 (in my code) seems to suggest that the SQL chunk wants a single object.

Now, this could be done using multiple parameters (if you know you have a finite and unchanging number of parameters):

But the question is about using the array itself, and the only way I know to do that (besides with a generated query in an R chunk or array / string splitting trickery in the database) is with dplyr:

And the query that dplyr generates for you:

<SQL>
SELECT *
FROM `df`
WHERE (`name` IN ('one', 'two'))

It may be just a result of my ignorance, but this seems to me like a feature request! For the database back-ends themselves (for parameterized queries), as well as for R Notebooks. (From the discussion here, it seems this feature is lacking in the RPostgres package, at least. My comment concerned arrays, but had a similar aim. I never actually made the feature request, though).

One parting thought is that you could always populate a temp table with the “lookup” values that you want… but that is not a huge help. Just a lot of different ways to work-around :smiley:


#7

@cole or @edgararuiz Is there a feature request for this for SQL code chunks? I get that dplyr handles this well, but it is very limiting for SQL code chunks in RMarkdown to only accept single length variables. I’m pretty sure the reason this happens is that the knitr language engine prevents SQL injection by via the sqlInterpolate function, which in turn uses dbQuoteString. This is good protection, but it does not play nicely with the sorts of cases considered here (which I think will be common).

@cole the thread you linked to was about something related but different. @edgararuiz do you think the FR here should go to knitr (where the language engine is defined) or to DBI (which powers dbQuoteString and sqlInterpolate?

Thanks!

Aaron


#8

Hey @agberg, I believe that @krlmlr is already working on that for DBI, I remember seeing this slide at his presentation at rstudio::conf: https://krlmlr.github.io/rstudio-conf18/#4 . Maybe he can tell us if this would implement what you are asking for.


#9

Sequences of values are not currently supported in DBI::dbQuoteLiteral(). Where else would it be useful, besides of the IN operator?


#10

I guess it would be useful for other functions that expect listed values, for example CONCAT (R equivalent of paste0).


#11

@krlmlr Is there an issue in DBI open for this?


#12

Not yet. I'm not sure this can be supported at all with RSQLite, RMariaDB or RPostgres. This depends on the API of the libraries used internally, I have skimmed through the sqlite3 and MariaDB APIs and haven't found anything that hints at the possibility to pass arrays. Unfortunately, this means that this particular feature is out of scope for DBI.

How about creating a temporary table, filling it with values, and performing a join?


#13

I had the same problem today and I think I found a solution using glue_sql() from the {glue} :package:.

To quote the glue_sql() help page:

If you place a * at the end of a glue expression the values will be collapsed with commas. This is useful for the SQL IN Operator for instance.

Now, this sounds useful indeed :wink:

grafik

And here is the rmarkdown code:

```{r}
library(DBI)
library(glue)
library(RSQLite)

con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "mtcars", mtcars, overwrite = TRUE)

cyl_int <- c(4L, 6L)
cyl_sql <- glue_sql("{cyl_int*}")

```

```{sql, connection=con}
  SELECT * FROM mtcars
  WHERE cyl IN (?cyl_sql)
  LIMIT 3
```

#14

PS: For a character vector to work, one needs to provide the .con argument glue_sql(), I think, so e.g.

glue_sql("{c('setosa', 'virginica')*}", .con = con)

if you use iris instead of mtcars.


#15

Thanks a lot, @dpprdan - was able to replicate your solution approach for my case with MS SQL and it worked flawlessly.


#16

Thanks @dpprdan. I tried out your solution with PostgreSQL and it worked just great.


#17

If your question's been answered, would you mind choosing a solution? It helps other people see which questions still need help, or find solutions if they have similar problems. Here’s how to do it: