Passing variables between r and sql scripts while still being able to use script preview?

I am working on a sql file within rstudio. This allows me to build and run sql using the preview feature.

r script to create a con

library(dplyr)
library(dbplyr)
library(glue)
iris_db <- tbl_memdb(iris)
con <- src_memdb()$con

Then I can run queries using sql:
example_query.sql

-- !preview conn=con

select * 
from iris 
where Species = 'setosa'

Within my .sql script, if I press the preview button, I can view the results of my query. This is handy for developing larger queries, so that I can test and check along the way if the data are as expected.

On a recent post on here I learned how to read in the .sql script into r:

query <- read_lines("example_query.sql") %>% glue_collapse(sep = "\n") %>% glue_sql(.con = con)
setosa <- dbGetQuery(con, query)
setosa %>% glimpse()
Observations: 50
Variables: 5
$ Sepal.Length <dbl> 5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9, 5.4, 4.8, 4.8, 4.3, 5.8, 5.7, 5.4, 5.1, 5.7, 5.1, 5.4, 5.1, …
$ Sepal.Width  <dbl> 3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1, 3.7, 3.4, 3.0, 3.0, 4.0, 4.4, 3.9, 3.5, 3.8, 3.8, 3.4, 3.7, …
$ Petal.Length <dbl> 1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5, 1.5, 1.6, 1.4, 1.1, 1.2, 1.5, 1.3, 1.4, 1.7, 1.5, 1.7, 1.5, …
$ Petal.Width  <dbl> 0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2, 0.2, 0.1, 0.2, 0.2, 0.1, 0.1, 0.2, 0.4, 0.4, 0.3, 0.3, 0.3, 0.2, 0.4, …
$ Species      <chr> "setosa", "setosa", "setosa", "setosa", "setosa", "setosa", "setosa", "setosa", "setosa", "setosa", "setosa", …

I also learned that I can pass variables from r to the sql script using the glue package along with curly braces e.g.

in my r script add

var1 <- "setosa"

Then in my sql script change where Species = 'setosa' to where Species = {var1}

However, when I switch to using {variables} in the sql query, I am no longer able to use the preview function to view my results. This can actually be done in a Rmd file when working on a sql chunk by using ?syntax to refer to variables within the chunk and then pressing play.

Is there a way to both be able to use r variables within a sql script AND use preview to run the query?

So the preview function in RStudio sends the statement to DBI::dbSendQuery, which means you can add parameters to the script after the connection.

I believe the Preview function uses RSQLite, so you can used named matching (see https://dbi.r-dbi.org/reference/dbbind#details) to add them.

-- !preview conn=con, params=list(var1=var1)
select * 
from iris 
where Species = $var1 

You can also used more multiple conditions with this format. For example, if

var1 <- c("setosa", "versicolor", "unknown")

, then the preview would return the desired results.

The only limitations occur when you're trying to use multiple parameters.

For example, if var1 <- c("setosa", "versicolor", "unknown") and var2 <- 1.4, the script below would return an error in Preview, because var1 and var2 must be equal length (see the link above).

-- !preview conn=con, params=list(var1=var1, var2=var2)
select * 
from iris 
where Species = $var1 AND `Petal.Length` > $var2

You can get around that by making var2 the same length as var1 within the sql script

-- !preview conn=con, params=list(var1=var1, var2=rep(var2, length(var1)))
select * 
from iris 
where Species = $var1 AND `Petal.Length` > $var2
1 Like

Thanks again @jdb, I gave that a try with my actual query script:

in r:

game_name <- "fungame"
day_n <- 7 # train from
day_m <- 30 # predict to

Then in sql script:

-- !preview conn=con, params=list(day_n=day_n, day_m=day_m, game_name=game_name)


select s, 
       install_dt, 
       split(game_name, '_')[2] as platform,
       case when country = 'United States' then 1 else 0 end as usa
from device_metrics.game_install
where year || '-' || month || '-' || day >= date_format(date_add('day', -($day_m + 91), current_date), '%Y-%m-%d')
and year || '-' || month || '-' || day <= date_format(date_add('day', -($day_m + 1), current_date), '%Y-%m-%d')
and lower(game_name) = $game_name

Gives:

Error in new_result(connection@ptr, statement, immediate) :
nanodbc/nanodbc.cpp:1374: 00000: [Simba][Athena] (1040) An error has been thrown from the AWS Athena client. Athena Error No: 130, HTTP Response Code: 400, Exception Name: InvalidRequestException, Error Message: line 9:75: no viable alternative at input '($' [Execution ID: ]

That's after clicking on preview. I get the same error when pulling into r:

query <- read_lines("blah.sql") %>% glue_collapse(sep = "\n") %>% glue_sql(.con = con)
exdata <- dbGetQuery(con, query)

Here's what the query looks like:

query
<SQL> -- !preview conn=con, params=list(day_n=day_n, day_m=day_m, game_name=game_name)


select s, 
       install_dt, 
       split(game_name, '_')[2] as platform,
       case when country = 'United States' then 1 else 0 end as usa
from device_metrics.game_install
where year || '-' || month || '-' || day >= date_format(date_add('day', -($day_m + 91), current_date), '%Y-%m-%d')
and year || '-' || month || '-' || day <= date_format(date_add('day', -($day_m + 1), current_date), '%Y-%m-%d')
and lower(game_name) = $game_name

So it looks like those variables $day_n, $day_m and $game_name are not being translated. Any ideas? I'm actually using odbc con.

Using the Preview SQL capability in RStudio, I'm not sure if there is a way to use variable substitution. See here for a brief discussion:

But, what I usually do is develop my SQL queries in another editor (I happen to use Toad with our Oracle databases at work) using something like &var for substitution in Toad. But often I want to use this same query as part of an R script and use glue_sql() to substitute the parameters.

So I have a little function that reads the SQL file and replaces &var with {var} so I can then use glue_sql(). It looks something like this:

library(dbplyr)
library(magrittr) 
library(glue)
library(readr)
library(DBI)

tmp_sql <- tempfile()
my_sql <-
  "select *
from iris
where Species = &var1 
"
write_file(my_sql, tmp_sql)

iris_db <- tbl_memdb(iris)
con <- src_memdb()$con
var1 <- "setosa"
read_sql <- function(path) {
  read_lines(path) %>% 
    glue_collapse(sep = "\n") %>% 
    gsub(pattern = "&", x = ., replacement = "{") %>% 
    gsub(pattern = "\\{[\\S]+\\K\\s", x = ., 
         replacement = "} ", perl = TRUE)

}

query <- read_sql(tmp_sql)
query
#> select *
#> from iris
#> where Species = {var1}

query_glued <- glue_sql(query, .con = con)
query_glued
#> <SQL> select *
#> from iris
#> where Species = 'setosa'

setosa <- dbGetQuery(con, query_glued)
head(setosa)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.2  setosa
#> 2          4.9         3.0          1.4         0.2  setosa
#> 3          4.7         3.2          1.3         0.2  setosa
#> 4          4.6         3.1          1.5         0.2  setosa
#> 5          5.0         3.6          1.4         0.2  setosa
#> 6          5.4         3.9          1.7         0.4  setosa

Created on 2020-02-11 by the reprex package (v0.3.0)

2 Likes

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.