Use a sql script in a r script

Consider the following code block:

library(dplyr)
library(dbplyr)
iris_db <- tbl_memdb(iris)
con <- src_memdb()$con

query <- "select * from iris where Species = 'setosa'"
setosa <- dbGetQuery(con, query)

If I run this, I get a data frame back with just the setosa rows from iris. So this works fine.

However, I'd like to include some variables within my script e.g.

species <- 'setosa'
query2 <- paste0("select * from iris where Species = '", species, "'")
setosa2 <- dbGetQuery(con, query2)

This also works, a new df setosa2 is returned just like with my first attempt where I hard coded the species to be returned.

However, in my actual script there are many variables I would like to use within the sql query, so it would get unsightly and hard to read the sql query after not too long.

I really enjoy working with .sql files within rstudio and recently posted a similar question within the context of integrating a .sql file and .Rmd file.

Within an Rmd file one can refer to a sql code chunk and integrate variables using question marks e.g.

```{sql connection=con, include=FALSE, output.var="my_df"}
select * from iris where Species = ?species
```

My question is, is there a workflow where I can integrate an external sql script within to a r script (as opposed to a Rmd) file? With the Rmd file one can use readLines but I think that only works when calling knit.

Or must I just use paste() with a long string and variables to generate my query and pass to r?

This is not the case, you can read your file in exactly the same way from a script.

And I would strongly advise from using paste to create your queries. You can read about what you should do instead here - Solutions - Run Queries Safely

2 Likes

Thanks @mishabalyasin!

"you can read your file in exactly the same way from a script."

I don't follow? In the example above, what would that look like? If I have my query saved in query.sql, would it be a case of something like:

query <- source('my_query.sql')

I tried that:

query <- source("my_query.sql")
Error in source("my_query.sql") : 
  exampl_query.sql:4:6: unexpected symbol
3: select * 
4: from iris

Ah! I think I see it now. Use:

query <- read_lines("my_query.sql")

Then I can add variables to the sql per the link that you sent

1 Like

A follow up if I may. I read the doc you shared. Mimicking the functionality of passing variables to sql within sql chunks in a Rmd file, is something like the following possible:

query.sql:

select * 
from iris 
where Species = ?var1
and Sepal.Length > ?var2

main.R:

var1 <- "setosa"
var2 <- 5
query <- read_lines("example_query.sql")
setosa <- dbGetQuery(con, query)

When I run main.R:

Error: Expecting a single string value: [type=character; extent=5].

Is there a way to pass the variables directly like this?

The way knitr accomplishes this in R Markdown is using sqlInterpolate, which is also referenced in the link shared.

This works for me:

query <- 'select * 
from iris 
where Species = ?var1
and "Sepal.Length" > ?var2'

var1 <- "setosa"
var2 <- 5

sql <- DBI::sqlInterpolate(con, 
                      query, 
                      var1 = var1,
                      var2 = var2
)

setosa <- DBI::dbGetQuery(con, sql)
1 Like

Thank you @jdb! I'm reading about glue too: https://github.com/tidyverse/glue

Example from that page:

library(glue)
name <- "Fred"
glue('My name is {name}.')

If I could integrate that simplicity into a sql script that would be swell. I'll update here if I succeed. Failing that I'll try with sqlInterpolate()

Here's how my code currently looks:
example_query.sql:

select * 
from iris 
where Species = {var1}

Then my script:

library(dplyr)
library(dbplyr)
library(glue)
iris_db <- tbl_memdb(iris)
con <- src_memdb()$con

var1 <- "setosa"
query <- glue_sql(read_lines("example_query.sql"))
setosa <- dbGetQuery(con, query)

Gives:

Error: All unnamed arguments must be length 1

Tried:

  1. Tried removing read_lines and checking if glu_sql() can read from a script directly, apprently it cannot
  2. Tried wrapping backticks around the variable in the .sql file, same error message

I feel like I'm close. I prefer this route since I can use a .sql file rather than a query string like you have above. Any ideas?

The error occurs because read_lines returns a vector of length greater than 1. You can use glue_collapse to get the string you need.

You will also need to specify the .con argument in glue_sql.

The end result:

library(dplyr)
library(dbplyr)
library(glue)

iris_db <- tbl_memdb(iris)
con <- src_memdb()$con

var1 <- "setosa"

query_lines <- readr::read_lines("example_query.sql")
query_lines

query_collapsed <- glue_collapse(query_lines)
query_collapsed

query <- glue_sql(.con = con, query_collapsed)
setosa <- DBI::dbGetQuery(con, query)

head(setosa)
4 Likes

Thanks @jdb! However I'm having an issue with the variable:

example_query.sql:

select * 
from iris 
where Species = {var1}

r script:

library(dplyr)
library(dbplyr)
library(glue)
iris_db <- tbl_memdb(iris)
con <- src_memdb()$con
var1 <- "setosa"
query <- read_lines("example_query.sql") %>% glue_collapse()
setosa <- dbGetQuery(.con = con, query)

Results in

Error: unrecognized token: "{"

I tried some variations with query.sql:
where Species = {`var1`} // same error
where Species = `var1` // same error
where Species = ?var1 // Error: near "var1": syntax error
where Species = {?var1} // Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘dbGetQuery’ for signature ‘"glue", "missing"’

How can I get the variable part to read?

On the sql script itself, when I use the preceding question mark syntax, it formats it by greying out which is convenient, per the screen shot. So that 'feels' like how it's meant to be. But I'm not sure.

Screen Shot 2020-02-10 at 1.09.48 PM

Looks to me like you're having a few issues here. The first may be around how you are using glue_collapse() to parse your .sql file.

Check out the result of the following from your code:

read_lines("example_query.sql") %>%
  glue_collapse()
#> select *from iriswhere Species = {var1}

This is an invalid SQL statement because you lose the space between iris and where. But, if you do something like this, you get a nicely formatted query

read_lines("example_query.sql") %>% 
  glue_collapse(sep = "\n")
#> select *
#> from iris
#> where Species = {var1}

Then the next step you need to do is to replace the contents of {var1} with a variable in your R session. This is where glue_sql() comes in:

var1 <- "setosa"
read_lines("example_query.sql") %>% 
  glue_collapse(sep = "\n") %>% 
  glue_sql(.con = con)
#> <SQL> select *
#> from iris
#> where Species = 'setosa'

So bringing it all together, the following code should work if your example_query.sql is:

select *
from iris
where Species = {var1}
library(dbplyr)
library(magrittr) 
library(glue)
library(readr)
library(DBI)

iris_db <- tbl_memdb(iris)
con <- src_memdb()$con
var1 <- "setosa"
query <- read_lines("example_query.sql") %>% 
  glue_collapse(sep = "\n") %>% 
  glue_sql(.con = con)

setosa <- dbGetQuery(con, query)
head(setosa)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.1         3.5          1.4         0.2  setosa
#> 2          4.9         3.0          1.4         0.2  setosa
#> 3          4.7         3.2          1.3         0.2  setosa
#> 4          4.6         3.1          1.5         0.2  setosa
#> 5          5.0         3.6          1.4         0.2  setosa
#> 6          5.4         3.9          1.7         0.4  setosa
3 Likes

This works! Thanks a lot. One follow up, but I think I'll open a new post on that follow up. But I'll ask it here first anyway just in case.

In my work environment I have a .sql file and a .r file. My motivation for doing the above was that I could build the sql in the .sql file using r variables and then, when ready, pass the sql results into a data frame in the r script by per your response above.

When within the .sql file there is a preview option which runs the sql. However, this does not work with curly brace syntax for {var1}.

i.e. it's a very appealing work environment, making skipping between sql and r friction less.

Feels like I'm so close... Is there a way to work in sql using variables within r environment and then using that same sql within a r script with braces?

e.g. in a .Rmd file I can work within a sql chunk and pass variables using ?var_name. You answered my q already so I'll open a new post

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.