Using Pmap instead of loops

Hi,

I am building a number of reports that require me to run SQL against our DB. The SQLs are a couple of hundred lines each and i was hoping instead of manually inputting the variables that i could use some SQL injection to create the SQLs, run them against the DB and collect the results. I am aware of glue which would solve this problem but i actually would like to try out pmap instead of my for loop as a way to learn it. Can anyone help me adjust my code below to make it work. I understand i am supposed to make a list of the variables before applying pmap but to be honest i came a bit unstuck

library(tidyverse)

# This is basically the SQL injection
clean_sql <- function (mysql, type = '99', min_width = '99', max_width = '99') {
  
  sqlcmd <- paste(mysql, collapse=" ")
  sqlcmd <- gsub('\t',replacement = ' ', sqlcmd)
  sqlcmd <- gsub("type", type, sqlcmd)
  sqlcmd <- gsub("min_width", min_width, sqlcmd)
  sqlcmd <- gsub("max_width", max_width, sqlcmd)
  sqlcmd
}

# First we build up a control table because we want to create many SQLs
iris_db <- iris
ctrl_table <- tibble(index = seq(1,3),
                     type = c('setosa', 'versicolor', 'versicolor'),
                     min_width = c(1.4, 1.1, 2.5),
                     max_width = c(3.4, 3.1, 3.5))


# The SQL is created here for demonstration but in reality its about 100 lines long
# It is also read in from the hard drive
strSQL <- " Select * FROM iris_db WHERE Species = 'type' AND Sepal.Width >= min_width 
AND Sepal.Width <= max_width"

sql_calc <- ''
# I want to create 3 SQLS based on the control table above where we inject the variables into the SQL
# Ideally i would like to know how to do this with purrr:pmap
for (i in ctrl_table$index) {
  sql_calc[i] <- clean_sql(strSQL,  
                         type = ctrl_table$type[i], 
                         min_width = ctrl_table$min_width[i],
                         max_width = ctrl_table$max_width[i])
}

# Use map to execute each of the queries and roll the results into a dataframe........

Hi, you can do it like this:

library(tidyverse)

# This is basically the SQL injection
clean_sql <- function (mysql, type = '99', min_width = '99', max_width = '99') {
  
  sqlcmd <- paste(mysql, collapse=" ")
  sqlcmd <- gsub('\t',replacement = ' ', sqlcmd)
  sqlcmd <- gsub("type", type, sqlcmd)
  sqlcmd <- gsub("min_width", min_width, sqlcmd)
  sqlcmd <- gsub("max_width", max_width, sqlcmd)
  sqlcmd
}

# First we build up a control table because we want to create many SQLs
iris_db <- iris
ctrl_table <- tibble(index = seq(1,3),
                     type = c('setosa', 'versicolor', 'versicolor'),
                     min_width = c(1.4, 1.1, 2.5),
                     max_width = c(3.4, 3.1, 3.5))


# The SQL is created here for demonstration but in reality its about 100 lines long
# It is also read in from the hard drive
strSQL <- " Select * FROM iris_db WHERE Species = 'type' AND Sepal.Width >= min_width AND Sepal.Width <= max_width"

sql_calc <- ''
# I want to create 3 SQLS based on the control table above where we inject the variables into the SQL
# Ideally i would like to know how to do this with purrr:pmap
for (i in ctrl_table$index) {
  sql_calc[i] <- clean_sql(strSQL,  
                           type = ctrl_table$type[i], 
                           min_width = ctrl_table$min_width[i],
                           max_width = ctrl_table$max_width[i])
}
ctrl_table %>%
  dplyr::mutate(sql = purrr::pmap_chr(list(type, min_width, max_width), clean_sql, mysql = strSQL))
#> # A tibble: 3 x 5
#>   index type     min_width max_width sql                                  
#>   <int> <chr>        <dbl>     <dbl> <chr>                                
#> 1     1 setosa         1.4       3.4 " Select * FROM iris_db WHERE Specie…
#> 2     2 versico…       1.1       3.1 " Select * FROM iris_db WHERE Specie…
#> 3     3 versico…       2.5       3.5 " Select * FROM iris_db WHERE Specie…

Created on 2018-08-21 by the reprex package (v0.2.0).

2 Likes

Hi @ mishabalyasin

Thank you very much for your quick answer. If i was to read in the SQL and then apply the clean names, something like below....Do you know how to put this within the Purrr framework for map

So for example, read_lines would read in the SQL from disk and then clean it afterwards

# Completely Fictional code
strSQL <- clean_sql(readr::read_lines("sql/mysql.SQL"))

If you have list of anything, you can use mapping functions.

In your case, you can get a list with list.files or something like that, and then use map_chr (since you'll only need to map through one list, not 3 lists at the same time as in your first example) to get SQL that you need.

It's also probably better to do it with glue, as you've mentioned, since it'll be more readable (I think), but it's up to you.

1 Like