Post-chunk hooks in RMarkdown

torvaney · October 24, 2019, 10:06am

I can fetch data from a Snowflake data warehouse really easily by writing my query in a SQL chunk. This is really nice and works seamlessly as it does for every other database.

However, Snowflake has a convention that column names are ALL_CAPS. So my query:

SELECT 1 AS thing

Returns this dataframe:

example

#>   THING
#>   <dbl>
#> 1     1

I can, of course, convert the column names to lowercase manually in R in the following chunk, with str_replace_all or something similar.

But this gets tedious quite quickly.

Is there anyway to do this automatically? I was thinking something like a function that automatically gets called on the object returned by the SQL chunk. For example:

In R

snowflake_to_r <- . %>% rename_all(str_replace_all)

SQL chunk header:

```{sql, connection = db, output.var = 'example', post_chunk = snowflake_to_r}

Does any similar functionality exist?

cderv · October 24, 2019, 5:58pm

You'll find those functions in knitr documentation. It is called hooks

I think chunk hook is what you are looking for

Chunk hooks are functions to be called before or after a code chunk when the chunk option is not NULL (it basically means as long as you set a option, the hook will be run)

or output hooks

Did you already try ?

I don't know if this work well with sql chunk or not...
you'll tell me.

jdblischak · October 24, 2019, 7:31pm

I put together an example that uses a custom chunk hook that automatically converts the column names of any data frame to lowercase. It uses an R chunk, but presumably if SQL chunks also return their results as data frames, this should also work. To use it on a chunk, set the chunk option lowercase to anything (knitr runs a hook connected to a chunk option only if that option is set for the chunk). The main downside is that it receives the entire environment of the chunk, so it will be redundant if you call it multiple times (i.e. it will convert the latest data frame created plus all previously converted data frames).

---
output: html_document
---

```{r setup, include=FALSE}
lowercase <- function(before, options, envir) {
  if (before) return()
  
  objects <- ls(envir = envir)
  for (obj in objects) {
    if (class(envir[[obj]]) == "data.frame") {
      colnames(envir[[obj]]) <- tolower(colnames(envir[[obj]]))
    }
  }
}
knitr::knit_hooks$set(lowercase = lowercase)
```

```{r example, lowercase=TRUE}
num <- 1:3
chr <- letters[1:3]
(x <- data.frame(ONE = num, TWO = chr))
```

```{r did-it-work}
x
```

cderv · October 25, 2019, 9:58pm

This example is nice but is tricky as it will modified any object from any chunk once activated because envir contains every object

Example with another dataframe using upper case

---
output: html_document
---

```{r setup, include=FALSE}
lowercase <- function(before, options, envir) {
  if (before) return()
  
  objects <- ls(envir = envir)
  for (obj in objects) {
    if (class(envir[[obj]]) == "data.frame") {
      colnames(envir[[obj]]) <- tolower(colnames(envir[[obj]]))
    }
  }
}
knitr::knit_hooks$set(lowercase = lowercase)
```

```{r}
y <- mtcars
colnames(y) <- toupper(colnames(y))
y
```

y is uppercase

```{r example, lowercase=TRUE}
num <- 1:3
chr <- letters[1:3]
(x <- data.frame(ONE = num, TWO = chr))
```

```{r did-it-work}
x
y
```

y is also modified

You can use the same idea but using the chunk option to pass the name of the object you want to modified

Providing name of object in chunk option

---
output: html_document
---

```{r setup, include=FALSE}
lowercase <- function(before, options, envir) {
  if (before) return()
  obj_name <- options$lowercase
  if (is.character(obj_name) && exists(obj_name, where = envir)) {
    colnames(envir[[obj_name]]) <- tolower(colnames(envir[[obj_name]]))
    return(invisible())
  }
}
knitr::knit_hooks$set(lowercase = lowercase)
```

```{r}
y <- mtcars
colnames(y) <- toupper(colnames(y))
y
```

y is uppercase

```{r example, lowercase="x"}
num <- 1:3
chr <- letters[1:3]
x <- data.frame(ONE = num, TWO = chr)
```

```{r did-it-work}
colnames(x)
colnames(y)
```

y is no more modified

Here is now an example for SQL chunk, using also the fact you can take advantage of output.var option too.

---
output: html_document
---

```{r setup, include=FALSE}
lowercase <- function(before, options, envir) {
  if (before) return()
  obj_name <- options$lowercase
  if (is.character(obj_name) && exists(obj_name, where = envir)) {
    colnames(envir[[obj_name]]) <- tolower(colnames(envir[[obj_name]]))
    return(invisible())
  }
}
knitr::knit_hooks$set(lowercase = lowercase)
```

```{r}
library(dplyr)
library(dbplyr)
library(RSQLite)
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
colnames(mtcars) <- toupper(colnames(mtcars))
copy_to(con, mtcars)
DBI::dbListTables(con)
```

This is upper
```{r}
tbl(con, "mtcars") %>% colnames()
```

Select a column
```{sql, connection=con, output.var="MPG_up"}
SELECT MPG FROM MTCARS
```

Still upper
```{r}
colnames(MPG_up)
```

Add option to get lower colname
```{sql, connection=con, output.var="MPG_low", lowercase = "MPG_low"}
SELECT MPG FROM MTCARS
```

```{r}
colnames(MPG_low)
```

You can also use directly other options from the chunk

```{r}
lowercase <- function(before, options, envir) {
  if (before) return()
  if (options$lowercase && nzchar(options$output.var)) {
    obj_name <- options$output.var
    if (is.character(obj_name) && exists(obj_name, where = envir)) {
      colnames(envir[[obj_name]]) <- tolower(colnames(envir[[obj_name]]))
      return(invisible())
    }
  }
}
knitr::knit_hooks$set(lowercase = lowercase)
```

```{sql, connection=con, output.var="CYL_low", lowercase = TRUE}
SELECT CYL FROM MTCARS
```

```{r}
colnames(CYL_low)
```

Note all this will only work when rendering. This is because it is using knitr feature. Hooks won't do anythink when executing chunk in the Rmd inside RStudio.

Hope all this helps, and show the use of hooks to apply a function on the output df you get when using output.var.

system · November 15, 2019, 9:58pm

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.