Error with {callr} not doing NSE the way {disk.frame} does causing issue with knitting in {Rmarkdown}

xiaodai · December 15, 2019, 10:53pm

I am not 100% sure what the issue is, but I think it has to do with how {callr} handles NSE (non-standard evaluation), in particular how it fails to capture vital global variables.

Below is the code that produces this error when knitting.

However, I don't get this error when I start a fresh session and run the code directly. I can't figure out what the issue is beyond my initial hypothesis.

Here is the code. Side question: how do I format the code properly? Wrapping it in triple backticks doesn't work because the R Markdown file already contains some.

---
title: "Test"
output: rmarkdown::html_vignette
---

```{r setup, include = FALSE}
remotes::install_github("xiaodaigh/disk.frame", ref="development")
suppressPackageStartupMessages(library(disk.frame))
library(fst)
library(magrittr)
library(nycflights13)
library(dplyr)
library(data.table)

# you need to run this for multi-worker support
# limit to 2 cores if not running interactively; most likely on CRAN
# set-up disk.frame to use multiple workers
if(interactive()) {
  setup_disk.frame()
  # highly recommended; however, it is put inside interactive() for CRAN because
  # changing user options is not allowed on CRAN
  options(future.globals.maxSize = Inf)  
} else {
  setup_disk.frame(2)
}


knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r asdiskframe, cache=TRUE}
library(nycflights13)
library(dplyr)
library(disk.frame)
library(data.table)

# convert the flights data to a disk.frame, store the disk.frame in the folder
# "tmp_flights.df" under tempdir(), and overwrite any content if needed
flights.df <- as.disk.frame(
  flights, 
  outdir = file.path(tempdir(), "tmp_flights.df"),
  overwrite = TRUE)

flights.df
```

```{r, dependson='asdiskframe'}
library(disk.frame)
flights.df %>%
  group_by(carrier) %>% # notice that hard_group_by needs to be set
  summarize(count = n(), mean_dep_delay = mean(dep_delay, na.rm = TRUE)) %>%  # mean follows normal R rules
  collect %>% 
  arrange(carrier)
```

cderv · December 16, 2019, 8:17am

Just for reference, this issue is also on Stack Overflow and was initially raised on GitHub.

cderv · December 16, 2019, 10:17am

Note that it works when I use rmarkdown::render(). So it seems to be related to how the RStudio IDE renders the document when one clicks Knit.

Also, running it inside callr does indeed produce an error:

callr::r(function() rmarkdown::render("test.rmd"))

I'm not sure what happens here, or how to debug it...

Here is the stack trace:


15. (function ()  ...
 16. rmarkdown::render("test.rmd", envir = globalenv())
    ??:1:10
 17. knitr::knit(knit_input, knit_output, envir = envir, quiet = quiet)
 18. knitr:::process_file(text, output)
 19. base:::withCallingHandlers(if (tangle) process_tangle(group) else process_group( ...
 20. knitr:::process_group(group)
 21. knitr:::process_group.block(group)
 22. knitr:::call_block(x)
 23. knitr:::block_exec(params)
 24. knitr:::in_dir(input_dir(), evaluate(code, envir = env, new_device = FALSE,  ...
 25. knitr:::evaluate(code, envir = env, new_device = FALSE, keep_warning = !isFALSE( ...
 26. evaluate::evaluate(...)
 27. evaluate:::evaluate_call(expr, parsed$src[[i]], envir = envir,  ...
 28. evaluate:::timing_fn(handle(ev <- withCallingHandlers(withVisible(eval(expr,  ...
 29. base:::handle(ev <- withCallingHandlers(withVisible(eval(expr,  ...
 30. base:::withCallingHandlers(withVisible(eval(expr, envir, enclos)),  ...
 31. base:::withVisible(eval(expr, envir, enclos))
 32. base:::eval(expr, envir, enclos)
 33. base:::eval(expr, envir, enclos)
 34. magrittr:::`%>%`(flights.df %>% group_by(carrier) %>% summarize(mean_dep_delay = ...
 35. base:::withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
 36. base:::eval(quote(`_fseq`(`_lhs`)), env, env)
 37. base:::eval(quote(`_fseq`(`_lhs`)), env, env)
 38. `_fseq`(`_lhs`)
 39. magrittr:::freduce(value, `_function_list`)
 40. function_list[[i]](value)
 41. dplyr:::summarize(., mean_dep_delay = mean(dep_delay, na.rm = T))
 42. disk.frame:::summarise.grouped_disk.frame(., mean_dep_delay = mean(dep_delay,  ...
 43. disk.frame:::generate_summ_code(...)
 44. purrr::map_dfr(code, ~{ ...
 45. purrr:::map(.x, .f, ...)
 46. disk.frame:::.f(.x[[i]], ...)
 47. magrittr:::`%>%`(gpd %>% filter(token == "SYMBOL_FUNCTION_CALL") %>%  ...
 48. base:::withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
 49. base:::eval(quote(`_fseq`(`_lhs`)), env, env)
 50. base:::eval(quote(`_fseq`(`_lhs`)), env, env)
 51. disk.frame:::`_fseq`(`_lhs`)
 52. magrittr:::freduce(value, `_function_list`)
 53. function_list[[i]](value)
 54. dplyr:::filter(., token == "SYMBOL_FUNCTION_CALL")
 55. dplyr:::filter.default(., token == "SYMBOL_FUNCTION_CALL")
 56. dplyr:::filter_(.data, .dots = compat_as_lazy_dots(...))
 57. base:::.handleSimpleError(function (e)  ...
 58. h(simpleError(msg, call))

It seems the error is thrown from here:

github.com/DiskFrame/disk.frame/blob/dd2cd5fd3f5d24aaebb0375976ff6b0174517f26/R/one-stage-verbs.R#L293

list_of_chunk_agg_fns <- as.character(utils::methods(class = "chunk_agg.disk.frame"))
list_of_collected_agg_fns <- as.character(utils::methods(class = "collected_agg.disk.frame"))

# generate the chunk_summarize_code
summarize_code = purrr::map_dfr(code, ~{
  
  expr_id <<- expr_id  + 1
  # parse the function into table form for easy interrogation
  gpd = getParseData(parse(text = deparse(.x)), includeText = TRUE); 
  grp_funcs = gpd %>% filter(token == "SYMBOL_FUNCTION_CALL") %>% select(text) %>% pull
  grp_funcs = grp_funcs %>% paste0("_df")
  
  # search in the space to find functions name `fn`.chunk_agg.disk.frame
  # only allow one such functions for now TODO improve it
  num_of_chunk_functions = sum(sapply(unique(grp_funcs), function(x) exists(paste0(x, ".chunk_agg.disk.frame"))))
  num_of_collected_functions= sum(sapply(unique(grp_funcs), function(x) exists(paste0(x, ".collected_agg.disk.frame"))))
  
  # the number chunk and aggregation functions must match
  stopifnot(num_of_chunk_functions == num_of_collected_functions)

Judging from the error message about filter_, gpd may be NULL here. It seems getParseData() returns NULL in this situation...

As it works interactively, it is not easy to debug this part and see what gpd really is.
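One plausible explanation for "works interactively, fails under callr", offered here as an assumption rather than a confirmed diagnosis: `parse()` defaults to `keep.source = getOption("keep.source")`, and that option starts out as `interactive()`. In a non-interactive process such as the one started by `callr::r()` (and presumably the one the Knit button uses), no source references are kept, so `getParseData()` has nothing to return. This base-R sketch forces the option each way on the same kind of expression that `generate_summ_code` parses:

```r
expr <- quote(mean(dep_delay, na.rm = TRUE))

# keep.source = FALSE mimics a non-interactive session, where
# getOption("keep.source") defaults to FALSE: no parse data is stored
pd_off <- getParseData(parse(text = deparse(expr), keep.source = FALSE))
is.null(pd_off)
#> [1] TRUE

# keep.source = TRUE keeps the parse data that the token filter relies on
pd_on <- getParseData(parse(text = deparse(expr), keep.source = TRUE))
pd_on$text[pd_on$token == "SYMBOL_FUNCTION_CALL"]
#> [1] "mean"
```

If this is the cause, `getOption("keep.source")` should print `FALSE` inside the failing callr session and `TRUE` in the interactive console.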

I tried this to debug it:

session <- callr::r_session$new()
session$run(function() rmarkdown::render("test.rmd", envir = globalenv()))
# error so enter debug mode
session$debug()

And in debug mode:

# Go into generate_summ_code frame
.inspect 28 
# See what gpd is
code = substitute(list(...))[-1]
getParseData(parse(text = deparse(code[[1]])), includeText = TRUE)

It returns NULL...

To close debug mode, press Esc, then run:

session$close()

The issue seems to be that getParseData returns NULL because it does not find what it is looking for.
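If `keep.source` is indeed what differs between the two sessions, one possible workaround would be to request the parse data explicitly instead of relying on the session default. The helper below is hypothetical (it is not part of disk.frame), uses base subsetting instead of the dplyr pipeline, and fails with a clear message rather than passing NULL on to `filter()`:

```r
# Hypothetical helper, not part of disk.frame: return the names of the
# functions called in an expression. keep.source = TRUE is passed explicitly
# so the result does not depend on whether the session is interactive.
function_calls_in <- function(expr) {
  parsed <- parse(text = deparse(expr), keep.source = TRUE)
  pd <- getParseData(parsed, includeText = TRUE)
  if (is.null(pd)) {
    # can still happen, e.g. if options(keep.parse.data = FALSE) is set
    stop("no parse data available for this expression")
  }
  pd$text[pd$token == "SYMBOL_FUNCTION_CALL"]
}

function_calls_in(quote(mean(dep_delay, na.rm = TRUE)))
#> [1] "mean"
function_calls_in(quote(n()))
#> [1] "n"
```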

Does this help?

xiaodai · December 17, 2019, 1:07am

Thank you very much. I suspected that's where the error was, but I couldn't figure out why there would be an error, other than that callr was causing it.

Good to know I can just use render() directly.

system · January 7, 2020, 1:07am

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.