callr seems to hang when doing passing through reticulate installation steps

OK so this is a bit of a tricky one to debug, since I cannot use reprex because it ends up with a process hanging.

I'm writing some installation helpers with callr in the greta R package. We wanted to use callr so we could take advantage of when installations time out, and there's some nice way to capture errors/warnings/etc that happen when using Python, while not crowding the console too much.

I wrote some code to help spin up new installation processes, with custom start and end messages - the one that keeps breaking is when I am installing miniconda from scratch.

Unfortunately for this error to be reproducible, I need to remove the miniconda directory - I acknowledge that perhaps I'm doing something wrong here! I'm also wondering if perhaps

When I debug this process, I end up being stuck at the r_callr_process$wait(timeout = timeout_minutes) part.

I'm wondering if perhaps I've made some mistake creating the new_install_process function? I can run the relevant R code that these installation helpers run, and it works just fine in the end!

Here is the relevant code below from greta (original code from greta is in new_install_process, and greta_install_miniconda.

library(callr)
library(cli)

new_install_process <- function(callr_process,
                                timeout,
                                cli_start_msg = NULL,
                                cli_end_msg = NULL){
  options(warning.length = 2000)
  cli::cli_process_start(cli_start_msg)
  # convert max timeout from milliseconds into minutes
  timeout_minutes <- timeout * 1000 * 60
  r_callr_process <- callr::r_process$new(callr_process)
  r_callr_process$wait(timeout = timeout_minutes)
  
  status <- r_callr_process$get_exit_status()
  output_notes <- r_callr_process$read_output()
  no_output <- nchar(output_notes) == 0
  output_error <- r_callr_process$read_all_error_lines()
  
  if (is.null(status)) {
    cli::cli_process_failed()
    stop(
      # timeout_install_msg(timeout, output_error),
      "there was an installation issue",
      call. = FALSE
    )
  } else if (no_output) {
    cli::cli_process_failed()
    stop(
      # other_install_fail_msg(output_error),
      "there was an installation issue",
      call. = FALSE
    )
  }
  
  cli_process_done(msg_done = cli_end_msg)
  
  return(
    list(output_notes = output_notes,
         status = status,
         no_output = no_output,
         output_error = output_error)
  )
  
}

greta_install_miniconda <- function(timeout) {
  
  callr_install_miniconda <- callr::r_process_options(
    func = function() {
      reticulate::install_miniconda()
    }
  )
  
  # if this function doesn't fail, then this code here can be run?
  install_miniconda_process <- new_install_process(
    callr_process = callr_install_miniconda,
    timeout = timeout,
    cli_start_msg = "No {.pkg miniconda} detected, installing \\
                      {.pkg miniconda}, this may take a minute.",
    cli_end_msg = "{.pkg miniconda} installed!"
  )
  miniconda_notes <- install_miniconda_process$output_notes
  cli::cli_ul("To see full installation notes run:")
  cli::cli_ul("{.code miniconda_notes}")
  return(invisible(miniconda_notes))
}

greta_install_miniconda(timeout = 1)

Created on 2021-11-01 by the reprex package (v2.0.1)

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 4.1.1 (2021-08-10)
#>  os       macOS Big Sur 10.16         
#>  system   x86_64, darwin17.0          
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_AU.UTF-8                 
#>  ctype    en_AU.UTF-8                 
#>  tz       Australia/Perth             
#>  date     2021-11-01                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date       lib source        
#>  backports     1.2.1   2020-12-09 [1] CRAN (R 4.1.0)
#>  cli           3.1.0   2021-10-27 [1] CRAN (R 4.1.1)
#>  crayon        1.4.1   2021-02-08 [1] CRAN (R 4.1.0)
#>  digest        0.6.28  2021-09-23 [1] CRAN (R 4.1.0)
#>  ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.1.0)
#>  evaluate      0.14    2019-05-28 [1] CRAN (R 4.1.0)
#>  fansi         0.5.0   2021-05-25 [1] CRAN (R 4.1.0)
#>  fastmap       1.1.0   2021-01-25 [1] CRAN (R 4.1.0)
#>  fs            1.5.0   2020-07-31 [1] CRAN (R 4.1.0)
#>  glue          1.4.2   2020-08-27 [1] CRAN (R 4.1.0)
#>  highr         0.9     2021-04-16 [1] CRAN (R 4.1.0)
#>  htmltools     0.5.2   2021-08-25 [1] CRAN (R 4.1.0)
#>  knitr         1.36    2021-09-29 [1] CRAN (R 4.1.0)
#>  lifecycle     1.0.1   2021-09-24 [1] CRAN (R 4.1.0)
#>  magrittr      2.0.1   2020-11-17 [1] CRAN (R 4.1.0)
#>  pillar        1.6.4   2021-10-18 [1] CRAN (R 4.1.0)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.1.0)
#>  purrr         0.3.4   2020-04-17 [1] CRAN (R 4.1.0)
#>  R.cache       0.15.0  2021-04-30 [1] CRAN (R 4.1.0)
#>  R.methodsS3   1.8.1   2020-08-26 [1] CRAN (R 4.1.0)
#>  R.oo          1.24.0  2020-08-26 [1] CRAN (R 4.1.0)
#>  R.utils       2.11.0  2021-09-26 [1] CRAN (R 4.1.0)
#>  reprex        2.0.1   2021-08-05 [1] CRAN (R 4.1.0)
#>  rlang         0.4.12  2021-10-18 [1] CRAN (R 4.1.0)
#>  rmarkdown     2.11    2021-09-14 [1] CRAN (R 4.1.0)
#>  rstudioapi    0.13    2020-11-12 [1] CRAN (R 4.1.0)
#>  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 4.1.0)
#>  stringi       1.7.5   2021-10-04 [1] CRAN (R 4.1.0)
#>  stringr       1.4.0   2019-02-10 [1] CRAN (R 4.1.0)
#>  styler        1.6.2   2021-09-23 [1] CRAN (R 4.1.0)
#>  tibble        3.1.5   2021-09-30 [1] CRAN (R 4.1.0)
#>  utf8          1.2.2   2021-07-24 [1] CRAN (R 4.1.0)
#>  vctrs         0.3.8   2021-04-29 [1] CRAN (R 4.1.0)
#>  withr         2.4.2   2021-04-18 [1] CRAN (R 4.1.0)
#>  xfun          0.26    2021-09-14 [1] CRAN (R 4.1.0)
#>  yaml          2.2.1   2020-02-01 [1] CRAN (R 4.1.0)
#> 
#> [1] /Library/Frameworks/R.framework/Versions/4.1/Resources/library

I believe this is because the standard output and/or error buffers get full and you need to read them out. Or, if you don't need them while the installation is running, you can redirect them to a file, or two files.

See also this: https://github.com/r-lib/processx#standard-output-and-error, especially this warning:

Always make sure that you read out the standard output and/or error of the pipes, otherwise the background process will stop running!

Thanks so much, Gabor!

So if I understand correctly, something like adding the stderr and stdout arguments like so should help?

greta_install_miniconda <- function(timeout) {
  
  callr_install_miniconda <- callr::r_process_options(
    func = function() {
      reticulate::install_miniconda()
    },
    stdout = "|",
    stderr = "|"
  )
  
  # if this function doesn't fail, then this code here can be run?
  install_miniconda_process <- new_install_process(
    callr_process = callr_install_miniconda,
    timeout = timeout,
    cli_start_msg = "No {.pkg miniconda} detected, installing \\
                      {.pkg miniconda}, this may take a minute.",
    cli_end_msg = "{.pkg miniconda} installed!"
  )
  miniconda_notes <- install_miniconda_process$output_notes
  cli::cli_ul("To see full installation notes run:")
  cli::cli_ul("{.code miniconda_notes}")
  return(invisible(miniconda_notes))
}

greta_install_miniconda(timeout = 1)

Adding stdout and stderr could help, but not | because that is exactly the default already, the pipe that has a finite buffer and if the buffer gets full then the background process stops until the foreground process reads something from the pipe.

This is equivalent to writing something like

producer | consumer

in the Unix shell. There is a finite buffer between the two processes, and when it gets full, producer stops producing until consumer consumes something. This is needed for "back-pressure", to ensure that you can connect a slow consumer to a fast producer.

It is the same with callr/processx background processes. If the background process is producing output faster than the main process can consume it, then the background process stops, until you call proc$read_output() etc.

So if you don't want to actively $poll() and $read_* the background process, then redirect its standard output and error to files:

... stdout = tempfile("out-"), stderr = ">&1", ...

The >&1 notation will put stderr in the same file as stdout.