Background Job Failing Despite Script Working in Console

I am trying to run a background job, and it keeps failing, but the same script works when I run it in the console. Does anyone have tips for troubleshooting this?

My script is roughly set up this way:

# Load libraries
library(glue)
library(tidyverse)
library(ncdf4)
library(multidplyr)

# Load functions written by our team
source("~/repos/inequality/R_scripts/load_utils.R")

# set universal inputs
DB = "/some/file/path/to/input"
BASE = "folder_with_data"
OUT = "/some/file/path/to/output"

# generate all scenarios I want to run
scenario_to_run = expand_grid(var_1 = c(1,2,3),
                              var_2 = c('a','b','c'),
                              var_3 = c(TRUE, FALSE))

# run function across all scenarios and produce log
run_log = pmap(scenario_to_run, safely(deciles_plot)) %>%
  transpose()

The error I see in my run_log from the background job is object 'BASE' not found, but I can also see that the objects DB, BASE, and OUT are all saved as values in the background job's environment. Within the deciles_plot function, DB, BASE, and OUT are the default values for some of the arguments. DB and BASE specifically are used to build another directory path that is fed to a separate function. If the issue were that background jobs get confused by an object used as a default argument value, then why doesn't it fail on DB, which is used before BASE?

deciles_plot = function(
    var_1,
    var_2,
    var_3,
    input = DB,
    sector_basename = BASE,
    output = OUT){
        dir <- glue('{input}/{sector_basename}')
        data = pull_raw_data_function(dir = dir, ...)
        ...}

As I mentioned before, if I clear my environment and then run this script in the console, it works just fine, so it's clearly something about background jobs that is causing the issue. The problem clearly isn't with creating and saving the BASE object, since I can see it in the background job's environment. And the use of these objects as defaults seems inconsistent, because it doesn't fail when attempting to use DB but does fail when attempting to use BASE.

Also, I know I could potentially solve this by not having BASE be the default for sector_basename and instead passing the value assigned to BASE directly into my function. But this is just the way my team writes and uses functions, and I want to stay consistent, especially since the inconsistent behavior of background jobs implies there is a bug somewhere.

If memory serves, background jobs create a private, empty environment and execute in that environment. This allows RStudio to know exactly what data was created by the job, since any new variables in the environment must have been written by the job.

I'm not sure what's causing the behavior you're seeing, but my hunch is that somewhere the function is getting hooked up to the global environment instead, and thus it cannot see the BASE variable, which only exists in the private environment.
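To make that hunch concrete, here is a minimal base-R sketch. It does not reproduce what RStudio actually does internally; global_like and job_env are hypothetical stand-ins for the global environment and the job's private environment, and the function is a cut-down imitation of deciles_plot. It only shows that a default argument is looked up through the function's enclosing environment, so the same default can succeed or fail depending on which environment the function is attached to.

```r
# Hypothetical stand-ins: one environment playing the role of the global
# environment, and a child of it playing the role of the job's private one.
global_like <- new.env(parent = baseenv())
job_env <- new.env(parent = global_like)

# BASE exists only in the private environment, as in the failing job.
assign("BASE", "folder_with_data", envir = job_env)

# Cut-down imitation of a function that uses an outside object as a default.
f <- function(sector_basename = BASE) sector_basename

# Same function, attached to the outer environment: BASE is out of reach.
f_outer <- f
environment(f_outer) <- global_like

# Same function, attached to the private environment: BASE is found.
f_private <- f
environment(f_private) <- job_env

# Capture the error message instead of stopping the script.
res_outer <- tryCatch(f_outer(), error = function(e) conditionMessage(e))
res_private <- f_private()

print(res_outer)
print(res_private)
```

If something like this is going on, the variable would show up in the job's environment pane while the function still fails to find it, which matches what you describe. It would not by itself explain why DB is found and BASE is not.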

Next steps you might try for debugging:

  • call deciles_plot directly instead of via safely (safely could be the instrument that is changing the parent environment to the global environment)
  • add logging statements to deciles_plot to emit the current environment context and its contents
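For the logging suggestion, something along these lines might be a starting point. It is only a sketch in base R; env_report is a hypothetical helper name, not part of RStudio or any package. It reports which environment a function is attached to and whether a given name can be reached from there.

```r
# Hypothetical helper: report a function's enclosing environment and whether
# `name` is reachable from it (searching parent environments as well).
env_report <- function(fn, name) {
  enc <- environment(fn)
  report <- list(
    # environmentName() returns "" for anonymous environments.
    enclosure = environmentName(enc),
    visible = exists(name, envir = enc, inherits = TRUE)
  )
  message("enclosure: '", report$enclosure, "'; ",
          name, " visible: ", report$visible)
  invisible(report)
}

# Small demonstration with a private environment that defines BASE.
demo_env <- new.env(parent = baseenv())
assign("BASE", "folder_with_data", envir = demo_env)

demo_fn <- function(sector_basename = BASE) sector_basename
environment(demo_fn) <- demo_env

demo_report <- env_report(demo_fn, "BASE")
```

In the real script you could call env_report(deciles_plot, "BASE") and env_report(deciles_plot, "DB") just before the pmap call, once in the console and once in the background job, and compare the messages.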

Could it have something to do with this issue on their GitHub? It looks like they just fixed it, and the fix should be in an upcoming release of RStudio, if it's not already in the latest release.

I've been looking everywhere trying to find a solution for this issue as well, glad to see I'm not the only one!

Yes, very likely what you're seeing is the same issue! You can try the fix in the 2023.05 version if you like:

https://dailies.rstudio.com/
