Bug: ggsave() does not work when called in mclapply() in RStudio IDE (same code works perfectly at CLI)

ggplot2
rstudio
parallel

#1

I need to plot graphs with ggplot() and save them to files from inside mclapply(), the parallel version of lapply(). The following code snippet reproduces the problem. When run inside RStudio, no files are generated and no errors or warnings are reported; about half the time mclapply() never returns, and the rest of the time it returns 5 NULL values. When run from the command line (Linux or macOS), it works every time: it generates all the PNG files and returns the values 0 through 4:

library(ggplot2)
library(parallel)

mclapply(
    0:4,
    function(n) {
        df <- data.frame(x = runif(10), y = runif(10))
        p  <- ggplot(df, aes(x, y)) + geom_point()

        ggsave(
            paste0('mclapply-', n, '.png'),
            plot   = p,
            device = 'png',
            width  = 4,
            height = 4
        )

        return(n)
    }
)

RStudio version is 1.1.419.

> R.version
               _
platform       x86_64-apple-darwin15.6.0
arch           x86_64
os             darwin15.6.0
system         x86_64, darwin15.6.0
status
major          3
minor          4.4
year           2018
month          03
day            15
svn rev        74408
language       R
version.string R version 3.4.4 (2018-03-15)
nickname       Someone to Lean On
> Sys.info()
                                                                                          sysname
                                                                                         "Darwin"
                                                                                          release
                                                                                         "17.5.0"
                                                                                          version
"Darwin Kernel Version 17.5.0: Mon Mar  5 22:24:32 PST 2018; root:xnu-4570.51.1~1/RELEASE_X86_64"
                                                                                         nodename
                                                                                   "farmac.local"
                                                                                          machine
                                                                                         "x86_64"
                                                                                            login

NOTE: the code works fine in RStudio if you change parallel::mclapply() to base::lapply() (it's just not parallelized). I've also tried png()/dev.off() in place of ggsave() and see the same behavior.


#2

I can reproduce the "stall" in RStudio 1.1.453 on Linux with R 3.5.1.

A few comments:

  1. When the forked child processes that parallel::mclapply() spawns die or crash, the corresponding values will be NULL. This is known behavior.

  2. The fact that you're observing NULL at random suggests that the forked processes have crashed.

  3. Why did they crash? In ?parallel::mclapply, there is a 'Warning' section that says "It is strongly discouraged to use these functions in GUI or embedded environments, because it leads to several processes sharing the same GUI which will likely cause chaos (and possibly crashes)." This could very well be the reason. I haven't investigated this, but I'd assume the RStudio folks can comment on whether the RStudio Console supports forked processing.

  4. That same 'Warning' section also says "Child processes should never use on-screen graphics devices." Note that with ggplot2 3.0.0, ggsave() actually opens an on-screen graphics device, e.g. https://twitter.com/ilarischeinin/status/1024643217315311616 and https://github.com/tidyverse/ggplot2/issues/2794. When I run your code in "terminal" R 3.5.1 on Linux with ggplot2 3.0.0, I get:

Error in (function (display = "", width, height, pointsize, gamma, bg,  : 
  a forked child should not open a graphics device

This can be avoided by wrapping ggsave(...) in R.devices::suppressGraphics(), as in the future_lapply() example further down.
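Going back to points 1 and 2: since a crashed child shows up only as a NULL in the result, it can help to check the returned list explicitly. Here is a minimal sketch of such a check. The helper name failed_jobs() is made up for illustration and is not part of the parallel package; it also flags "try-error" elements, which is what mclapply() returns for a job that raised an ordinary R error. The list below is a hand-built stand-in for a real mclapply() result.

```r
# Hypothetical helper (not part of 'parallel'): return the indices of jobs
# whose child process was lost (NULL) or raised an R error ("try-error").
failed_jobs <- function(res) {
    bad <- vapply(
        res,
        function(x) is.null(x) || inherits(x, "try-error"),
        logical(1)
    )
    which(bad)
}

# Stand-in for an mclapply() result in which the 2nd and 4th jobs were lost
res <- list(0L, NULL, 2L, NULL, 4L)
failed_jobs(res)
```

Note that this only works if your function never returns NULL on purpose; otherwise a legitimate NULL is indistinguishable from a crashed child.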

Now, instead of forked parallel processing, you can parallelize using background R sessions, via so-called PSOCK clusters. Here's a simple way to do it:

library(ggplot2)

library(future.apply)
plan(multisession)  ## parallelize using background R sessions

y <- future_lapply(0:4, function(n) {
    df <- data.frame(x = runif(10), y = runif(10))
    p  <- ggplot(df, aes(x, y)) + geom_point()

    R.devices::suppressGraphics({
        ggsave(
            paste0('mclapply-', n, '.png'),
            plot   = p,
            device = 'png',
            width  = 4,
            height = 4
        )
    })

    n
})

This works on any operating system.
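If you'd rather not add the future.apply dependency, the same idea can be sketched with a PSOCK cluster from the parallel package directly. This is only a rough equivalent, not a drop-in replacement: the worker count of 2, the 'psock-' file prefix, and writing to tempdir() are arbitrary choices for illustration. Each worker is a fresh R session, so ggplot2 has to be loaded inside the function.

```r
library(parallel)

out_dir <- tempdir()   # assumption: any writable directory will do
cl <- makeCluster(2)   # type = "PSOCK" is the default; 2 workers is arbitrary

res <- tryCatch(
    parLapply(cl, 0:4, function(n, out_dir) {
        # Workers are separate R sessions, so load ggplot2 in each of them
        library(ggplot2)

        df <- data.frame(x = runif(10), y = runif(10))
        p  <- ggplot(df, aes(x, y)) + geom_point()

        ggsave(
            file.path(out_dir, paste0('psock-', n, '.png')),
            plot   = p,
            device = 'png',
            width  = 4,
            height = 4
        )

        n
    }, out_dir = out_dir),
    finally = stopCluster(cl)   # always shut the workers down
)
```

Because nothing is forked from the RStudio session, the workers don't share its GUI or graphics devices.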