Using `furrr::future_pmap` on Ubuntu: code gets slower and slower

I am trying to use `furrr::future_pmap` to run a set of 216 models. The code runs fine on my 2015 MacBook Pro laptop (macOS 10.15.7, 8 GB of RAM, 4 cores), and it works on my colleague's Windows 10 PC. However, when I try to run it on my Ubuntu machine (v20.04, 64 GB RAM, 32 cores) it gets progressively slower at running the models. Each model gets saved to disk, and I can see dramatic increases in the amount of time between file writes. See the attached screenshot: I killed the session around 9:30 pm, when the last model output had been saved at 7:30.

plan(multicore) vs plan(multisession) shows the same behavior

I've observed the same behavior using both `plan(multisession, workers = availableCores())` called from RStudio and `plan(multicore)` called from an R session running in the terminal. I've also tried reducing the number of workers from 32 to 8, with no luck.
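For reference, the plan setups I tried looked roughly like this (a sketch; assumes the future and furrr packages are attached):

library(future)
library(furrr)

# From RStudio: one background R process per worker
plan(multisession, workers = availableCores())

# From an R session in the terminal (multicore is not supported in RStudio)
plan(multicore)

# Also tried a reduced worker count
plan(multisession, workers = 8)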

htop processor analysis

Examining htop, it looks like something isn't working correctly with the cores. This screenshot was taken when I set `plan(multisession, workers = 8)`, but you can see activity on all 32 cores. You can also see that, in addition to the 8 main R sessions, there are many other R sessions running. I'm wondering if something is causing extra sessions to spin up and not close down, bogging down the whole system over time.

Memory

The data set being analyzed in the model is pretty small, at just 2.9 MB. I don't see any indication that I'm running out of memory.

I'm a bit of a novice with Linux computing and parallelization and would appreciate any suggestions of things to investigate.

Code

This is my invocation of `future_pmap`, where `multiverse_run` is a custom function that runs the statistical model, saves the full output to disk, and returns some summary statistics. `multiverse_spec` is a tibble that lists model parameters, and `analysis_dat` is the above-mentioned 2.9 MB dataset. Each model takes ~1 minute to run.

library(dplyr)   # for %>%
library(purrr)   # for possibly()
library(furrr)   # for future_pmap() and furrr_options()

multiverse_output <- multiverse_spec %>%
  future_pmap(possibly(multiverse_run, otherwise = "error"),
              data = analysis_dat,
              .options = furrr_options(seed = TRUE),
              .progress = TRUE)

The full code is available here; I haven't had luck creating a minimal reproducible example. But given that it works fine on Mac and Windows, I don't think the modelling code is the problem.

I think the problem is with furrr creating additional R sessions that aren't closing down. I just looked at htop again, several minutes after closing RStudio, and it shows heavy activity on all 32 cores, with rsessions still running. Any suggestions?
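For what it's worth, my understanding from the future docs is that switching back to a sequential plan should shut down the background workers launched by the previous plan, so I would expect something like this to clean them up (a sketch):

library(future)

# Resetting the plan should terminate any multisession workers
# launched by the previous plan
plan(sequential)

# How many workers would the current plan use? (should report 1 now)
nbrOfWorkers()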

I don't have a suggestion (sorry), but you might consider filing an issue in the furrr GitHub repo for this.


furrr, or more precisely the future framework, has built-in protection against launching nested parallel workers; cf. the "A Future for R: Future Topologies" vignette. It would take an active effort and a hack to override that behavior.
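As a minimal illustration of that protection (a sketch): inside a future, the nested plan defaults to sequential, so code running on a worker sees only one worker of its own:

library(future)
plan(multisession, workers = 4)

# Query the worker count from inside a future: the nested level
# defaults to sequential, so it reports a single worker
f <- future(nbrOfWorkers())
value(f)
#> [1] 1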

However, what it cannot protect against is when you use futures to parallelize some code, and that code itself uses a non-future parallelization method, e.g. a hardcoded mclapply(..., mc.cores = detectCores()) or similar. This can also happen when there is C/C++/Fortran code that runs multi-threaded. So, if there's indeed nested parallelism going on, I suspect one of these causes.
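If that is what's happening, one workaround to try (a sketch; multiverse_run_serial is a hypothetical wrapper, and the right knobs depend on which libraries the models use) is to cap the thread counts inside each worker:

# A sketch: cap BLAS/OpenMP threading inside each worker before the model
# runs. Assumes the RhpcBLASctl package is installed; multiverse_run_serial
# is a hypothetical wrapper around the original multiverse_run.
multiverse_run_serial <- function(...) {
  RhpcBLASctl::blas_set_num_threads(1)  # single-threaded BLAS
  RhpcBLASctl::omp_set_num_threads(1)   # single-threaded OpenMP
  multiverse_run(...)
}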

FWIW, the "red" load in your htop screenshot represents kernel load. In other words, the Linux kernel is really busy trying to catch up with very low-level tasks. This can for instance happen when there is a lot of disk I/O going on.

@HenrikBengtsson Thank you for the information about the future framework and htop. I appreciate you taking the time to look at this.

You might be interested to know that when I dual-booted the same hardware into Windows 10, the code executed quickly and without issue. There seems to be something specifically going on in Ubuntu. In both cases I was running identical code on fresh R (and Ubuntu/Windows) installs.

I put together some benchmarking code, which you can see here if this is of further interest to you: https://github.com/williamlief/benchmarking. The upshot is that all of the synthetic control methods, except the very simplest, ran substantially slower on Ubuntu than on Windows 10.

There are two ways to parallelise in R: fork and socket. Linux supports both; Windows supports only socket. My guess is that your custom function calls something that, when it can fork, does its own parallelisation, and when it can't, runs serially. That would explain the apparent nested parallelism on Linux but normal parallelism on Windows. One way to test that guess is sketched below.
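A sketch of that test (assuming the multiverse_run, multiverse_spec, and analysis_dat objects from the earlier posts): run one model with all future-level parallelism turned off and watch htop. If many cores still light up, the inner code is forking or threading on its own.

library(future)
plan(sequential)  # no future-level parallelism at all

# Fork support check: typically TRUE on Linux, always FALSE on Windows
parallelly::supportsMulticore()

# Run a single model serially; multi-core activity in htop during this
# call would point to the model code parallelising internally
one_spec <- as.list(multiverse_spec[1, ])
system.time(do.call(multiverse_run, c(one_spec, list(data = analysis_dat))))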
