Free up memory before parallel (detach packages)

I have a long script that requires a lot of packages to prepare data before eventually running in parallel (forked, on Linux). I've been running into memory issues across all the parallel workers and have managed to improve this a lot by removing unnecessary objects, calling gc(), etc. However, all of the loaded packages get carried into the memory of each worker, and the vast majority of them are unnecessary. Detaching packages doesn't actually seem to free up memory (it literally increases it in some cases??). I tried using .rs.restartR() in the script, but this stops the rest of the script from executing.

Is there any way to actually free up memory when detaching packages? Would it be best to use socket clusters and then pass objects / load packages that are necessary?

Update: Using a socket cluster instead of a forked cluster does the job. There's obviously a bit more upfront processing time to export objects into the worker environments, but this isn't really an issue given how long the overall job is. The memory saving from loading only the necessary packages is huge, though: 1 GB less memory per worker. This is what I was looking for!
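
For anyone landing here later, a minimal sketch of the socket-cluster setup described in the update, using only the base {parallel} package. The object names (`lookup`, `scale_value`) and the package loaded on the workers are placeholders I made up for illustration, not the original poster's code:

```r
library(parallel)

# Hypothetical stand-ins for objects prepared earlier in the long script
lookup <- c(a = 1, b = 2, c = 3)
scale_value <- function(key, factor) lookup[[key]] * factor

# PSOCK workers are fresh R sessions: they do not inherit the main
# session's attached packages or objects the way forked workers do
cl <- makeCluster(2, type = "PSOCK")

# Load only the packages the parallel step needs on each worker
# ({stats} is just a placeholder for whatever that is in practice)
invisible(clusterEvalQ(cl, library(stats)))

# Copy only the required objects into each worker's global environment
clusterExport(cl, varlist = c("lookup", "scale_value"), envir = environment())

results <- parLapply(cl, names(lookup), scale_value, factor = 10)

stopCluster(cl)
```

The export step is the upfront cost mentioned above; in exchange, each worker only holds what was explicitly sent to it.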

An approach I also commonly use for this type of problem: separate scripts. Have one script that does the preprocessing and, at the end, saves the intermediate results (I use {qs} for that, but RDS and other common formats work too), then a second script that starts afresh and only does the heavy parallel computation. And possibly a third script that does the final analyses.
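
A rough sketch of that handoff, shown in one block so it can be run as-is. The file names in the comments and the toy data are hypothetical, and it uses base `saveRDS()`/`readRDS()` rather than {qs}. `tempdir()` is only there so the sketch runs anywhere; with two real sessions the path would need to be a fixed location, since each session gets its own temporary directory:

```r
library(parallel)

# Shared location for the intermediate file (use a fixed path in practice)
path <- file.path(tempdir(), "prepared.rds")

# --- end of 01_preprocess.R (hypothetical file name) ---
# After all the package-heavy preparation, save only what the next step needs
prepared <- data.frame(id = 1:4, value = c(1, 2, 3, 4))
saveRDS(prepared, path)

# --- 02_parallel.R (hypothetical file name), run in a new R session ---
# Only {parallel} is loaded here, so forked workers start from a small session
prepared <- readRDS(path)

# Forking is only available on Unix-alikes; fall back to one core elsewhere
n_cores <- if (.Platform$OS.type == "unix") 2L else 1L
squares <- unlist(mclapply(prepared$value, function(v) v^2, mc.cores = n_cores))
```

Each script can then be run with `Rscript`, so the second one never sees the packages attached by the first.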

Some advantages: if you do that a lot, the {targets} package can coordinate all these scripts in a natural way. The design is also more modular: since each script is saved on GitHub, when you modify just one of them it's clear which part changed.
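
To give an idea of what that looks like, here is a minimal `_targets.R` sketch, assuming the {targets} package is installed. The three functions and the target names are made up for illustration and simply mirror the preprocess / heavy computation / final analysis split:

```r
# _targets.R (pipeline definition; all function and target names are illustrative)
library(targets)

# Hypothetical stand-ins for the three steps
prepare_data <- function() data.frame(id = 1:4, value = c(2.5, 3.1, 4.8, 1.9))
heavy_step <- function(d) d$value^2
summarise_results <- function(x) mean(x)

# Each target is stored on disk and rebuilt only when its inputs change
list(
  tar_target(prepared, prepare_data()),
  tar_target(heavy, heavy_step(prepared)),
  tar_target(final, summarise_results(heavy))
)
```

Running `targets::tar_make()` from the project root builds the targets in dependency order, and `targets::tar_read(final)` retrieves a stored result afterwards.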

Also, I just dislike changing the session state in the middle; with a separate script it's clear you're starting a new session, and it's explicit which objects are being imported from the previous step.
