I am trying to parallelize some CPU-intensive code (it doesn't need much memory) using the future and furrr packages, without much luck. The machine is an AWS EC2 m4.16xlarge, which reports 64 cores, but whether I set it up to use 1, 8, 16, 24, or 32 workers, Linux reports a total use of 2 cores (200% CPU), divided equally among the workers.
The behavior is consistent across the different pieces of code I have tried that use the future_map_* functions, regardless of which function is being called.
I am aware that future::plan(multicore(workers=...)) can cause issues within RStudio, but it makes no real difference whether I use multisession instead, or create a cluster with parallel first and pass that to plan(). Running the code outside RStudio with Rscript makes no difference either.
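For reference, a minimal sketch of the three setups I mean. `slow_fn` is a hypothetical CPU-bound stand-in for my real code, and `n_workers` is a placeholder for the worker counts listed above; neither is taken from the actual project.

```r
library(future)
library(furrr)

# Hypothetical CPU-bound stand-in for the real workload (not my actual code).
# Increase the loop length to keep the workers busy long enough to watch in top.
slow_fn <- function(i) {
  x <- 0
  for (k in seq_len(1e6)) x <- x + sqrt(k)
  x
}

n_workers <- 2  # placeholder; the runs described above used 1, 8, 16, 24 and 32

# Variant 1: forked workers (the one known to be problematic inside RStudio)
plan(multicore, workers = n_workers)
res_multicore <- future_map_dbl(seq_len(n_workers), slow_fn)

# Variant 2: separate background R sessions
plan(multisession, workers = n_workers)
res_multisession <- future_map_dbl(seq_len(n_workers), slow_fn)

# Variant 3: a cluster created with parallel and handed to plan()
cl <- parallel::makeCluster(n_workers)
plan(cluster, workers = cl)
res_cluster <- future_map_dbl(seq_len(n_workers), slow_fn)
parallel::stopCluster(cl)

# Return to sequential evaluation
plan(sequential)
```

All three variants return the same results; the only thing I am comparing between them is the CPU use reported by the operating system while they run.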
The machine is running R 3.5.2 with future 1.17.0. I noticed that the OpenMP implementation on the server is old (3.1), but I don't know whether anything depends on that. I also noticed that the Rtsne package, which is in principle OpenMP-accelerated, doesn't seem to make effective use of the CPUs either.
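In case it helps with diagnosis, this is a sketch of what I can run to report what R itself sees on the machine. It assumes a Linux host where the `nproc` utility is available; the variable name `diag` is just for illustration, and I have not included any output here.

```r
library(future)

# Collect what R and the OS report about usable cores (assumes Linux with nproc).
diag <- list(
  r_version      = R.version.string,
  future_version = as.character(packageVersion("future")),
  # Cores detected on the machine, ignoring any restrictions
  detected_cores = parallel::detectCores(),
  # Cores future considers available, broken down by the source of each limit
  future_cores   = availableCores(which = "all"),
  # Cores the current process may run on (nproc respects CPU affinity)
  nproc          = system("nproc", intern = TRUE),
  # Environment variables that can cap thread or worker counts
  env            = Sys.getenv(c("OMP_NUM_THREADS", "MC_CORES"))
)

print(diag)
```

If `nproc` or `availableCores()` reports fewer cores than `detectCores()`, that would suggest the R process is being restricted from outside rather than by future itself, but that is only my guess.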
Any ideas or suggestions as to what could be wrong, and how to troubleshoot this?