How many cores are used when running 2 R sessions?

Hi, sometimes I run two R projects ("sessions") simultaneously on my Mac. The Mac has 4 cores. My question is: in this case, do the two projects use the same core, or do they use different cores?

Each session will use one core unless you are using a parallel computing package. If your computer has two threads per core, then with 4 physical cores you will have 8 logical cores as far as R is concerned, and each session will use one of them. You can check the number of effective cores your computer has by loading the "parallel" package and running detectCores(). You can also check the proportion of CPU usage devoted to each R session in the macOS Activity Monitor.
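For example, a quick check (the logical = FALSE form should report physical cores on macOS, though on some platforms it returns NA):

```r
library(parallel)

detectCores()                 # logical cores (threads), e.g. 8 on a 4-core, 2-threads-per-core Mac
detectCores(logical = FALSE)  # physical cores, e.g. 4; may be NA on some platforms
```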


Thank you for your reply. My computer has 8 cores. However, when I have 2 RStudio sessions open, with one of them running a time-consuming calculation, I tried parallel::detectCores() in the other session and it still says 8 cores.

Functions like detectCores() tell you the number of "available" cores in the sense that it's the number of cores you could send instructions to; they don't know whether those cores are busy.

To really understand this, you need to know a bit about operating systems (OS). There are three layers: first, the actual hardware (e.g. you have 4 cores with 2 threads per core, that's 8 virtual cores). Second, the operating system, e.g. macOS, which can directly interact with the hardware and send it instructions. Third, the userspace programs, including R, Safari, Finder, etc.

Userspace programs can't directly talk to hardware (that would be a security risk and cause many other problems). Instead, they tell the OS what they need, and the OS will pass along their requests to the hardware.

So, say you open Safari. Safari needs to calculate things to decide what to display, so it will write up computations and send them to the OS to be executed. The OS will pick a core (more or less arbitrarily) and have it perform the calculations. Now, imagine this is the year 2000 and you have a single CPU: while Safari is displaying your webpage, you open Finder. Finder also needs computations done, but the core is busy working for Safari. One solution is to wait until Safari is done, then give core time to Finder. That's painful for you, the user, since you can't do two things at once. So instead, the OS will give 100 ms of core time to Safari, then quickly switch to Finder for 100 ms, then back to Safari, etc. From your point of view as a user, it looks like Safari and Finder run in parallel, even though you have a single core.

Now, let's come back to R: when you start an R session, it needs a single core. Your OS will thus choose one core arbitrarily and have it do all the computations R asks for. But that core is not fully blocked: from time to time the OS may have the same core execute instructions for another program, and from time to time it may switch which core is actually doing R's computations.

When you open a second R session, it will also start sending instructions to the CPU. It could technically use the same core as the first R session (switching rapidly between the two), but if you have other cores sitting idle, the OS will make sure to distribute the sessions across the available cores.

One consequence: on a computer with a single core, if one R session is running a time-consuming computation, you can open a second session and do things, but the two sessions are actually competing for the same core, which makes everything slow. On a computer with several cores, the time-consuming session will pretty much keep one core busy, and the other sessions you open, or the other programs you run (e.g. Safari), will make use of the other cores, so the time-consuming computation doesn't need to pause and nothing feels slow.

In any case, in typical use of macOS, you constantly have a bunch of programs running, and the OS dispatches them across all cores. So when you call detectCores(), it can only tell you how many cores are there; how busy they are is something only the OS knows.

If you start a parallel computation, you specify the number of processes that will send instructions to the OS. There is nothing preventing you from specifying more processes than the number of cores available on your computer, but that's a bad idea, since they will end up competing and not being any faster. So you typically take detectCores(), 8 in your case, and create that many processes, so each of them can run on a different core without competing. In practice, you would often choose 7 or 6, so you keep some cores free to run Safari or other R sessions and the whole computer doesn't slow down.
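As a minimal sketch of that sizing rule (the worker count and the toy function are just for illustration):

```r
library(parallel)

n_workers <- max(1, detectCores() - 1)        # leave a core free for the OS and other programs

cl <- makeCluster(n_workers)                  # start that many background R processes
res <- parLapply(cl, 1:100, function(x) x^2)  # each worker handles a chunk of the inputs
stopCluster(cl)                               # always release the workers when done
```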

Note that using {parallel} or similar packages only makes sense if you can formulate your problem as a parallel one; if you just load the package and carry on as usual, there is no effect.
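For instance, a loop whose iterations are independent can be split across cores, while an inherently sequential computation cannot (mclapply() is the fork-based variant available on macOS and Linux; the toy functions below are just illustrations):

```r
library(parallel)

# Independent iterations: each element is processed on its own,
# so the work can be spread across cores.
slow_task <- function(x) { Sys.sleep(1); x^2 }
res <- mclapply(1:8, slow_task, mc.cores = 4)

# A running total is inherently sequential: step i needs the result
# of step i - 1, so extra cores don't help here.
totals <- Reduce(`+`, 1:8, accumulate = TRUE)
```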

For your original question: if you don't use {parallel}, you don't fully know, since it's the OS that decides, but if the OS is not completely stupid it will typically run each session on its own core (and since an R session might not be computing all the time, the OS might also run other computations on that core from time to time).

As rye pointed out, the experimental approach of watching the Activity Monitor (and seeing what happens when you start big computations) is probably the most enlightening way.
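One small trick that may help with that: Sys.getpid() returns the process id of the current R session, so you can match each session to a specific process in Activity Monitor (the process name you see there, e.g. rsession for RStudio, depends on your setup):

```r
# Run this in each R session, then look the number up in Activity Monitor's PID column
Sys.getpid()
```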

