Parallel R - Is there any way to take out an exclusive lock on a resource for use in non-embarrassingly-parallel problems?

Is there any way to synchronise different threads when using R in parallel (such as through doParallel, future etc.) by creating exclusive locks, similar to how Mutex.WaitOne works in C#? For embarrassingly parallel problems this is not necessary, but there are many scenarios where code runs much faster in parallel yet occasionally requires exclusive access to a resource (database, file, output etc.).

I've given a toy example below. If run in a single thread, it fills a file with only unique numbers between 1 and 100. But if I try to speed it up by running it on multiple threads without some type of locking, several problems occur. Two threads can read the list of existing numbers at the same time and, as a result, write the same number more than once. Multiple threads can also write to the file simultaneously, corrupting it.

library(doParallel)

noOfThreads <- 4 # Set this to 1 for single threading to see correct results
cluster <- makeCluster(noOfThreads)
registerDoParallel(cluster)

xs = integer()
writeLines(as.character(xs), "xs.txt")

foreach (Thread = 1:100) %dopar% {

    # Do some work

    # Need to create exclusive lock

    xs <- as.integer(readLines("xs.txt"))
    Sys.sleep(0.01)
    x <- sample(1:100, 1)
    if (!x %in% xs) {
      write(as.character(x), "xs.txt", append = TRUE)
    }

    # Unlock resource
}

stopCluster(cluster) # Shut down the worker processes

xs <- as.integer(readLines("xs.txt"))
print(sort(xs))
print(length(xs) == length(unique(xs)))

I don't know whether it can be used efficiently with parallel computing, but the filelock package may help you with this.
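As a rough sketch of how that could look for the toy example, assuming the filelock package is installed and all workers run on the same machine: each iteration takes an exclusive lock on a separate lock file before it reads and appends, and releases it afterwards. The `data_file` and `lock_file` names are just for illustration, and I haven't measured how this scales when the lock is heavily contended.

```r
library(doParallel)
library(filelock)

cluster <- makeCluster(4)
registerDoParallel(cluster)

# Illustrative paths: the data file plus a separate file used only as the lock
data_file <- file.path(tempdir(), "xs.txt")
lock_file <- paste0(data_file, ".lock")
writeLines(character(), data_file)  # start with an empty file

invisible(foreach(i = 1:100, .packages = "filelock") %dopar% {

  # Do some work that needs no lock

  # Block until this worker holds the exclusive lock
  lck <- lock(lock_file, exclusive = TRUE, timeout = Inf)
  tryCatch({
    xs <- as.integer(readLines(data_file))
    x <- sample(1:100, 1)
    if (!x %in% xs) {
      write(as.character(x), data_file, append = TRUE)
    }
  }, finally = unlock(lck))  # release the lock even if an error occurs
  NULL
})

stopCluster(cluster)

xs <- as.integer(readLines(data_file))
print(sort(xs))
print(length(xs) == length(unique(xs)))
```

The lock is advisory, so it only protects the file if every reader and writer goes through the same lock file. Keeping the locked section as short as possible matters, because the workers queue up there.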

I found a package that contains exactly what I was looking for, "Rdsm". It only works on Linux, and not on Windows, but allows for sharing locks and variables between multiple threads (nodes on a cluster). I'd like to leave this question open, since I can't find an equivalent that works under Windows. But for Linux users Rdsm does the job.

I did not know that one. I just put the link here for reference:
https://cran.r-project.org/web/packages/Rdsm/index.html