This is a bit of a puzzle. The only reasons gc would fail to free the memory are that it isn't running or that there is still a lingering reference to the object.
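To make the "lingering reference" case concrete, here is a small sketch. The function names are my own (hypothetical), and it assumes row 2 / column 1 of base R's `gc()` matrix is the Vcells `used` count, which is how current R versions lay it out. It shows that a big vector is only reclaimed once the last binding to it is gone.

```r
# Vcells currently in use (row 2 of gc()'s matrix is Vcells, column 1 is "used").
# Calling gc() also forces a full collection before reporting.
vcells_used <- function() gc()[2, 1]

lingering_ref_demo <- function() {
  base <- vcells_used()
  x <- vector(length = 25000000)   # ~100 MB logical vector (~12.5 million 8-byte Vcells)
  alias <- x                       # a second binding to the same object
  rm(x)
  held <- vcells_used() - base     # still counted: 'alias' keeps the vector reachable
  rm(alias)
  after_rm <- vcells_used() - base # last reference gone, so gc() can reclaim it
  c(held = held, after_rm = after_rm)
}

print(lingering_ref_demo())
```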
Allocate a static amount of memory for testing
Using vector() to allocate chunks of memory might be a more convenient approach for testing. At the very least, it makes it easier for me to work out how much memory to expect.
library(pryr)
# Logical vectors use 4 bytes per element, so 250,000 elements is ~1 MB
foo <- vector(length=250000)
object_size(foo) # 1 MB
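If pryr isn't available, the same sizing check can be done with base R's `utils::object.size()`. This is a sketch: `expected_bytes()` is a hypothetical helper that just encodes the 4-bytes-per-element rule for logical vectors, and the small remainder on top of it is R's fixed vector header.

```r
# Hypothetical helper: payload size of a logical vector (each element is stored as a 4-byte int)
expected_bytes <- function(n) n * 4

small <- as.numeric(utils::object.size(vector(length = 250000)))    # the ~1 MB test object
big   <- as.numeric(utils::object.size(vector(length = 25000000)))  # one ~100 MB iteration

cat(sprintf("250,000 elements: %.0f bytes (payload %.0f)\n", small, expected_bytes(250000)))
cat(sprintf("25,000,000 elements: %.0f bytes (payload %.0f)\n", big, expected_bytes(25000000)))
```

Ten of the larger allocations per worker is where the ~1000 MB per fork expectation comes from.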
Report pre- and post-run memory use
I modified the script to:
- allocate a static ~100 MB per iteration, for 10 iterations
- run gc() once after the loop
- emit memory use before the loop and after gc()
- write the output to a text file in /tmp
Modified testing script
library(parallel)
noOfThreads <- detectCores() - 1
cluster <- makeCluster(noOfThreads, type = "FORK",
outfile="/tmp/forkedmem.txt")
# Worker function to be run by each forked process
workfun <- function(node_num){
bigList <- list()
index <- 1
# Pre allocation memory reporting
memFree <- as.numeric(system("awk '/MemFree/ {print $2}' /proc/meminfo", intern=TRUE))
cat(sprintf("%i: pre memFree: %i\n", node_num, memFree))
while(index < 11) {
# Allocate ~100 MB of logical vector
bigList[[index]] <- vector(length=25000000)
# Gather & emit iteration's memory use
memFree <- as.numeric(system("awk '/MemFree/ {print $2}' /proc/meminfo", intern=TRUE))
cat(sprintf("%i: %i memFree: %i\n", node_num, index, memFree))
index <- index + 1
}
# Remove objects in the function's environment and garbage collect,
# but keep the node number!
rm(list = setdiff(ls(), "node_num"))
gc(verbose=F)
# Post GC report
memFree <- as.numeric(system("awk '/MemFree/ {print $2}' /proc/meminfo", intern=TRUE))
cat(sprintf("%i: end memFree: %i\n", node_num, memFree))
}
# Run the function on the forks, then shut the cluster down
clusterApplyLB(cluster, 1:noOfThreads, workfun)
stopCluster(cluster)
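As a possible refinement (not something I ran above), the three awk calls could be replaced by reading the file directly. This is a minimal sketch with a hypothetical helper, `proc_field_kb()`, assuming Linux's `/proc` layout of `Field:   value kB`. Pointing it at `/proc/self/status` and `VmRSS` would give a per-process figure, whereas `MemFree` in `/proc/meminfo` is system-wide.

```r
# Hypothetical helper: pull a "Field:   value kB" entry out of a /proc file.
# 'lines' defaults to /proc/meminfo (Linux only) and is only read if not supplied.
proc_field_kb <- function(field, lines = readLines("/proc/meminfo")) {
  hit <- grep(paste0("^", field, ":"), lines, value = TRUE)
  if (length(hit) != 1) stop("field not found: ", field)
  # Drop the "Field:" prefix, then keep the first whitespace-separated token
  value <- strsplit(trimws(sub("^[^:]+:", "", hit)), "[[:space:]]+")[[1]][1]
  as.numeric(value)
}

# Usage inside workfun (Linux):
#   memFree <- proc_field_kb("MemFree")                                # system-wide
#   rss     <- proc_field_kb("VmRSS", readLines("/proc/self/status"))  # this fork only
```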
Results (node_num: iteration memFree, with MemFree in kB as reported by /proc/meminfo):
2: pre memFree: 6931168
3: pre memFree: 6930664
1: pre memFree: 6930664
3: 1 memFree: 6629988
2: 1 memFree: 6534276
1: 1 memFree: 6532512
...
...
3: 10 memFree: 3911460
2: 10 memFree: 3911836
1: 10 memFree: 3913236
3: end memFree: 5704496
2: end memFree: 6849472
1: end memFree: 6849532
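Working through the arithmetic on the numbers above helps. This sketch just copies the reported MemFree values (kB, i.e. KiB) and converts the differences to MB; `kib_to_mb()` is a hypothetical helper, and splitting the iteration-10 drop evenly across three forks is an assumption.

```r
# MemFree readings copied from the results above (KiB, as reported by /proc/meminfo)
pre    <- c("1" = 6930664, "2" = 6931168, "3" = 6930664)
end    <- c("1" = 6849532, "2" = 6849472, "3" = 5704496)
iter10 <- c("1" = 3913236, "2" = 3911836, "3" = 3911460)

kib_to_mb <- function(kib) kib * 1024 / 1e6

# Per-node gap between the pre-loop and post-gc() readings
gap_mb <- kib_to_mb(pre - end)
# System-wide drop by iteration 10, assumed to be shared evenly across the three forks
per_fork_mb <- kib_to_mb(mean(pre) - mean(iter10)) / 3

print(round(gap_mb, 1))
print(round(per_fork_mb))
```

For nodes 1 and 2 the gap comes out at about 83 MB, while the iteration-10 drop works out to roughly 1030 MB per fork, in line with the expected ~1000 MB.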
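So it looks like gc does run.
After the call to gc(), MemFree for nodes 1 and 2 is only just over 80 MB below its pre-loop value, even though each fork should have allocated ~1000 MB (10 × 100 MB) during its run. Node 3's end value is lower, but MemFree is a system-wide figure, so it probably also reflects memory still held by the other forks at the moment node 3 reported.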