Managing available memory in R when operating on a large number of long vectors. Avoiding "Error: vector memory exhausted (limit reached?)"

When doing operations on a large number of vectors (e.g., as part of creating a null distribution for a permutation test), I either get the error "vector memory exhausted (limit reached?)" or my RStudio session crashes. These problems occur with the code below.

I want to carry out operations on vectors, including subtracting one vector from all other vectors and computing the dot product between one vector and all other vectors. On my computer (R 3.6.2, 64-bit, macOS Catalina) I can do this with 1,000,000 vectors; to handle more, I tried to split the process up with an outer for loop. So to get 2 million, I basically run the process twice and then merge the final results into one column. However, this strategy didn't work.

How can one manage the memory resources better in this example?

Any guidance is much appreciated. (I have tried removing objects with rm() when they are no longer needed.)

library(tibble)
library(purrr) # for map(), map2_df(), and the %>% pipe
set.seed(1)
# Example data
subtract <- runif(1000)
multiply <- runif(1000)
df_row <- runif(1000)
# 1,000,000 x 1,000 tibble of doubles (about 8 GB); df_row is recycled to fill it
df <- as_tibble(matrix(sample(df_row), nrow = 1000000, ncol = 1000))

# Time keeping
t1 <- Sys.time()
# list to store final results from for loop
outer_list <- list()
# for loop (here only looping twice, but this could be increased to build a larger distribution)
for(i_outer in 1:2){
  # List to store results from the inner for loop, where the data is split into smaller, more manageable chunks.
  random_split <- list(df, df)

  # Various operations on the lists
  inner_list <- list()
  for(i in seq_along(random_split)) {

    dot_products_null <- random_split[i] %>%
      # Subtracting vector on all rows
      map(~ map2_df(.x, subtract, `-`)) %>%
      # Dot product for all rows
      map(~as.matrix(.x) %*% multiply)  %>%
      unlist() %>%
      as_tibble()

    inner_list[[i]] <- dot_products_null
    rm(dot_products_null)
    inner_list
  }

  inner_list <- as_tibble(unlist(inner_list))
  outer_list[[i_outer]] <- inner_list

  outer_list
}
outer_list

t2 <- Sys.time()
t2-t1

Hello,

I was able to run your code:

> t2 <- Sys.time()
> t2-t1
Time difference of 40.22113 secs

The only thing I changed was removing the rm(dot_products_null) call, since that object is overwritten on each iteration anyway. Similarly, I removed the bare inner_list and outer_list lines at the end of each loop body; they only echo intermediate state.
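
Roughly, the inner loop then reduces to this (a sketch of those two edits, not the exact code I timed):

for (i in seq_along(random_split)) {
  # no rm() needed: the pipeline result is stored straight away,
  # and the next iteration replaces any intermediate objects
  inner_list[[i]] <- random_split[i] %>%
    map(~ map2_df(.x, subtract, `-`)) %>%
    map(~ as.matrix(.x) %*% multiply) %>%
    unlist() %>%
    as_tibble()
}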

I am pretty sure your strategy is not working on your system because of random_split: it is a massive 14.9 GB object. You are almost certainly exhausting your RAM, or coming very close to it, while this runs, which is likely what crashes your session. This is the clear bottleneck in your code.
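
As a side note, there is an alternative that avoids materializing the subtracted copies of the data entirely: for every row x, sum((x - subtract) * multiply) equals sum(x * multiply) - sum(subtract * multiply), so the per-row subtraction collapses into a single scalar offset. A minimal sketch of that idea, assuming one chunk of your data sits in a plain numeric matrix (the names X, offset, and dot_products are mine, not from your code):

set.seed(1)
subtract <- runif(1000)
multiply <- runif(1000)
# smaller stand-in for one chunk of the real data;
# 1000 values are recycled, mirroring your sample(df_row) setup
X <- matrix(runif(1000), nrow = 100000, ncol = 1000)

offset <- sum(subtract * multiply)                  # subtract . multiply, computed once
dot_products <- as.vector(X %*% multiply) - offset  # one dot product per row

One matrix-vector product per chunk keeps peak memory near a single copy of that chunk, instead of the several intermediate copies the map2_df pipeline creates.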
