Help optimize nested for loops used for subsetting

katgr · May 27, 2020, 6:33pm

Hi,

As part of a larger study I'm simulating different decisions about how to subset data. Currently I'm using nested for loops, as shown in the example below.

However, my full code has over 1 million iterations and I am therefore trying to optimize it as much as possible to reduce execution time.

I have optimized the code to the best of my knowledge; switching to data.table gave a small speed increase.

Some of the iterations will inevitably result in empty data frames. I tried using if/else with next to skip the rest of an iteration when the subset has nrow == 0, but that markedly increased the running time.
Is there any way I can optimize my code to decrease the running time?
Does it make any sense to parallelize it using foreach when the task for each iteration is so small?

library(data.table)
library(tidyverse)


my_df <- data.table(id = c("id1", "id1", "id1", "id2", "id2"),
           bin_year = c(1,1,1,2,2),
           outcome = c("outcome1", "outcome1", "outcome2", "outcome2", "outcome3"),
           bin_interv = c(1, 2, 3, 1, 2)
            )

unq_outcome <- unique(my_df$outcome)

loop_output <- list()
for (l in 1:max(my_df$bin_year)) {
    for (o in 1:((max(my_df$bin_interv)) + 3)) {
      for (p in 1:((n_distinct(unq_outcome)) + 1)) {
        
        # iteration key; the separator keeps keys such as (1, 11, 1) and (11, 1, 1) distinct
        iteration <- str_c(l, o, p, sep = "_")
        
        # selectors
        select_year <- 1:l
        select_interv <- if (o <= max(my_df$bin_interv)) {o} else 
                         if (o == max(my_df$bin_interv) + 1 ) {c(2,4)} else 
                         if (o == max(my_df$bin_interv) + 2 ) {c(1,5)} else {1:max(my_df$bin_interv)}
        select_outcome <- if (p <= n_distinct(unq_outcome)) {unq_outcome[p]} else {unq_outcome}
        
        # subset data
        loop_output[[iteration]] <- my_df[bin_year %in% select_year & 
                                          bin_interv %in% select_interv & 
                                          outcome %in% select_outcome]
      }}}
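
One possible direction, offered only as a sketch and not benchmarked on the full data: everything that does not change between iterations (the maxima, the selector vectors, and the `%in%` tests themselves) can be computed once up front, so each iteration only combines three logical vectors. The helper names below (`year_sets`, `interv_sets`, `outcome_sets`, the `*_mask` lists, `grid`, `non_empty`) are hypothetical and not part of the code above. Keys use an underscore separator so that multi-digit indices stay unambiguous.

```r
library(data.table)

my_df <- data.table(
  id         = c("id1", "id1", "id1", "id2", "id2"),
  bin_year   = c(1, 1, 1, 2, 2),
  outcome    = c("outcome1", "outcome1", "outcome2", "outcome2", "outcome3"),
  bin_interv = c(1, 2, 3, 1, 2)
)

# Loop invariants, computed once instead of on every iteration
max_year    <- max(my_df$bin_year)
max_interv  <- max(my_df$bin_interv)
unq_outcome <- unique(my_df$outcome)

# Selector sets, one list element per index value (same rules as the loops above)
year_sets    <- lapply(seq_len(max_year), seq_len)
interv_sets  <- c(as.list(seq_len(max_interv)),
                  list(c(2, 4), c(1, 5), seq_len(max_interv)))
outcome_sets <- c(as.list(unq_outcome), list(unq_outcome))

# One logical mask per selector set, so %in% runs once per set,
# not once per (l, o, p) combination
year_mask    <- lapply(year_sets,    function(s) my_df$bin_year   %in% s)
interv_mask  <- lapply(interv_sets,  function(s) my_df$bin_interv %in% s)
outcome_mask <- lapply(outcome_sets, function(s) my_df$outcome    %in% s)

# All index combinations, in the same order as the nested loops
grid <- CJ(l = seq_along(year_sets),
           o = seq_along(interv_sets),
           p = seq_along(outcome_sets))

# Subset by combining the precomputed masks
loop_output <- Map(function(l, o, p) {
  keep <- year_mask[[l]] & interv_mask[[o]] & outcome_mask[[p]]
  my_df[keep]
}, grid$l, grid$o, grid$p)
names(loop_output) <- paste(grid$l, grid$o, grid$p, sep = "_")

# Empty subsets are dropped in a single pass afterwards,
# rather than with if/else/next inside the loop
non_empty <- Filter(function(d) nrow(d) > 0L, loop_output)
```

Whether this is faster on the real data would need to be measured, for example with `system.time()`. Because each iteration does very little work, the overhead of distributing single iterations with foreach may outweigh any gain; if parallelization is tried, handing each worker a large chunk of `grid` is likely a better fit than one task per row.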
