Printing Individual Iterations Within a Loop

I have the following dataset (students took an exam many times over a period of years and either passed or failed; I am interested in studying the effect of the previous test result on the next one):

id = sample.int(10000, 100000, replace = TRUE)
res = c(1,0)
results = sample(res, 100000, replace = TRUE)
date_exam_taken = sample(seq(as.Date('1999/01/01'), as.Date('2020/01/01'), by="day"), 100000, replace = TRUE)


my_data = data.frame(id, results, date_exam_taken)
my_data <- my_data[order(my_data$id, my_data$date_exam_taken),]

my_data$general_id = 1:nrow(my_data)
my_data$exam_number = ave(my_data$general_id, my_data$id, FUN = seq_along)
my_data$general_id = NULL

     id results date_exam_taken exam_number
7992   1       1      2004-04-23           1
24837  1       0      2004-12-10           2
12331  1       1      2007-01-19           3
34396  1       0      2007-02-21           4
85250  1       0      2007-09-26           5
11254  1       1      2009-12-20           6

Next, I tried this code:

# does this mean I should set makeCluster() to makeCluster(8)???
 > detectCores()
[1] 8

my_list = list()
max = length(unique(my_data$id))

library(doParallel)
registerDoParallel(cl <- makeCluster(3))

# note: for some reason, this loop isn't printing?

test = foreach(i = 1:max, .combine = "rbind") %dopar% {

    {tryCatch({
        
        start_i = my_data[my_data$id == i,]
        
        pairs_i =  data.frame(first = head(start_i$results, -1), second = tail(start_i$results, -1))
        frame_i =  as.data.frame(table(pairs_i))
        frame_i$id = i
        print(frame_i)
        my_list[[i]] = frame_i
    }, error = function(e){})
    }}

The code seems to be running, but nothing is printing - can anyone please show me what I am doing wrong and what I can do to fix this?

Thanks!

To see each individual output printed in the console upon execution, remove .combine = "rbind" and do not assign it to test.

foreach(i = 1:2) %dopar% {
  
  {tryCatch({
    
    start_i = my_data[my_data$id == i,]
    
    pairs_i =  data.frame(first = head(start_i$results, -1), second = tail(start_i$results, -1))
    frame_i =  as.data.frame(table(pairs_i))
    frame_i$id = i
    print(frame_i)
    my_list[[i]] = frame_i
  }, error = function(e){})
  }}
#> [[1]]
#>   first second Freq id
#> 1     0      0    3  1
#> 2     1      0    2  1
#> 3     0      1    3  1
#> 4     1      1    0  1
#> 
#> [[2]]
#>   first second Freq id
#> 1     0      0    1  2
#> 2     1      0    3  2
#> 3     0      1    4  2
#> 4     1      1    1  2

@ Scottyd22: Thank you so much for your answer! If I have understood this correctly, the code you have written will still "populate" the list object AND print the results from the loop. Correct?

Thank you so much!

You're welcome! Yes, but the previous code was not done in parallel (since I was only illustrating two iterations).

Upon further investigation, I could not get the code to work in parallel. However, if you switch %dopar% to %do%, it will not execute in parallel, but the list object will be populated and the results printed.
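
A likely explanation for why nothing printed and my_list stayed empty under %dopar%: each worker started by makeCluster() is a separate R process. An assignment to my_list inside the loop body only changes that worker's copy, and print() writes to the worker's output, which makeCluster() discards by default. Below is a minimal sketch of the usual workaround, which is to let the loop body return the value and collect it from foreach. The toy data frame and the names toy and res_list are made up for illustration. outfile = "" asks the workers to forward their output to the session that started them, although some GUIs may still not display it.

```r
library(doParallel)  # also attaches foreach and parallel

# Small illustrative data set (an assumption; same shape as my_data above)
set.seed(1)
toy <- data.frame(id = rep(1:3, each = 5),
                  results = sample(c(1, 0), 15, replace = TRUE))

# outfile = "" redirects the workers' stdout/stderr to the master process
cl <- makeCluster(2, outfile = "")
registerDoParallel(cl)

# Without .combine, foreach returns a list with one element per iteration
res_list <- foreach(i = 1:3) %dopar% {
  start_i <- toy[toy$id == i, ]
  pairs_i <- data.frame(first = head(start_i$results, -1),
                        second = tail(start_i$results, -1))
  frame_i <- as.data.frame(table(pairs_i))
  frame_i$id <- i
  cat("finished id", i, "\n")  # written by the worker process
  frame_i                      # the value of the block is what foreach collects
}

stopCluster(cl)
```

After the loop, res_list plays the role that my_list was meant to play, and do.call(rbind, res_list) gives the same stacked data frame as .combine = "rbind".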

I assume the request is not about actually seeing the frames fly by, but more about tracking overall progress.
If so, you can adapt the proposal in this GitHub gist: "How to include progressbar with doParallel (previously done only with the doSNOW-package) and foreach loop. Note, that you can easily wrap this into a function."
Like so:

id = sample.int(10000, 100000, replace = TRUE)
res = c(1,0)
results = sample(res, 100000, replace = TRUE)
date_exam_taken = sample(seq(as.Date('1999/01/01'), as.Date('2020/01/01'), by="day"), 100000, replace = TRUE)

my_data <- data.frame(id, results, date_exam_taken)
my_data <- my_data[order(my_data$id, my_data$date_exam_taken),]

my_data$general_id = 1:nrow(my_data)
my_data$exam_number = ave(my_data$general_id, my_data$id, FUN = seq_along)
my_data$general_id = NULL


my_list = list()
max = length(unique(my_data$id))

library(doParallel)
library(tidyverse)
library(glue)
registerDoParallel(cl <- makeCluster(3))

# note: print() inside %dopar% runs on the workers, so nothing shows up in this console

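# Progress combine function: returns a closure to pass to foreach's .combine
f <- function(iterator){
  # iterator is the total number of iterations
  pb <- txtProgressBar(min = 1, max = iterator - 1, style = 3)
  count <- 0
  function(...) {
    # a call with k arguments advances the bar by k - 1
    count <<- count + length(list(...)) - 1
    setTxtProgressBar(pb, count)
    flush.console()
    bind_rows(...) # this can feed into .combine option of foreach
  }
}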

my_list = list()
# .packages loads glue on each worker so that glue() is available there
test = foreach(i = 1:max, .combine = f(max), .packages = "glue") %dopar% {
  
  {tryCatch({
    
    start_i = my_data[my_data$id == i,]
    
    pairs_i =  data.frame(first = head(start_i$results, -1), second = tail(start_i$results, -1))
    if(nrow(pairs_i)==0)
      stop(glue("insufficient data at i: {i}"))
    frame_i =  as.data.frame(table(pairs_i))
    frame_i$id = i
    print(frame_i)
    my_list[[i]] = frame_i
  }, error = function(e){})
  }}

@ nigrahamuk: Thank you so much for your answer!

Is this line of code necessary? Or can I skip it?

if(nrow(pairs_i)==0)
      stop(glue("insufficient data at i: {i}"))

This code produces a progress bar, but it's still not possible to print the iteration number as the loop runs, is it?

Thank you so much!
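
On printing a running count: the .combine function runs in the main session, not on the workers, so anything it prints does reach the console while the loop is running. Here is a minimal sketch of that idea. The helper make_counting_rbind and the toy loop body are hypothetical, and the sketch assumes foreach's default behaviour of calling a custom .combine function with two arguments at a time. The count reflects results that have been received and combined, so it can lag behind what the workers have actually finished.

```r
library(doParallel)  # also attaches foreach and parallel

cl <- makeCluster(2)
registerDoParallel(cl)

n <- 6

# Hypothetical helper: builds a combine function that reports progress
make_counting_rbind <- function(total) {
  done <- 1  # the first call already receives the first two results
  function(acc, new) {
    done <<- done + 1
    message("combined ", done, " of ", total, " results")
    rbind(acc, new)
  }
}

# The combine function is called on the master as results come back
out <- foreach(i = 1:n, .combine = make_counting_rbind(n)) %dopar% {
  data.frame(id = i, square = i^2)
}

stopCluster(cl)
```

The same closure pattern is what the progress bar version uses; only the reporting call differs (message() instead of setTxtProgressBar()).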
