I am using the R programming language. I am trying to learn how to add a "progress bar" that estimates how much time remains while a function is running (see the documentation for the progress_bar function from the "progress" package on RDocumentation).
For example:
library(progress)
pb <- progress_bar$new(total = 100)
for (i in 1:100) {
  pb$tick()
  Sys.sleep(1 / 100)
}
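Since what I care about is the remaining time, here is a small variation of the same example. It is only a sketch: it assumes the ":bar", ":percent" and ":eta" format tokens described in the "progress" package documentation, and the particular format string is just my own choice.

```r
library(progress)

# Same toy loop as above, but with a format string that shows
# the estimated time remaining (:eta) next to the bar.
pb <- progress_bar$new(
  format = "[:bar] :percent eta: :eta",
  total = 100,
  clear = FALSE  # leave the finished bar on screen
)

for (i in 1:100) {
  pb$tick()           # advance the bar by one step
  Sys.sleep(1 / 100)  # placeholder for real work
}
```

As far as I understand, the estimate is based on how fast the ticks have arrived so far, so it should only be meaningful if each tick corresponds to a similar amount of work.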
Suppose I have a function called "grid_function" and a dataset called "DF_1". I take each row of "DF_1" and pass it to "grid_function" as arguments. "grid_function" performs some calculations using this row, and the results for all rows are collected in a list called "resultdf1". Finally, "resultdf1" is converted into a data frame called "final_output". The "feeding process" can be seen below:
library(data.table) # provides rbindlist()

resultdf1 <- apply(DF_1, 1, # 1 means rows
  FUN = function(x) {
    do.call(
      # Call grid_function with the arguments in a list
      grid_function,
      # force list type for the arguments
      c(list(train_data_new), as.list(
        # turn the row into a named vector
        unlist(x)
      ))
    )
  }
)
l <- resultdf1
final_output <- rbindlist(l, fill = TRUE)
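To show what the do.call() step does for a single row, here is a toy version. The definitions of "train_data_new", "DF_1" and "grid_function" below are hypothetical stand-ins that I made up for illustration; my real objects are larger and are not shown in this question.

```r
# Hypothetical stand-ins (NOT my real data or function)
train_data_new <- data.frame(y = c(1, 5, 10, 20))
DF_1 <- expand.grid(a = 1:2, b = 3:4)
grid_function <- function(train_data, a, b) {
  data.frame(a = a, b = b, n_above = sum(train_data$y > a * b))
}

# The first row of DF_1 as a named vector, similar to what
# apply(DF_1, 1, ...) hands to FUN
x <- unlist(DF_1[1, ])

# Argument list: the training data first (unnamed), then one
# named argument per column of DF_1
args <- c(list(train_data_new), as.list(x))

# Calling through do.call() versus calling the function directly
via_do_call <- do.call(grid_function, args)
direct <- grid_function(train_data_new, a = 1L, b = 3L)

all.equal(via_do_call, direct)
```

So the column names of "DF_1" have to match the argument names of "grid_function", because each row value is passed as a named argument.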
Question: I would like to add a "progress bar" to the above code.
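To make the goal concrete, here is a minimal sketch of the behaviour I am after: one tick per row of "DF_1", so that the bar reflects how many rows have been processed. The objects below are hypothetical stand-ins (including the Sys.sleep() call that imitates slow work), and I am not certain this is the recommended way to do it.

```r
library(progress)
library(data.table)

# Hypothetical stand-ins (NOT my real data or function)
train_data_new <- data.frame(y = 1:10)
DF_1 <- data.frame(a = 1:5, b = 6:10)
grid_function <- function(train_data, a, b) {
  Sys.sleep(0.05)  # imitate a slow calculation
  data.frame(a = a, b = b, total = sum(train_data$y) + a + b)
}

# One tick per row, so total is the number of rows in DF_1
pb <- progress_bar$new(
  format = "[:bar] :current/:total eta: :eta",
  total = nrow(DF_1)
)

resultdf1 <- apply(DF_1, 1, function(x) {
  out <- do.call(grid_function, c(list(train_data_new), as.list(unlist(x))))
  pb$tick()  # advance the bar after each row is finished
  out
})

final_output <- rbindlist(resultdf1, fill = TRUE)
```

In this sketch the whole apply() call runs only once, and the bar advances inside it.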
What I tried: I tried to do this as follows:
library(doParallel)
library(future)
library(progress)   # provides progress_bar
library(data.table) # provides rbindlist()
# note: I think "makePSOCKcluster" is making my code run faster, but I am not sure - I am open to suggestions!
cl <- makePSOCKcluster(6) # 6 cpu cores out of 8
registerDoParallel(cl)
pb <- progress_bar$new(total = 100)
for (i in 1:100) {
  resultdf1 <- apply(DF_1, 1, # 1 means rows
    FUN = function(x) {
      do.call(
        # Call grid_function with the arguments in a list
        grid_function,
        # force list type for the arguments
        c(list(train_data_new), as.list(
          # turn the row into a named vector
          unlist(x)
        ))
      )
    }
  )
  l <- resultdf1
  final_output <- rbindlist(l, fill = TRUE)
  pb$tick()
  Sys.sleep(1 / 100)
}
stopCluster(cl)
This appears to be working, but I am not sure whether I did everything correctly. Can someone please tell me whether this is the right way to do it? Also, could adding this "progress bar" make the function take longer to run?
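One way I thought of checking the overhead is to time the same loop with and without the bar using system.time(). This is only a rough sketch: "work" is a hypothetical placeholder for one call to "grid_function", and the number of iterations is arbitrary.

```r
library(progress)

# Hypothetical placeholder for one unit of real work
work <- function(i) sqrt(i)

n <- 1000

# Loop without a progress bar
time_plain <- system.time(
  for (i in seq_len(n)) work(i)
)[["elapsed"]]

# Same loop, ticking a progress bar on every iteration
pb <- progress_bar$new(total = n)
time_with_bar <- system.time(
  for (i in seq_len(n)) {
    work(i)
    pb$tick()
  }
)[["elapsed"]]

# Elapsed seconds for both versions, side by side
c(plain = time_plain, with_bar = time_with_bar)
```

I assume the difference will depend on how expensive each call is compared with a single tick, but I would appreciate confirmation.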
Thanks