Using foreach in a given function

I am trying to apply a function I have defined to a given dataframe using foreach.
Could anyone advise, especially on lines 41-42 of the following code?
That is, the foreach call:

  finalSum <- foreach(b=iter(batchSets, by='row'), .combine=rbind) %dopar% 
    fun_1(x=dat1$first,y=dat1$second,i=dat1$sl)

The code:

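require(microbenchmark, quietly=TRUE)
require(doParallel, quietly=TRUE)
require(ggplot2, quietly=TRUE)
require(dplyr, quietly=TRUE)   # provides %>% and filter() used in fun_1
require(tibble, quietly=TRUE)  # provides add_row() used in fun_1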
detectedCores <- parallel::detectCores()
registerDoParallel(cores=detectedCores - 1) 

#Given dataframe
set.seed(10)
dat1=data.frame(
  sl=1:10,
  first=sample(10),
  second=sample(10))

# Function that picks out the row with sl == i and appends a new row to it
# I need it to work this way, extracting one row at a time
fun_1 <- function(x,y,i){
  df_mont=dat1 %>% 
    filter(sl==i)
  df= df_mont %>% add_row(sl=i,first = x, second = y)
  df
}

# Example of running the function with mapply
output=mapply(function(x,y,i) fun_1(x,y,i),
       x=dat1$first,y=dat1$second,
       i=dat1$sl,
       SIMPLIFY = FALSE)
# Expected output
output1 <- do.call("rbind", output)


#Required with foreach
# I want to do it using foreach

parll<- function(x) {
  items <- nrow(x)
  batches <- detectedCores * 4
  
  batchSets <- split(x, rep(1:batches, length.out=items))
  
  finalSum <- foreach(b=iter(batchSets, by='row'), .combine=rbind) %dopar% 
    fun_1(x=dat1$first,y=dat1$second,i=dat1$sl)
  
  return (finalSum)
}
parll(x=dat1)








I did it like this, passing the columns of each batch b to the function and exporting dat1 to the workers:


parll<- function(x,myfunc) {
  items <- nrow(x)
  batches <- detectedCores * 4
  
  batchSets <- split(x, rep(1:batches, length.out=items))
  
  finalSum <- foreach(b=iter(batchSets, by='row'), 
                      .combine=rbind,
                      .packages = "tidyverse",
                      .export = "dat1") %dopar% 
    {myfunc(x=b$first,y=b$second,i=b$sl)}
  
  return (finalSum)
}
parll(x=dat1,fun_1)

Thanks, @nirgrahamuk, for your reply.
The code works perfectly with the given data when nrow(dat1) < batches.
It does not work in the opposite case, when nrow(dat1) > batches.
For example, I changed batches <- detectedCores * 4 to batches <- detectedCores * 1,
assuming detectedCores = 8.
Could you check this again?

Thanks again.

The issue here is the design of your function fun_1: it is not vectorised over i, so it can only process a single i at a time. Setting batches lower means there are multiple i's in a batch, which your function fails to process as desired.
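To make the effect of the batch count concrete, here is a minimal base-R sketch. It assumes 10 rows and detectedCores = 8, as in the question; the names sizes_32 and sizes_8 are only for illustration.

```r
# Row indices of a 10-row data frame, split the same way parll() splits dat1
items <- 10

# batches = 8 * 4 = 32: more batches than rows, so every batch holds one row
sizes_32 <- lengths(split(seq_len(items), rep(1:32, length.out = items)))

# batches = 8 * 1 = 8: fewer batches than rows, so some batches hold two rows
sizes_8 <- lengths(split(seq_len(items), rep(1:8, length.out = items)))

print(sizes_32)
print(sizes_8)
```

With 32 batches each call to fun_1 receives a single i; with 8 batches the first two calls receive two values of i each.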

batchSets
$`1`
  sl first second
1  1     9      8
9  9     4     10

$`2`
   sl first second
2   2     7      7
10 10     1      5

$`3`
  sl first second
3  3     8      6

$`4`
  sl first second
4  4     6      9

$`5`
  sl first second
5  5     3      3

$`6`
  sl first second
6  6     2      2

$`7`
  sl first second
7  7    10      1

$`8`
  sl first second
8  8     5      4
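The first batch above holds sl 1 and 9, so fun_1 is called with i = c(1, 9). A small sketch of what a comparison like sl == i does in that case, using base R subsetting in place of dplyr's filter() (an assumption made to keep the example dependency-free) and dat1 rebuilt from the printout above:

```r
# dat1 reconstructed from the batchSets printout
dat1 <- data.frame(
  sl     = 1:10,
  first  = c(9, 7, 8, 6, 3, 2, 10, 5, 4, 1),
  second = c(8, 7, 6, 9, 3, 2, 1, 4, 10, 5)
)

i <- c(1, 9)  # the sl values in batch `1`

# `==` recycles i along sl (1, 9, 1, 9, ...), comparing position by position,
# so sl 9 is compared against 1 and is not matched
matched_eq <- dat1[dat1$sl == i, ]

# `%in%` tests membership, so both rows of the batch are matched
matched_in <- dat1[dat1$sl %in% i, ]

print(matched_eq)
print(matched_in)
```

Because 10 is a multiple of 2, the recycling happens without a warning, which is why the result is wrong rather than an error.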

As a demonstration, here is a generic function that duplicates the rows of a dataframe and can be distributed in parallel:

fun_1 <- function(df_to_dup){
bind_rows(df_to_dup,
          df_to_dup)
}

parll<- function(x,myfunc) {
  items <- nrow(x)
  batches <- detectedCores * 1
  
  batchSets <- split(x, rep(1:batches, length.out=items))

  finalSum <- foreach(b=iter(batchSets, by='row'), 
                      .combine=rbind,
                      .packages = "tidyverse",
                      .export = "dat1") %dopar% {myfunc(b)}
  return (finalSum)
}
parll(x=dat1,fun_1)

That is another way to get the result.
I was hoping to get it done using batch sets built as follows:

  items <- nrow(x)
  batches <- detectedCores * 1
  
  batchSets <- split(x, rep(1:batches, length.out=items))

My example uses that fun_1 (the one containing bind_rows) with batches and batchSets set up exactly as you have them at the end.
The point is that how the function fun_1 is written and how values are passed to it must be in alignment.
If batchSets passes multiple rows to fun_1, the function needs to account for that.
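One way to line the two up is a function that accepts a whole batch and handles every row in it. The sketch below is an illustration under stated assumptions, not a drop-in fix: fun_batch is a hypothetical name, base R row indexing stands in for filter()/add_row(), dat1 is rebuilt from the printout above, and 2 worker cores are assumed.

```r
library(foreach)
library(doParallel)

# Hypothetical batch-aware function: takes one batch (one or more rows)
# and returns each row immediately followed by a copy of itself
fun_batch <- function(batch) {
  dup <- batch[rep(seq_len(nrow(batch)), each = 2), ]
  rownames(dup) <- NULL
  dup
}

parll <- function(x, myfunc, batches) {
  batchSets <- split(x, rep(seq_len(batches), length.out = nrow(x)))
  # Each worker receives a whole batch b, however many rows it holds
  foreach(b = batchSets, .combine = rbind) %dopar% myfunc(b)
}

dat1 <- data.frame(
  sl     = 1:10,
  first  = c(9, 7, 8, 6, 3, 2, 10, 5, 4, 1),
  second = c(8, 7, 6, 9, 3, 2, 1, 4, 10, 5)
)

registerDoParallel(cores = 2)  # assumed core count
res <- parll(dat1, fun_batch, batches = 4)
stopImplicitCluster()

# Rows come back grouped by batch; sort to compare with the mapply output
res <- res[order(res$sl), ]
print(res)
```

Because fun_batch never assumes a single i, the result should not depend on whether batches is larger or smaller than nrow(dat1).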
