library(microbenchmark)
library(doParallel)   # also attaches foreach and parallel
library(ggplot2)
library(dplyr)        # %>%, filter() and add_row() used in fun_1
detectedCores <- parallel::detectCores()
registerDoParallel(cores = detectedCores - 1)
#Given dataframe
set.seed(10)
dat1=data.frame(
sl=1:10,
first=sample(10),
second=sample(10))
# Function to append a row (sl = i, first = x, second = y) to row i
# I need it done this way, extracting one row at a time
fun_1 <- function(x, y, i) {
  df_mont <- dat1 %>%
    filter(sl == i)
  df <- df_mont %>% add_row(sl = i, first = x, second = y)
  df
}
# Example of running the function with mapply
output=mapply(function(x,y,i) fun_1(x,y,i),
x=dat1$first,y=dat1$second,
i=dat1$sl,
SIMPLIFY = FALSE)
# Expected output
output1 <- do.call("rbind", output)
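# Required: the same result using foreach
parll <- function(x) {
  items <- nrow(x)
  batches <- detectedCores * 4
  batchSets <- split(x, rep(1:batches, length.out = items))
  # foreach can iterate over the list of batches directly; each batch's own
  # columns are passed to fun_1. .packages/.export are needed when the
  # workers are separate R sessions (e.g. on Windows)
  finalSum <- foreach(b = batchSets, .combine = rbind,
                      .packages = "dplyr", .export = c("fun_1", "dat1")) %dopar%
    fun_1(x = b$first, y = b$second, i = b$sl)
  finalSum
}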
parll(x=dat1)
Thanks, @nirgrahamuk, for your reply.
The code works with the given data when nrow(dat1) < batches, but not the other way round: I changed batches <- detectedCores * 4 to batches <- detectedCores * 1 (assuming detectedCores = 8, that gives 8 batches for 10 rows), and the result is no longer correct.
Could you check this again?
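The batch sizes follow directly from split(x, rep(1:batches, length.out = items)). A small base-R sketch (the helper name batch_sizes is hypothetical) shows that 10 rows with batches = 8 * 4 = 32 give one row per batch, while batches = 8 * 1 = 8 puts two rows into each of the first two batches.

```r
# Hypothetical helper: number of rows that
# split(x, rep(1:batches, length.out = items)) puts into each non-empty batch
batch_sizes <- function(items, batches) {
  groups <- rep(1:batches, length.out = items)
  as.vector(table(groups))
}

items <- 10                          # nrow(dat1)
batch_sizes(items, batches = 8 * 4)  # 1 1 1 1 1 1 1 1 1 1
batch_sizes(items, batches = 8 * 1)  # 2 2 1 1 1 1 1 1
```

So as soon as batches < nrow(dat1), at least one batch hands fun_1 more than one row.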
The issue here is the design of your function (fun_1): it is not vectorised over i, so it can only process a single i at a time. Setting batches lower means some batches contain multiple i's, which your function does not process as desired; filter(sl == i) compares sl with i element by element, recycling i, so rows of the batch can be silently dropped.
batchSets
$`1`
  sl first second
1  1     9      8
9  9     4     10

$`2`
   sl first second
2   2     7      7
10 10     1      5

$`3`
  sl first second
3  3     8      6

$`4`
  sl first second
4  4     6      9

$`5`
  sl first second
5  5     3      3

$`6`
  sl first second
6  6     2      2

$`7`
  sl first second
7  7    10      1

$`8`
  sl first second
8  8     5      4
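To see the failure on a small scale, the sketch below calls fun_1 once with a single row and once with a two-row batch like batchSets$`1` (sl 1 and 9). The extra `dat` argument is an addition for this sketch so the function does not rely on a global data frame; otherwise the body follows the question.

```r
library(dplyr)

set.seed(10)
dat1 <- data.frame(sl = 1:10, first = sample(10), second = sample(10))

# fun_1 from the question; the `dat` argument is added for this sketch
fun_1 <- function(x, y, i, dat) {
  dat %>%
    filter(sl == i) %>%                     # == recycles i when length(i) > 1
    add_row(sl = i, first = x, second = y)  # adds one row per element of i
}

# A single row: the original row plus its copy (2 rows, as intended)
single <- dat1[1, ]
one <- fun_1(single$first, single$second, single$sl, dat = dat1)

# A two-row batch, like batchSets$`1`
batch <- dat1[dat1$sl %in% c(1, 9), ]
two <- fun_1(batch$first, batch$second, batch$sl, dat = dat1)

nrow(one)         # 2
nrow(two)         # 3, not 4
sum(two$sl == 9)  # 1: the original row with sl == 9 was dropped by filter()
```

With i = c(1, 9), `sl == i` compares row 1 with 1, row 2 with 9, row 3 with 1, and so on, so only the row with sl == 1 survives the filter, and no error or warning is raised.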
As a demonstration, here is a generic function that duplicates the rows of a data frame and can be distributed in parallel.
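The function referred to is not included in this text. A minimal sketch of the idea, assuming a hypothetical helper dup_rows and a two-worker PSOCK cluster: because dup_rows handles any number of rows, it gives the same result however the rows are split into batches.

```r
library(parallel)
library(foreach)
library(doParallel)

# Hypothetical helper: duplicate every row of a data frame.
# It works for any number of rows, so the batch size does not matter.
dup_rows <- function(df) {
  df[rep(seq_len(nrow(df)), each = 2), , drop = FALSE]
}

set.seed(10)
dat1 <- data.frame(sl = 1:10, first = sample(10), second = sample(10))

cl <- makeCluster(2)
registerDoParallel(cl)

batches <- 4   # deliberately fewer batches than rows
batchSets <- split(dat1, rep(1:batches, length.out = nrow(dat1)))

# dup_rows is referenced in the loop body, so foreach exports it to the workers
dup <- foreach(b = batchSets, .combine = rbind) %dopar% dup_rows(b)

stopCluster(cl)

dup <- dup[order(dup$sl), ]
table(dup$sl)   # every sl value appears twice
```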
My example is one using that fun_1 (containing bind_rows) with the batches and batchSets as you have them at the end.
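That example is also not included in this text. A minimal sketch in the same spirit, assuming a hypothetical variant fun_1_vec that accepts whole vectors x, y, i (one batch) plus the data frame, and uses `%in%` instead of `==` so every row of the batch is kept:

```r
library(parallel)
library(foreach)
library(doParallel)
library(dplyr)

set.seed(10)
dat1 <- data.frame(sl = 1:10, first = sample(10), second = sample(10))

# Hypothetical fun_1 variant that processes a whole batch at once.
# `%in%` keeps every row whose sl is in the batch; `==` would recycle i.
fun_1_vec <- function(x, y, i, dat) {
  kept  <- dat %>% filter(sl %in% i)
  added <- data.frame(sl = i, first = x, second = y)
  bind_rows(kept, added)
}

cl <- makeCluster(2)
registerDoParallel(cl)

batches <- 8   # fewer batches than rows, so some batches hold two rows
batchSets <- split(dat1, rep(1:batches, length.out = nrow(dat1)))

# fun_1_vec and dat1 appear in the loop body, so they are exported
out <- foreach(b = batchSets, .combine = rbind, .packages = "dplyr") %dopar%
  fun_1_vec(x = b$first, y = b$second, i = b$sl, dat = dat1)

stopCluster(cl)

table(out$sl)   # every sl value appears twice
```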
The point is that how the function (fun_1) is written and how values are passed to it must be in alignment.
If you use batchSets to pass multiple rows to fun_1 at once, fun_1 needs to account for that.
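Another way to account for multi-row batches is to leave fun_1 scalar and loop over the rows inside each batch, i.e. run the question's mapply call per batch within %dopar%. A sketch, assuming fun_1 takes the data frame through an added `dat` argument (not in the original) and a two-worker PSOCK cluster:

```r
library(parallel)
library(foreach)
library(doParallel)
library(dplyr)

set.seed(10)
dat1 <- data.frame(sl = 1:10, first = sample(10), second = sample(10))

# fun_1 from the question, unchanged except for the added `dat` argument
fun_1 <- function(x, y, i, dat) {
  dat %>%
    filter(sl == i) %>%
    add_row(sl = i, first = x, second = y)
}

cl <- makeCluster(2)
registerDoParallel(cl)

batches <- 8
batchSets <- split(dat1, rep(1:batches, length.out = nrow(dat1)))

res <- foreach(b = batchSets, .combine = rbind, .packages = "dplyr") %dopar% {
  # fun_1 still sees one i per call: loop over the rows of this batch
  rows <- mapply(fun_1, x = b$first, y = b$second, i = b$sl,
                 MoreArgs = list(dat = dat1), SIMPLIFY = FALSE)
  do.call(rbind, rows)
}

stopCluster(cl)

# Sequential version for comparison (output1 in the question)
seq_out <- do.call(rbind, mapply(fun_1, x = dat1$first, y = dat1$second,
                                 i = dat1$sl, MoreArgs = list(dat = dat1),
                                 SIMPLIFY = FALSE))
```

This keeps fun_1 simple at the cost of one fun_1 call per row; the vectorised variant does one call per batch.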