I am working with the R programming language. I came across this link, which shows how to "parallelize" your code: Running R Code in Parallel | R-bloggers
As far as I understand, "parallelizing" means splitting the work across your computer's CPU cores so that several pieces run at the same time, which makes the code finish faster.
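To check my understanding, here is a small toy example I wrote (slow_square is just a made-up placeholder for a slow task, it is not part of my real problem). Running it sequentially with sapply should take about 4 seconds, while spreading it over 4 workers with parSapply should take roughly 1 second:
library(parallel)
slow_square <- function(i) {  # placeholder for some slow computation
  Sys.sleep(0.5)
  i^2
}
# sequential: 8 calls x 0.5 seconds each
system.time(res_seq <- sapply(1:8, slow_square))
# parallel: the same 8 calls spread over 4 workers
cl <- makeCluster(4)
system.time(res_par <- parSapply(cl, 1:8, slow_square))
stopCluster(cl)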
For instance, I can run the code below on my computer, but it takes a while to run:
# load libraries
library(mopsocd)
library(dplyr)
# create some data for this example
a1 = rnorm(1000,100,10)
b1 = rnorm(1000,100,10)
c1 = sample.int(1000, 1000, replace = TRUE)
train_data = data.frame(a1,b1,c1)
# define function (returns the four values to be optimized):
funct_set <- function(x) {
  # bin data according to the cutoffs in x
  train_data <- train_data %>%
    mutate(cat = ifelse(a1 <= x[1] & b1 <= x[3], "a",
                        ifelse(a1 <= x[2] & b1 <= x[4], "b", "c")))
  train_data$cat <- as.factor(train_data$cat)
  # new splits: one table per bin
  a_table <- train_data %>% filter(cat == "a") %>% select(a1, b1, c1, cat)
  b_table <- train_data %>% filter(cat == "b") %>% select(a1, b1, c1, cat)
  c_table <- train_data %>% filter(cat == "c") %>% select(a1, b1, c1, cat)
  # calculate the "quant" indicator (1 if c1 exceeds the bin's cutoff) for each bin
  table_a <- data.frame(a_table %>% group_by(cat) %>% mutate(quant = ifelse(c1 > x[5], 1, 0)))
  table_b <- data.frame(b_table %>% group_by(cat) %>% mutate(quant = ifelse(c1 > x[6], 1, 0)))
  table_c <- data.frame(c_table %>% group_by(cat) %>% mutate(quant = ifelse(c1 > x[7], 1, 0)))
  f1 <- mean(table_a$quant)
  f2 <- mean(table_b$quant)
  f3 <- mean(table_c$quant)
  # combine all tables and calculate the total mean: this is what needs to be optimized
  final_table <- rbind(table_a, table_b, table_c)
  f4 <- mean(final_table$quant)
  return(c(f1, f2, f3, f4))
}
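For reference, the function can be called on its own with any 7-element vector inside the bounds defined below; the numbers here are just an arbitrary test point to show that it returns the four objective values:
funct_set(c(85, 100, 85, 100, 150, 250, 400))
# numeric vector of length 4: f1, f2, f3, f4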
# constraint function: every element must be TRUE for a feasible solution
gn <- function(x) {
  g1 <- x[3] - x[1] >= 0.0
  g2 <- x[4] - x[2] >= 0.0
  g3 <- x[7] - x[6] > 0
  g4 <- x[6] - x[5] > 0
  return(c(g1, g2, g3, g4))
}
## Set Arguments
varcount <- 7
fncount <- 4
lbound <- c(80, 90, 80, 90, 100, 200, 300)
ubound <- c(90, 110, 90, 110, 200, 300, 500)
optmin <- 0
# desired part to speed up
ex1 <- mopsocd(funct_set, gn, varcnt = varcount, fncnt = fncount,
               lowerbound = lbound, upperbound = ubound, opt = optmin)
Suppose I want to "speed up" the last part of the above code:
# part to speed up
ex1 <- mopsocd(funct_set, gn, varcnt = varcount, fncnt = fncount,
               lowerbound = lbound, upperbound = ubound, opt = optmin)
Using the instructions from the website, you first need to see how many cores your computer has:
library(parallel)
detectCores()
# [1] 8
cl <- makeCluster(8)
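One thing I was unsure about: as far as I understand, each worker in the cluster starts as a fresh R session, so I think the objects and packages my function relies on (train_data and dplyr) would have to be sent to the workers first. Something like this, although I am not certain this step is actually required here:
# push the data/functions to every worker and load dplyr there (my assumption, not from the tutorial)
clusterExport(cl, c("train_data", "funct_set", "gn"))
clusterEvalQ(cl, library(dplyr))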
From here, you can now "parallelize" the code:
# parallelize code
results <- parSapply(cl, train_data, mopsocd(funct_set, gn, varcnt = varcount, fncnt = fncount,
                     lowerbound = lbound, upperbound = ubound, opt = optmin))
# close cluster object
stopCluster(cl)
Question: The code for the "results" object is still running on my computer. Can someone please tell me whether I have "parallelized" my code correctly?
Thanks