I am working with R. I am learning about how to optimize functions and estimate the maximum or minimum points of these functions.
For example, I created some random data ("train_data"):
#load libraries
library(dplyr)
# create some data for this example
a1 = rnorm(1000,100,10)
b1 = rnorm(1000,100,5)
c1 = sample.int(1000, 1000, replace = TRUE)
train_data = data.frame(a1,b1,c1)
I also created the following function ("fitness"), which takes seven inputs:
"random_1" (between 80 and 120),
"random_2" (between "random_1" and 120),
"random_3" (between 85 and 120),
"random_4" (between "random_2" and 120),
"split_1" (between 0 and 1),
"split_2" (between 0 and 1),
"split_3" (between 0 and 1).
The function performs a series of data manipulation procedures and returns a "total" mean:
fitness <- function(random_1, random_2, random_3, random_4, split_1, split_2, split_3) {
  #bin data according to random criteria
  train_data <- train_data %>%
    mutate(cat = ifelse(a1 <= random_1 & b1 <= random_3, "a",
                        ifelse(a1 <= random_2 & b1 <= random_4, "b", "c")))
  train_data$cat = as.factor(train_data$cat)
  #new splits
  a_table = train_data %>%
    filter(cat == "a") %>%
    select(a1, b1, c1, cat)
  b_table = train_data %>%
    filter(cat == "b") %>%
    select(a1, b1, c1, cat)
  c_table = train_data %>%
    filter(cat == "c") %>%
    select(a1, b1, c1, cat)
  #note: these draws overwrite the split_1, split_2 and split_3 arguments
  split_1 = runif(1, 0, 1)
  split_2 = runif(1, 0, 1)
  split_3 = runif(1, 0, 1)
  #calculate quantile ("quant") for each bin
  table_a = data.frame(a_table %>% group_by(cat) %>%
                         mutate(quant = quantile(c1, prob = split_1)))
  table_b = data.frame(b_table %>% group_by(cat) %>%
                         mutate(quant = quantile(c1, prob = split_2)))
  table_c = data.frame(c_table %>% group_by(cat) %>%
                         mutate(quant = quantile(c1, prob = split_3)))
  #create a new variable ("diff") that measures if the quantile is bigger than the value of "c1"
  table_a$diff = ifelse(table_a$quant > table_a$c1, 1, 0)
  table_b$diff = ifelse(table_b$quant > table_b$c1, 1, 0)
  table_c$diff = ifelse(table_c$quant > table_c$c1, 1, 0)
  #group all tables
  final_table = rbind(table_a, table_b, table_c)
  # calculate the total mean : this is what needs to be optimized
  mean = mean(final_table$diff)
}
Just as a sanity check, we can verify that this function actually works:
#testing the function at some specific input:
a <- fitness(80,80,80,80,0.6,0.2,0.9)
a
[1] 0.899
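The same check can also be run at an input drawn at random from the ranges listed earlier. Below is a minimal sketch; "draw_inputs" is a hypothetical helper I made up for this illustration, and the seed is arbitrary:

```r
# Hypothetical helper: draw one set of seven inputs that respects the stated ranges
draw_inputs <- function() {
  random_1 <- runif(1, 80, 120)        # between 80 and 120
  random_2 <- runif(1, random_1, 120)  # between random_1 and 120
  random_3 <- runif(1, 85, 120)        # between 85 and 120
  random_4 <- runif(1, random_2, 120)  # between random_2 and 120
  splits   <- runif(3, 0, 1)           # split_1, split_2, split_3 between 0 and 1
  c(random_1, random_2, random_3, random_4, splits)
}

set.seed(123)  # arbitrary seed, only so the draw is reproducible
inputs <- draw_inputs()
inputs

# With "fitness" and "train_data" defined as above, the check would be:
# do.call(fitness, as.list(inputs))
```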
Now, using the following references on optimization with R (https://cran.r-project.org/web/packages/optimization/optimization.pdf and https://cran.r-project.org/web/packages/optimization/vignettes/vignette_master.pdf), I am trying to apply some common optimization techniques to this function.
For example:
#load library
library(optimization)
Nelder-Mead Optimization with an Initial Guess:
optim_nm(fitness, start = c(80,80,80,80,0,0,0))
Nelder-Mead Optimization with fixed parameters:
optim_nm(fun = fitness, k = 2)
Optimization using Simulated Annealing:
ro_sa <- optim_sa(fun = fitness,
                  start = c(runif(7, min = -1, max = 1)),
                  lower = c(80,80,80,80,0,0,0),
                  upper = c(120,120,120,120,1,1,1),
                  trace = TRUE,
                  control = list(t0 = 100,
                                 nlimit = 550,
                                 t_min = 0.1,
                                 dyn_rf = FALSE,
                                 rf = 1,
                                 r = 0.7
                  )
)
But all of these procedures return a similar error:
Error: Problem with `mutate()` column `cat`.
i `cat = ifelse(...)`.
x argument "random_3" is missing, with no default
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning message:
Error: Problem with `mutate()` column `cat`.
i `cat = ifelse(...)`.
x argument "random_3" is missing, with no default
Run `rlang::last_error()` to see where the error occurred.
And this is preventing me from visualizing the results of these optimization algorithms:
#code for visualizations
plot(ro_sa)
plot(ro_sa, type = "contour")
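My current guess, which I have not confirmed, is that these optimizers call the function with a single numeric vector rather than with seven separate arguments, so the whole vector would land in "random_1" and the remaining arguments would be missing. Here is a minimal sketch of that behaviour with a toy function in base R (the names "toy_fitness" and "toy_wrapper" are hypothetical and not part of my actual code):

```r
# Hypothetical two-argument function standing in for "fitness"
toy_fitness <- function(x1, x2) {
  (x1 - 1)^2 + (x2 - 2)^2
}

# Called with one vector: the whole vector is matched to x1, so x2 is missing
res <- tryCatch(toy_fitness(c(1, 2)),
                error = function(e) conditionMessage(e))
print(res)  # prints the "missing argument" error message

# Hypothetical wrapper: takes one vector and unpacks it into separate arguments
toy_wrapper <- function(x) {
  toy_fitness(x[1], x[2])
}
toy_wrapper(c(1, 2))  # evaluates without error
```

If that guess is right, I assume "fitness" would need a similar one-vector wrapper before being passed to the optimizers, but I am not sure this is the correct approach.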
Can someone please show me what I am doing wrong? Is it possible to fix this?
Thanks