# Defining Fitness Functions in R

I am following this tutorial on optimization: A quick tour of GA

``````
# load library
library(GA)

# define function
Rastrigin <- function(x1, x2)
{
  20 + x1^2 + x2^2 - 10*(cos(2*pi*x1) + cos(2*pi*x2))
}

x1 <- x2 <- seq(-5.12, 5.12, by = 0.1)
f <- outer(x1, x2, Rastrigin)

# plot function
persp3D(x1, x2, f, theta = 50, phi = 20, col.palette = bl2gr.colors)

# plot contours
filled.contour(x1, x2, f, color.palette = bl2gr.colors)

# run optimization algorithm
# ga() passes each candidate as a single vector x, so index its two components
GA <- ga(type = "real-valued",
         fitness = function(x) -Rastrigin(x[1], x[2]),
         lower = c(-5.12, -5.12), upper = c(5.12, 5.12),
         popSize = 50, maxiter = 1000, run = 100)

# plot results
plot(GA)
``````

``````
> summary(GA)
-- Genetic Algorithm -------------------

GA settings:
Type                  =  real-valued
Population size       =  50
Number of generations =  1000
Elitism               =  2
Crossover probability =  0.8
Mutation probability  =  0.1
Search domain =
         x1    x2
lower -5.12 -5.12
upper  5.12  5.12

GA results:
Iterations             = 208
Fitness function value = -1.395134e-06
Solution =
              x1           x2
[1,] 5.41751e-05 6.400989e-05
``````

Now, I am trying to apply the above algorithm to a new problem:

Here is the data I am using:

``````
# load library
library(dplyr)

library(data.table)

set.seed(123)

# create some data for this example
a1 = rnorm(1000,100,10)
b1 = rnorm(1000,100,5)
c1 = sample.int(1000, 1000, replace = TRUE)
train_data = data.frame(a1,b1,c1)
``````

Problem Statement

Using the following code:

1. I am trying to find seven numbers ("random_1", "random_2", "random_3", "random_4", "split_1", "split_2", "split_3") which produce the biggest value of another variable called "total" (defined in the code below).
2. I am not sure if this is possible, but I would like to find the smallest set of ("random_1", "random_2", "random_3" and "random_4") that produce the biggest value of "total".

Below, I have written a loop that tries to solve this problem using the "Random Search" algorithm, i.e. generate many random sets of ("random_1", "random_2", "random_3", "random_4", "split_1", "split_2", "split_3") and see which one of these sets produces the biggest value of "total":

Code for Random Search

``````
results_table <- data.frame()

for (i in 1:10) {

  # generate random numbers
  random_1 = runif(1, 80, 120)
  random_2 = runif(1, random_1, 120)
  random_3 = runif(1, 85, 120)
  random_4 = runif(1, random_3, 120)

  # bin data according to random criteria
  train_data <- train_data %>%
    mutate(cat = ifelse(a1 <= random_1 & b1 <= random_3, "a",
                 ifelse(a1 <= random_2 & b1 <= random_4, "b", "c")))

  train_data$cat = as.factor(train_data$cat)

  # new splits
  a_table = train_data %>%
    filter(cat == "a") %>%
    select(a1, b1, c1, cat)

  b_table = train_data %>%
    filter(cat == "b") %>%
    select(a1, b1, c1, cat)

  c_table = train_data %>%
    filter(cat == "c") %>%
    select(a1, b1, c1, cat)

  split_1 = runif(1, 0, 1)
  split_2 = runif(1, 0, 1)
  split_3 = runif(1, 0, 1)

  # calculate the quantile of "c1" ("quant") for each bin, at that bin's random split
  table_a = data.frame(a_table %>% group_by(cat) %>%
    mutate(quant = quantile(c1, probs = split_1)))

  table_b = data.frame(b_table %>% group_by(cat) %>%
    mutate(quant = quantile(c1, probs = split_2)))

  table_c = data.frame(c_table %>% group_by(cat) %>%
    mutate(quant = quantile(c1, probs = split_3)))

  # create a new variable ("diff") that flags whether the quantile is bigger than the value of "c1"
  table_a$diff = ifelse(table_a$quant > table_a$c1, 1, 0)
  table_b$diff = ifelse(table_b$quant > table_b$c1, 1, 0)
  table_c$diff = ifelse(table_c$quant > table_c$c1, 1, 0)

  # group all tables
  final_table = rbind(table_a, table_b, table_c)

  # create a table: for each bin, calculate the average of "diff"
  final_table_2 = data.frame(final_table %>%
    group_by(cat) %>%
    summarize(mean = mean(diff)))

  # add "total mean" to this table
  final_table_2 = data.frame(final_table_2 %>% add_row(cat = "total", mean = mean(final_table$diff)))

  # format this table: add the random criteria to this table for reference
  final_table_2$random_1 = random_1
  final_table_2$random_2 = random_2
  final_table_2$random_3 = random_3
  final_table_2$random_4 = random_4
  final_table_2$split_1 = split_1
  final_table_2$split_2 = split_2
  final_table_2$split_3 = split_3
  final_table_2$iteration_number = i

  results_table <- rbind(results_table, final_table_2)
}

# reshape once after the loop: one row per iteration, one column per bin
final_results = dcast(setDT(results_table),
  iteration_number + random_1 + random_2 + random_3 + random_4 + split_1 + split_2 + split_3 ~ cat,
  value.var = 'mean')

# keep 5 largest results
final_results = head(final_results[order(-total)], 5)
``````

Now we can view the results:

``````
# view results

final_results

   iteration_number  random_1 random_2  random_3  random_4    split_1   split_2   split_3         a         b         c total
1:                8 104.52182 104.8939  96.63609  99.14640 0.45389635 0.7970865 0.8264969 0.4560440 0.7954545 0.8265306 0.755
2:               10 119.04797 119.9907  93.13250  93.62925 0.27018809 0.5025505 0.6707737 0.2758621 0.5000000 0.6681465 0.632
3:                1 114.69535 117.7922 109.89274 116.39624 0.61857197 0.9609914 0.2661892 0.6180022 0.9615385 0.2702703 0.623
4:                6  85.64905 100.8127  94.02205 106.41212 0.00197946 0.7476889 0.1235777 0.2500000 0.7470588 0.1234568 0.442
5:                3 106.14908 119.7681  95.61753 100.73192 0.20678470 0.1787206 0.7166830 0.2111801 0.1802030 0.7146067 0.423
``````

According to the above table (for a very small random search of 10 iterations), the combination "random_1, random_2, random_3, random_4, split_1, split_2, split_3" = (104.52182, 104.8939, 96.63609, 99.14640, 0.45389635, 0.7970865, 0.8264969) produces the highest "total" of 0.755.

Question: Now, I am trying to solve this same problem using an optimization/search algorithm. In all the examples I have seen online, there is usually an "exact cost function" or an "exact fitness function" that can be defined. However, in my case, I don't think there is an "exact fitness" function.

When looking at the code needed to run the Genetic Algorithm for optimization:

``````
GA <- ga(type = "real-valued",
         fitness = function(x) -Rastrigin(x[1], x[2]),
         lower = c(-5.12, -5.12), upper = c(5.12, 5.12),
         popSize = 50, maxiter = 1000, run = 100)
``````

For my problem, I don't know how to define "fitness".
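One possible way to think about it: `ga()` only needs a function that takes one numeric vector and returns one number, which it then maximizes. A minimal sketch, assuming the seven values arrive in the order (random_1, random_2, random_3, random_4, split_1, split_2, split_3), is to wrap the body of the random-search loop in a function that returns "total". The name `total_fitness`, the fixed bounds, and the choice to score candidates that violate random_2 >= random_1 or random_4 >= random_3 as 0 are assumptions made for illustration. The `ga()` call only runs if the GA package is installed.

```r
# Same example data as in the question
set.seed(123)
train_data <- data.frame(
  a1 = rnorm(1000, 100, 10),
  b1 = rnorm(1000, 100, 5),
  c1 = sample.int(1000, 1000, replace = TRUE)
)

# Hypothetical helper: returns "total" for one candidate vector x of length 7
total_fitness <- function(x, data = train_data) {
  x <- as.numeric(x)  # drop any names attached by the caller

  # Assumption: candidates that break the ordering get the lowest possible score
  if (x[2] < x[1] || x[4] < x[3]) return(0)

  # Same binning rule as in the random-search loop
  bin <- ifelse(data$a1 <= x[1] & data$b1 <= x[3], "a",
         ifelse(data$a1 <= x[2] & data$b1 <= x[4], "b", "c"))

  # One quantile level per bin, clamped to [0, 1]
  splits <- setNames(pmin(pmax(x[5:7], 0), 1), c("a", "b", "c"))

  below <- numeric(nrow(data))
  for (k in unique(bin)) {
    rows <- bin == k
    quant <- quantile(data$c1[rows], probs = splits[[k]])
    below[rows] <- as.numeric(quant > data$c1[rows])  # this is "diff"
  }
  mean(below)  # this is "total"
}

# Evaluate one candidate by hand
total_fitness(c(100, 110, 100, 110, 0.5, 0.5, 0.5))

# Hand the same function to ga(); no minus sign, because ga() maximizes
if (requireNamespace("GA", quietly = TRUE)) {
  res <- GA::ga(type = "real-valued", fitness = total_fitness,
                lower = c(80, 80, 85, 85, 0, 0, 0),
                upper = c(120, 120, 120, 120, 1, 1, 1),
                popSize = 50, maxiter = 100, run = 30, seed = 123)
  print(summary(res))
}
```

Because "total" is an average of 0/1 flags, it is piecewise constant in the seven inputs, which is one reason a derivative-free method such as a genetic algorithm seems like a reasonable fit here.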

I am assuming for "lower" and "upper" I can define them as:

``````
lower = c(80, random_1, 85, random_3, 0, 0, 0)
upper = c(120, 120, 120, 120, 1, 1, 1)
``````

But I am not sure if I am doing this right.
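One thing to keep in mind: `lower` and `upper` are evaluated once, when `ga()` is called, so `random_1` inside `lower` would just be whatever value is left in the workspace, not the random_1 of each candidate. A hedged workaround, sketched below, is to search over a fixed box and decode each point afterwards. The second and fourth coordinates are fractions in [0, 1] that place random_2 between random_1 and 120 and random_4 between random_3 and 120, mirroring `runif(1, random_1, 120)` in the random search. `decode_params` is a hypothetical helper name, not part of the GA package.

```r
# Fixed box to search over: coordinates 2 and 4 are fractions, not raw values
lower <- c(80, 0, 85, 0, 0, 0, 0)
upper <- c(120, 1, 120, 1, 1, 1, 1)

# Hypothetical helper: maps a point u in the box to the seven parameters,
# so that random_2 >= random_1 and random_4 >= random_3 always hold
decode_params <- function(u) {
  u <- as.numeric(u)  # drop any names attached by the caller
  random_1 <- u[1]
  random_2 <- random_1 + u[2] * (120 - random_1)
  random_3 <- u[3]
  random_4 <- random_3 + u[4] * (120 - random_3)
  c(random_1 = random_1, random_2 = random_2,
    random_3 = random_3, random_4 = random_4,
    split_1 = u[5], split_2 = u[6], split_3 = u[7])
}

# Example: halfway between 100 and 120, a quarter of the way between 90 and 120
decode_params(c(100, 0.5, 90, 0.25, 0.1, 0.2, 0.3))
```

The fitness function would then call `decode_params(u)` first and compute "total" from the decoded values, so no candidate is ever outside the intended ordering.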

Can someone please show me if the Genetic Algorithm (or some other optimization algorithm) can be used for optimizing my problem (i.e. set of (random_1, random_2, random_3, random_4, split_1, split_2, split_3) that produces the biggest value of "total")?

Thanks
