I am working with R. I am following this tutorial (A quick tour of GA) and am learning how to optimize functions using the "genetic algorithm".
The entire process is illustrated in the code below:
Part 1: Generate some sample data ("train_data")
Part 2: Define the "fitness function": the objective of my problem is to generate 7 random numbers and use them to perform a series of data manipulation procedures on the train data. The 7 numbers are "random_1" (between 80 and 120), "random_2" (between "random_1" and 120), "random_3" (between 85 and 120), "random_4" (between "random_2" and 120), "split_1" (between 0 and 1), "split_2" (between 0 and 1) and "split_3" (between 0 and 1). At the end of these data manipulation procedures, a "total" mean variable is calculated.
Part 3: The purpose of the "genetic algorithm" is to find the set of these 7 numbers that produces the largest value of the "total".
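One thing worth noting about the description above: plain lower/upper bounds cannot by themselves express "random_2 is at least random_1". The following is only a minimal sketch of one possible way to handle that, assuming a penalty value below every real fitness value is acceptable. The names `param_names`, `is_feasible`, `penalised` and `toy` are hypothetical and are not part of the GA package or the tutorial; `toy` is a stand-in so the sketch runs with base R only.

```r
# Names of the seven parameters, in the order they are passed to the fitness.
param_names <- c("random_1", "random_2", "random_3", "random_4",
                 "split_1", "split_2", "split_3")

# Hypothetical helper: TRUE when a length-7 vector respects the ordering
# stated in the description (random_2 >= random_1 and random_4 >= random_2).
is_feasible <- function(x) {
  p <- setNames(as.list(x), param_names)
  p$random_2 >= p$random_1 && p$random_4 >= p$random_2
}

# Hypothetical wrapper: evaluate `f` only for feasible vectors; otherwise
# return `penalty`, which is ASSUMED to lie below every real fitness value.
penalised <- function(x, f, penalty = -1) {
  if (!is_feasible(x)) return(penalty)
  do.call(f, setNames(as.list(x), param_names))
}

# Stand-in fitness (an assumption) so the sketch runs without dplyr or GA.
toy <- function(random_1, random_2, random_3, random_4,
                split_1, split_2, split_3) {
  mean(c(split_1, split_2, split_3))
}

ok  <- penalised(c(90, 100, 95, 110, 0.2, 0.4, 0.6), toy)   # feasible vector
bad <- penalised(c(110, 100, 95, 110, 0.2, 0.4, 0.6), toy)  # random_2 < random_1
```

Because the "total" here is the mean of a 0/1 indicator, it always lies between 0 and 1, so a penalty of -1 would sit below every feasible value. With the real function, the GA call would presumably use something like fitness = function(x) penalised(x, fitness).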
Below, I illustrate this entire process :
Part 1
#load libraries
library(dplyr)
library(GA)
# create some data for this example
a1 = rnorm(1000,100,10)
b1 = rnorm(1000,100,5)
c1 = sample.int(1000, 1000, replace = TRUE)
train_data = data.frame(a1,b1,c1)
Part 2
#define fitness function
fitness <- function(random_1, random_2, random_3, random_4, split_1, split_2, split_3) {
#bin data according to random criteria
train_data <- train_data %>% mutate(cat = ifelse(a1 <= random_1 & b1 <= random_3, "a", ifelse(a1 <= random_2 & b1 <= random_4, "b", "c")))
train_data$cat = as.factor(train_data$cat)
#new splits
a_table = train_data %>%
filter(cat == "a") %>%
select(a1, b1, c1, cat)
b_table = train_data %>%
filter(cat == "b") %>%
select(a1, b1, c1, cat)
c_table = train_data %>%
filter(cat == "c") %>%
select(a1, b1, c1, cat)
#calculate quantile ("quant") for each bin
table_a = data.frame(a_table%>% group_by(cat) %>%
mutate(quant = quantile(c1, prob = split_1)))
table_b = data.frame(b_table%>% group_by(cat) %>%
mutate(quant = quantile(c1, prob = split_2)))
table_c = data.frame(c_table%>% group_by(cat) %>%
mutate(quant = quantile(c1, prob = split_3)))
#create a new variable ("diff") that measures if the quantile is bigger than the value of "c1"
table_a$diff = ifelse(table_a$quant > table_a$c1,1,0)
table_b$diff = ifelse(table_b$quant > table_b$c1,1,0)
table_c$diff = ifelse(table_c$quant > table_c$c1,1,0)
#group all tables
final_table = rbind(table_a, table_b, table_c)
# calculate the total mean : this is what needs to be optimized
mean(final_table$diff)
}
Part 3
#run the genetic algorithm (20 iterations to keep it short):
GA <- ga(type = "real-valued",
fitness = function(x) fitness(x[1], x[2], x[3], x[4], x[5], x[6], x[7]),
lower = c(80, 80, 80, 80, 0,0,0), upper = c(120, 120, 120, 120, 1,1,1),
popSize = 50, maxiter = 20, run = 20)
The above code (Part 1, Part 2, Part 3) works fine.
Problem: Now, I am trying to produce some of the visual plots from the tutorial:
First Plot - This Works:
plot(GA)
But I can't seem to produce the other plots from the tutorial:
Second Plot: Does Not Work
lbound <- 80
ubound <- 120
curve(fitness, from = lbound, to = ubound, n = 1000)
points(GA@solution, GA@fitnessValue, col = 2, pch = 19)
Error: Problem with `mutate()` column `cat`.
i `cat = ifelse(...)`.
x argument "random_3" is missing, with no default
Run `rlang::last_error()` to see where the error occurred.
Error in xy.coords(x, y) : 'x' and 'y' lengths differ
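My reading of these two errors: curve() calls the function with a single vector of n values as its first argument, so "random_2" through "split_3" are never supplied, and points() receives seven x values (GA@solution has seven columns) against a single y value. The tutorial's one-dimensional plot works because its function has only one variable. A minimal sketch of a one-dimensional slice follows, assuming it is acceptable to hold six parameters fixed and vary only "random_1". `toy_fitness`, `best` and `slice_1d` are hypothetical names; `toy_fitness` stands in for the real fitness and `best` stands in for GA@solution[1, ].

```r
# Stand-in for the seven-argument fitness (hypothetical; base R only).
toy_fitness <- function(random_1, random_2, random_3, random_4,
                        split_1, split_2, split_3) {
  -((random_1 - 100)^2 + (random_2 - 105)^2) / 1000 +
    split_1 * split_2 * split_3
}

# Hypothetical "best" parameter vector, standing in for GA@solution[1, ].
best <- c(100, 105, 95, 110, 0.9, 0.9, 0.9)

# One-dimensional slice: vary random_1 only, hold the other six at `best`.
# sapply() evaluates one value at a time, so the fitness itself does not
# need to be vectorised.
slice_1d <- function(r1) {
  sapply(r1, function(v) {
    toy_fitness(v, best[2], best[3], best[4], best[5], best[6], best[7])
  })
}

curve(slice_1d, from = 80, to = 120, n = 200,
      xlab = "random_1", ylab = "fitness")

# Mark the slice value at the fixed point: one x value, one y value.
points(best[1], slice_1d(best[1]), col = 2, pch = 19)
```

With the real objects, the same pattern would presumably use fitness in place of toy_fitness and GA@solution[1, ] in place of best. A much smaller n than 1000 is probably wise, since every point re-runs the whole dplyr pipeline.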
Third Plot : Does Not Work
random_1 <- random_2 <- seq(80, 120, by = 0.1)
f <- outer(x1, x2, fitness)
persp3D(x1, x2, fitness, theta = 50, phi = 20, col.palette = bl2gr.colors)
Error: Problem with `mutate()` column `cat`.
i `cat = ifelse(...)`.
x argument "random_3" is missing, with no default
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning message:
Error: Problem with `mutate()` column `cat`.
i `cat = ifelse(...)`.
x argument "random_3" is missing, with no default
Run `rlang::last_error()` to see where the error occurred.
Error in z[-1, -1] : object of type 'closure' is not subsettable
Fourth Plot: Does Not Work
filled.contour(random_1, random_2, fitness, color.palette = bl2gr.colors)
Error in min(x, na.rm = na.rm) : invalid 'type' (list) of argument
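For the third and fourth plots, my understanding is that outer() calls the function once with two long vectors and expects a vector of the same length back, and that filled.contour() (like persp3D() in the tutorial) needs the numeric matrix of values rather than the function itself. The code above also never defines x1 and x2. A minimal sketch of a two-dimensional slice follows, assuming the other five parameters are held fixed. `toy_fitness`, `best` and `slice_2d` are hypothetical names, and base persp() with terrain.colors is used here only to keep the example free of extra packages.

```r
# Stand-in for the seven-argument fitness (hypothetical; base R only).
toy_fitness <- function(random_1, random_2, random_3, random_4,
                        split_1, split_2, split_3) {
  -((random_1 - 100)^2 + (random_2 - 105)^2) / 1000 +
    split_1 * split_2 * split_3
}

# Hypothetical fixed values, standing in for GA@solution[1, ].
best <- c(100, 105, 95, 110, 0.9, 0.9, 0.9)

# Two-argument wrapper, vectorised so that outer() can call it with two
# long vectors and get back one value per (r1, r2) pair.
slice_2d <- Vectorize(function(r1, r2) {
  toy_fitness(r1, r2, best[3], best[4], best[5], best[6], best[7])
})

# A coarse grid: 41 x 41 = 1681 evaluations instead of 401 x 401.
x1 <- x2 <- seq(80, 120, by = 1)
f <- outer(x1, x2, slice_2d)  # numeric matrix, rows = x1, columns = x2

# Both plots take the matrix `f`, not the function.
persp(x1, x2, f, theta = 50, phi = 20,
      xlab = "random_1", ylab = "random_2", zlab = "fitness")
filled.contour(x1, x2, f, color.palette = terrain.colors)
```

If this reading is right, the tutorial-style calls would take the same matrix f as their third argument. The grid step of 0.1 used above would mean 160,801 evaluations of the full pipeline, which is likely to be very slow.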
Can someone please show me how to fix these errors?
Note: Does anyone know if this optimization function is trying to find a "maximum" or a "minimum"?
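As far as I can tell from the GA package documentation, ga() searches for the maximum of the fitness function, and the usual way to minimise is to pass the negated function. A minimal sketch to check this on a function with a known answer; `f_known` and `res` are hypothetical names, and the population size, iteration count and seed are arbitrary choices.

```r
library(GA)

# A function whose maximum is known: the peak is at x = 3, with value 0.
f_known <- function(x) -(x - 3)^2

res <- ga(type = "real-valued", fitness = f_known,
          lower = 0, upper = 10,
          popSize = 30, maxiter = 50,
          monitor = FALSE, seed = 1)

res@solution      # should be close to 3 if ga() maximises
res@fitnessValue  # should be close to 0
```

If ga() were minimising instead, the solution would be pushed toward the upper bound of 10, where f_known is smallest.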
Thanks