Optimization Using Genetic Algorithms
Whether it’s minimizing manufacturing costs, finding the shortest path around a city, or finding an optimal airfoil design, optimization problems are everywhere. An optimization problem, simply put, is the problem of finding the best solution from all feasible solutions.
But how would you determine the best solution? That is the million dollar question. Suppose you were tasked with creating a box to hold liquid. The requirements state that the area must be at least 16m2 and every meter of material used in the perimeter costs $2.00. What’s the cheapest box you can build?
This problem is simple to solve mathematically. Working backwards from the required area of 16m2, we start by setting \(16=L \cdot W \) or \(L = \frac{16}{W} \) and substitute that into the equation for the perimeter: \[ \begin{align} P &= 2W - 2\cdot(16\div W) \\ &= W - 16\div W \\ &= W^{2} - 16 \\ W &= 4 \\ L &= W = 4 \end{align} \]
Solution: The cheapest box with an area of 16m2 is 4 x 4m and costs $32.00.
Gradeschool level math is sufficient in solving equations with simple relationships and few variables, but what about more complex problems?
Enter Genetics
It turns out that life on Earth is incredibly complex. Trillions of living cells interact every day to make up the body known as you. From a strictly biological point of view, life is a game where the objective is to stay on Earth as long as possible. The cells in your body can’t keep this up forever, so the only possible way to prevent your own extinction is by duplicating the important parts of your own existence and making … a new version of you. This, of course, is called reproduction.
Not every living creature makes the cut. Some are killed early – by hunting, disease, or starvation. Some just can’t find a suitable partner. If you’re an African impala and constantly being hunted by carnivourous predators, there’s no guarantee that you’ll live long enough to ensure your spot in the gene pool.
But if you did have to outrun a lion, might your odds be slightly more favourable if you could run just a little bit faster, jump a little bit higher, and see a little bit farther? Over time, the less fortunate impalas would be killed off, leaving the faster impalas to reproduce and pass on their “code”.
This “code” is found in every living organism in a special chemical called DNA, and among other things is responsible for tweaking the aforementioned variables.
Unlike computer code, a scientist looking at DNA would see a string of millions (or billions) of “bases” in the form of A, C, G, and T, representing the fundamental base chemicals of DNA. The code is divided into discrete sections called chromosomes, with two chromosomes forming a double-helix pair. Each chromosome affects the organism differently. For example, in humans (who have 23 pairs of chromosomes), the 23rd pair determines gender. Let’s look at what a chromosome may look like:
… ACAAGATGCCATTGTCCCCCGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCACCGCTGCCCTGCCCCTGGAGGGTGGCCCCACCGGCCGAGACAGCGAGCATATGCAGGAAG …
Despite looking nonsensical, scientists have discovered that certain functional groups of code at certain locations are responsible for the biological traits found in the organism. These groups are called genes.
… CCATGCCTAAGCCCCATGACATGAAACG …
The highlighted sequences above represent the genes responsible for trait X and trait Y in the organism. Let explore how changing these (hypothetical) codes would have an effect on our impala:
Location | Gene | Effect |
---|---|---|
Red | ATGCCTAA | Runs slower |
Red | ATGCATAA | Runs faster |
Blue | ATGACATGA | Jumps higher |
Blue | ATGGGGTGA | Jumps lower |
The perfect impala would have both beneficial traits (running faster and jumping higher) and would be more likely to out-compete the impala who drew both short sticks, running slower and jumping lower.
I hope the concept of natural selection is becoming evident: despite the randomness of genetics, better traits are more likely to stay in the gene pool and worse traits are more likely to be killed off.
Here’s where it gets interesting: we can simulate the process of natural selection in a computer model to optimize any solution.
The Infinite Monkey Theorem
A monkey hitting keys at random on a typewriter keyboard for an infinite amount of time will almost surely type the complete works of William Shakespeare.
How difficult would it be to type a given text completely randomly? Is that even possible?
Suppose there are 50 keys on a typewriter, and each of those keys have an equal 1/50 probability of being pressed. The probability of typing the word “Hello” is as follows:
Yikes! That’s a one in over 300 million chance to get a simple five letter word! And “Hello World”? Even typing eleven new letters every second since the beginning of the universe wouldn’t guarantee success.
Let’s try a different approach; one inspired by genetics.
Implementing a Genetic Algorithm
The brush must draw by itself. – Alan Watts
Let’s recreate this quotation. To simulate natural selection, we’re going to take the following steps:
- Create an initial population
- Calculate fitness
- Select best individuals
- Crossover or mutate the genes
- Repeat until successful
WARNING: Here be dragons!
Creating the initial population
We’ll start by creating a population of 500 random strings with a length of 29 characters each (the length of our desired quotation). These strings represent a single chromosome in each individual.
Generated Chromosomes |
---|
TRWLSSXFY KMUZLJMWGYVLDN UMCQ |
WVLMMZXZJVA LTDOHFZKVWTOTIZGR |
CUQTSYBCNUTBVAUGGDJGOXZSKAZTK |
NSZHRBYRXUJGTFQQQFYVBXPWXSOAZ |
QPRYUXLXRZSUIICWEBFK CGOR IBR |
ENLTWBNJXEBUOLMCGLVVUTIWGGEPM |
TNRIFHW KMTGZYTNFWPTWUKBARTGU |
NJYRKYMLHJOSMUMET IYXHM WCOWA |
KC IPQ TMEBQSMGUODVHJBPZWGLRK |
ISDVVEQYSVUBURPJAEEOCUXTPOXXA |
490 more… |
Calculating Fitness
Those words don’t look anything like the quotation we’re trying to recreate. But exactly how dissimilar are they? We need a quantifiable metric that can tell us exactly how close our chromosome (that is, the string) is to the the quotation. This is called a fitness function.
Fitness functions can come in many forms, but the general idea is that they’re given an input (that is, our string) and they output a number between 0 and 1 depending on how close our input is to the goal (with 1 being most optimal/perfect).
In this example, a fitness score can be determined by calculating how many characters are both the correct letter and in the correct location.
SPU FNXTFLIVSTEQIDJHBL KTFUVU THE BRUSH MUST DRAW BY ITSELF Fitness: 6/29 characters correct = 0.208
The purpose of our genetic algorithm is to get the fitness score of a chromosome to 1.00. That is, all letters will be correct and in correct position.
Selecting the best
Now that we have a list of each member of the population and their respective fitness, we want to decide which ones should be selected to create the next generation.
One method to do this is called fitness proportionate or Roulette Wheel Selection. Imagine all the chromosomes are placed on a roulette wheel, with the size of their pie being their fitness (that is, less fit chromosomes will have smaller slices).
The higher the fitness (yellow), the more likely the marble will select the chromosome. The lower the fitness (red), the less likely.
To do this mathematically, we must:
- Sort the chromosomes in descending order by fitness
- Determine the Sum of all fitnesses
- Generate a random number between 0 and the Sum
- Starting at the top of the list, cumulatively add the fitnesses until the sum is greater than our random number. The chromosome in this position is our selection.
- Repeat steps 3-5 for as many parents as we want.
Let’s go ahead and select two parents using our roulette wheel selection.
Parents: NJYRKYMLHJOSMUMET IYXHM WCOWA ENLTWBNJXEBUOLMCGLVVUTIWGGEPM
For convenience, I’ve highlighted the correct areas in the same fashion as the fitness step.
Crossover and Mutation
Modifying the chromosomes to create a fitter individual is the quintessential element of genetic algorithms. This is done by one of two methods:
Crossover: Two parents chromosomes are split and recombined (either down the middle, at a random point, or at multiple points) to create a new chromosome.
Crossover: NJYRKYMLHJOSMUMET IYXHM WCOWA + ENLTWBNJXEBUOLMCGLVVUTIWGGEPM -------------------------------- NJYRKYMLHJOSMUMCGLVVUTIWGGEPM
Mutation: A (typically random) change is made to the chromosome. This can be randomly adjusting (adding or subtracting) a value, swapping or changing the order of values, or randomly changing values.
Mutation: AABBCDEF --> BBCCCDEF (adjustment) AABBCDEF --> CDBBAAEF (swapping) AABBCDEF --> AABBMPRZ (random change)
Typically a combination of crossover and mutation is used with a ratio dependent on the problem being solved (for example, crossover with 10% chance of mutation).
Putting it all together
We can create populations and evaluate their members on fitness. We can select the best ones and use them to create new chromosomes from crossover and/or mutation. We have all the pieces we need to implement this algorithm. The last step is to repeat these continuously until we achieve our goal.
- Create an initial random population pool
- Repeat the following steps until we’ve reached our max fitness goal:
- Sort the current population pool by fitness
- Create a new empty population pool and fill it by:
- Selecting two parents using roulette wheel selection on the old pool
- Crossover and/or mutate the parents to create a daughter
- Add the daughter to the new pool
- Repeat until the new pool is of desired size
- Delete the old pool and set the current pool to the new population
Results
Below are the results of running the genetic algorithm using our implementation. It took less than a second to successfully converge on our goal, and did so after only 88 generations! Remember the brute-force method we discussed earlier? Even using every computer in the world to generate random strings for the remainder of the universe would likely not yield the results that we achieved.
Results of the Genetic Algorithm
Generation | Best Chromosome | Fitness |
---|---|---|
0 | TGP YMYWOKHKZJZRCKPHTZ AAEDLG | 0.137931 |
1 | LDX GHKGHOLPLTHRCKPHTZ AAEDLG | 0.172414 |
2 | KSEKOTHBCRXDSBGWRI XIYPISVESQ | 0.206897 |
3 | OWEEARCXHYOCRUQDGLFABO MGSALZ | 0.275862 |
4 | TWEEARCXHYOCRUQDGLFAOPOIXSQOY | 0.241379 |
5 | TSEKOTHBCRXDSBGWRI XIYPISSLFE | 0.241379 |
6 | T MZJRMMUCMDSDRJREWRNPNOBSEQG | 0.275862 |
7 | OWE ABUECHAMQUKGRAA MZK VSMLF | 0.310345 |
8 | T EEAPASI XDSBGNRI XIYXIJSFAF | 0.344828 |
9 | TXEEARCXHCXFHKRDRDN BYXIJSFAR | 0.379310 |
10 | TWEEPRVBHYOCRAUDRI OZYPISSALF | 0.379310 |
11 | I MZURUSHOMJD CYBAU JWAIPSELF | 0.413793 |
12 | TXEEARCSI LRIA DREWOZYPISSALF | 0.482759 |
13 | TWEHBBUSHCMJDPNFRQXGUYPISSALF | 0.448276 |
14 | TWEHBBUSHOODSBUDRDNQNQOIPSELF | 0.482759 |
15 | TWEHBBUSHOODSBUDRDNQNY AGSEXS | 0.448276 |
16 | T EEAPOSHJUFSTUDRI RYAIPSELF | 0.517241 |
17 | OWECBJUSQ BKSTPDWPE RYAIPSELF | 0.517241 |
18 | TWEEPRVBS XDSP DREWQPYJIPSELF | 0.517241 |
19 | TXEEARCSI MBSTUDRI RYAIPSELF | 0.586207 |
20 | TWEHBBUSHOMCRUQDRDN BYXIJSLLF | 0.551724 |
21 | TXEMBBUSHOMCRUQDRDN BYXIJSLLF | 0.551724 |
22 | TXEMBBUSHOMCRUQDRDN BYXIJSLLF | 0.551724 |
23 | T LHBBUSHJEUS FDREW BC IPSELF | 0.620690 |
24 | TWEZBBUSHOMCRUQDRDN BYXIJSLLF | 0.551724 |
25 | TWEHBBUSHCPCSBQDRDN BYXIJSLLF | 0.551724 |
26 | LOE ARUSHOOKSTPDJMU JYPIPSELF | 0.551724 |
27 | TEGHBBUSHXDUSAUDRDN BYVIPSELF | 0.586207 |
28 | THMJBBUSHCMUSSPDRIFFJG IYSELF | 0.586207 |
29 | TXEMBBUSHOMLNLEDREN BY IPSFLF | 0.586207 |
30 | THEEBRUSHPADNBODRAEQBYDIDLELF | 0.586207 |
31 | TSEGBBUSHCMUSMODRJUOBYDISSELF | 0.620690 |
32 | THEEBRUSHPADNBODRAEQBY IZCELF | 0.620690 |
33 | TWEHBVUSHCMJZTUDRIU JY IZSELF | 0.620690 |
34 | T EHBBUSH XCSTPDREE BYXIESELF | 0.655172 |
35 | THEEBRUSHOKUSTFDRIEOBY ISSXLF | 0.689655 |
36 | THEEBRUSHOKUS WDREEQBY ISSXLF | 0.655172 |
37 | TWEHKRUSH MFST DREWBTYWIXZELF | 0.655172 |
38 | TXECBNUSCCMUSTQDRBN BY ISSELF | 0.689655 |
39 | TSEGBRCSHCMDST DREWBSY IPSELF | 0.689655 |
40 | TSEGBRUSCCMUSTQDRBN BY IVSELF | 0.724138 |
41 | THEGBRUSCCMUSTQDRBN BY IVSELF | 0.758621 |
42 | TWEEBRUSH MFST DREWBZY IZSESF | 0.724138 |
43 | TWE GRUSHPMUSM DREW JYEIESELF | 0.724138 |
44 | TWEEBRUSH MFST DTDEQBY IESELF | 0.724138 |
45 | OHE KRUSH MFST DREWBZO IPSELF | 0.724138 |
46 | TWEEBRUSH MFST DREWBZ IFSELF | 0.724138 |
47 | THZ BRUSH MUBTIDREW QY IPSELF | 0.793103 |
48 | THZ KRUSHCMFST DRBN BY IVSELF | 0.758621 |
49 | THLHBBUSH MFST DREN BY IVSELF | 0.758621 |
50 | NHE QRUSHPMUST DREE QY IZSELF | 0.758621 |
51 | THEEBRUSHCMFST DRAWRVY ISSGLF | 0.758621 |
52 | THEEBBUSHCMFST DRAWRVY IPSELF | 0.758621 |
53 | NHE ARUSHPMUST DRAN NO IPSELF | 0.758621 |
54 | NHE ARUSHPMUST DRAWRVY IPSELF | 0.793103 |
55 | NHE ARUSHPMUST DRAWRVY ISSELF | 0.793103 |
56 | THE ARUSHPMUST DRAWRVY ISSELF | 0.827586 |
57 | THEEBBUSHCMUST DRAWRVY ISSELF | 0.793103 |
58 | OHE KRUSH MFST DREWQBY IOSELF | 0.793103 |
59 | THEEURUSHCMUST DRAWRVY ISSELF | 0.793103 |
60 | T EHBBUSH MPST DRAWEBY ISSELF | 0.793103 |
61 | THEBBBUSH MPST DRAWEBY ISSELF | 0.827586 |
62 | OHZ BRUSH MNST DRAWEBY ISSELF | 0.827586 |
63 | THEBBRUSH MNST DRAWEBY ISSELF | 0.862069 |
64 | THEEBBUSH MUSM DRAW BY IRSHLF | 0.827586 |
65 | THE DRUSHPMUST DRAW BY IYSELF | 0.896552 |
66 | THE BRUSHSMUST DRDW BW ISSELF | 0.862069 |
67 | THE BRUSHXMUST DRAJ RY IOLELF | 0.827586 |
68 | OHE BRUSH MUSM DRAC DY ISSELF | 0.827586 |
69 | THZTBRUSH MUST DRAW BY IBSBLF | 0.862069 |
70 | THEEBBUSH MUST DRAJ BYDIXSELF | 0.827586 |
71 | THE BRUSH MUST DRAU BZNIJSELF | 0.862069 |
72 | THE BRGDH MUST DRAJ BY IPSELZ | 0.827586 |
73 | THE BRUSHMQUST DRAW BY ISSELF | 0.896552 |
74 | THE BRUSH MUST DRAJ BW ISSELF | 0.896552 |
75 | THE BRBSH MHST DRAW BY IBSELF | 0.896552 |
76 | THE BRBSH MHST DRAW BY ISSELF | 0.896552 |
77 | THE KRUSH MUST DRAJ BY ISSELF | 0.896552 |
78 | THEEBRUSH MUST DRAW BY IBSELF | 0.931034 |
79 | THE BRUSH MUST DRAJ DY ISSELF | 0.896552 |
80 | THEEBRUSH MUST DRAW BY IXSELF | 0.931034 |
81 | THE DRUSH MUST DRDW BY ISSELF | 0.896552 |
82 | THEEBRUSH MUST DRAW BY ILSETF | 0.896552 |
83 | THEBBRUSH MUST DRAW BY IUSERF | 0.896552 |
84 | THEEMRUSH MUST DRAW BY IUSELF | 0.896552 |
85 | THE BRUSH MUSTPDRAW BY IBSELF | 0.931034 |
86 | THE BRUSH MUSTPDRAW BY IBSELF | 0.931034 |
87 | THE BRUSH MUST DRAW BY IUSELF | 0.965517 |
88 | THE BRUSH MUST DRAW BY ITSELF | 1.000000 |
One last time, let’s go over the benefits of genetic algorithms:
- They’re inspired by the ingenuity of nature: the same process that created all of life.
- They’re (relatively) easy to implement in little code.
- They can easily be applied to a wide variety of problems.
- They’re significantly more efficient than a brute-force approach.
There are many more aspects to genetic algorithms that haven’t been covered. Future posts will introduce things such as elitism, encoding, and different selection methods.
Lastly, I leave you with a tool to play with the implementation we devised. Play around with the parameters: see how long a lengthly goal takes to succeed, or how a small population or low mutation rate will affect the evolution process. All the code (both the Python and JavaScript versions) are available through the links at the bottom of the page.
Play with it
Source code
Available in the near future. Check back soon!