Featured Blog | This community-written post highlights the best of what the game industry has to offer. Read more like it on the Game Developer Blogs or learn how to Submit Your Own Blog Post
Genetic Algorithms in Games (Part 1)
Part of the problem with procedural generation is ensuring the content is both interesting and challenging across multiple playthroughs. Genetic algorithms offer us a novel solution to this problem.
1 A basic explanation of genetic algorithms
A simple example of creatures generated with a genetic algorithm.
Genetic algorithms are a class of search algorithm that attempts to find the best solution in a number of tests less than the total number of possibilities within the search space. For example, you might have a genetic algorithm that is designed to find a specific string “#gamedev”. An example run of the genetic algorithm may run something like this:
Fig 1.1
# of generations | Resultant String |
1 | sgsarbt |
10 | mhga&ver |
20 | gebmeva% |
100 | agem*vea |
200 | ygvme*ed |
1000 | #gdmaeev |
20000 | #gamedev |
While this is just a rough example of the behavior there are a few interesting things to note, first off the total possible items within the search space is (26 letters+10 symbols+1 null character)7 or 94 931 877 133 possible permutations and it took substantially fewer iterations for us to reach our goal than if we had simply brute forced all possibilities. At each generation a fitness function is used to determine the quality of the string and it is then derived from it’s ancestors and mutated by a chosen factor.
2 Why care about this in games?
Let’s consider what genetic algorithms are good at doing. A genetic algorithm can quickly get a partially right answer from a massive – even at times impossibly large – search space. Each iteration produces variability which has the potential to be leveraged as content. As the system gets closer to it’s ultimate goal derived from the fitness function it gets closer to it’s perfect target within the search space. These characteristics make genetic algorithms an ideal approach to the massive search spaces found in procedural game content.
Part of the problem with procedural generation is ensuring the content is both interesting and challenging across multiple playthroughs. Genetic algorithms offer us a novel solution to this problem: through fine tuning of the fitness function and pretesting content prior to it being presented to the player, you can ensure certain standards to within a targeted margin of error. While this approach to procedural content can result in a situation where unexpected results occur, a carefully designed fitness function should be able to ensure that only viable content is presented.
3 How Do We Do This?
You could likely teach an entire graduate level course on this stuff so I’m not going to get into the proofs and other high level complexities but lets look at a simple implementation to get started. There are only a few main steps that need to occur for the algorithm to work correctly.
3.1 Generate an initial value or set of values
You will need a starting point, in our current project (to be announced) we are using some real world approximations to generate monsters so a random genetic string would look like this:
“TCCACCAAGTGCAACTATGCTCTGCGACGGTCGGTCGAAGCGCACCTTAACGTCTGGCCTCGCGAATATCCACACGGGACAGCATTACGATGTTAAGGTGCTGCTCCTGCCGTCGTCACGTCTGATCGTCTCCCTGTTTCGGTAATCAAGGACCTATTTTTTTGAATCTTGGATTCATGTAAGCGCCACCCGGGAAGTCTACCCGCAAAACCGTGTATAGCGCGGGTGTAAACCGAGACGGACAGTGAGTTTAAACACTCTGGCCGCGTTCTGTTACCAGGACTCCTCGGGCGGGAGCAGAGGCCTTGGACCGTAAATTAATAGGGCATATTTACGAACAGAATAGTACCTACTTCCATTACTACACGCTTGGAATGTGCTGGGTATTTCCGAGCTACGCTTTCGTCAAAGTTGCACATCGGCAGCTACCCATAGCGGTTACAGTCGCGAAGGAATGATTGAGAAAAAACCTACGGCAAACCGCTTATCTCTGGTTGAAACCAAGAAACAATATTTCTTTAAGAAGAATCTGACGTACGGCACCGGCAGCACGACGCCACGAAGCTGAGGGATACTAGAGCTGCTATCTAGTTGCTTTGTTGTTCCGACTGATACAGCTAGGTGCGGGCTGCGCACGCAGCTTCCCATAGGATGTAGCTGCCTTCGGATTATGGACCAAGAGAATAGGCCCAATCGTCCAGCGTCGTGTACGCCACTGGTGTTACTGCGAAAAGAGAGAAGCGCAGCGCTTTGCACCTACGAATGACCCTTGAACAAAGG,TTGGCCGGTGGCCTAGACAATCGCACATTGCTGGGACCGACGGCTGCGTCTTACGCATTGTGGTTGGGGACTCCGCTCGCTTTGCTTGCGTCTTATAAAATGACACAGAGCCCGATCTCACACTTTAATTACGCACTCGTTTAGAGCTCCGGATGTACTCGCGAAGTTTTTATTGCTCCTCCGCAAACCTCCGGCGGTACGGCCACATAACTCCCCTACGGGCCCATCGCATGTCCGTCACAGTGTCTGTCATCTTGAGTGATACATTCTTCCAGTATAGCGAGGAGGCGACTTCACGAAGTTAAGCTTTCAGTAGACTGGTACATCCTCGGCATATGTCAGCATTTAAATGAGTCCAGTCATTGTCGTGGACTGTCGCACGCGTGTTCGCACAATCAGGAAAATTAGACCTAGTAACAACCTTAGTGTGCACGCCTATCAAAATACCCGAAAAACACAAAAATCACTTCCCGCGTTGATCGCTGCTACCTACTCCTCCTGATTATGGATTGTCTGAATGAAGACTGGAAGGAGTGGCATCCATTCGGTCATGCGCCAGCATGGCATGCTCGTGTCAACACATAAATCCAGCGGTATACGTACGCCTCAACACACGGGGAAACAGTGCGCGTGTTTGTCCTGCTGAGCCTCACCACAATAATTAGTTAACAAACCTACGGACAGTTTGCGCATACATCTATCATCCTTACCGCAAGTAACCCAACTAGGTATTTTTTGGTAGTTATGGTCCTGAAAAGGGTTCGGTTGTGCTTCGAGTCA,0101010011111”
Well that probably looks pretty scary so let me explain what is going on here: our system uses two methods of search space manipulation to generate procedural creatures. First, each child has two parents and receives a genetic string from each supplying one half of their genetic data. These string are then mutated by a variable amount. What this means is that this massive text block actually contains two copies of each data point needed to generate a creature, each one inherited form a different parent. The final bit of the string denotes which side of the chromosomal pair is dominant for each of these data points.
3.2 Check the Fitness of the Initial Random Data
We will first of all need to check if the starting data is valid, if not we will need to regenerate it. This is to stop a broken creature from being presented to the player in the game. Were we only concerned with the search space this would not be necessary. Our base fitness function is pretty simple it just checks to make sure the creature has limbs and a body. In this data, a single data point is made of 4 characters and each chromosome or collection of data points defining a part is made up of 15 data points. The final section of the data that defines the dominant chromosome tells use which one of the two at that relative position to look at. For example, in the first chromosome position the dominance value is 0 so the first ancestors contribution will be used, while in the second position the dominance value is 1 so the contribution from the second ancestor will be used.
3.3 Birds, Bees, and Lovecraftian Monsters
We need at least two creatures under this model to get things started so we will need to repeat the first two steps to get a second one. Once we have our two parents we can breed some offspring. The breeding process takes two entities that have already passed the fitness function and samples one half of each genome. Since every creature contains the instructions of two full sets of chromosomes, only one of these chromosome sets will be passed from each parent to make a new entity that has two chromosome sets, one from each parent. Once we have derived the resulting offspring we can apply a mutation to it based on a series of rules to allow further genomic drift. Finally we check this against our fitness function to make sure our work has not resulted in a broken creature. We can repeat this breeding process using the new creature as one of our two parent entities if we wish.
Let’s look at an example of how this works. Figure 3.1 and 3.1 show our initial creatures that we will use for breeding. These creatures are comprised of a very simple structure where each chromosome only contains one data point comprised of 4 characters. We will start with these input creatures:
“TCCACCAAGTGCAACTATGCTCTGCGACGGTC,0100”
“CAAAGTTGCACATCGGCAGCTACCCATAGCGG,0011”
Figure 3.1
Chromosome # | Left Sequence | Right Sequence | Dominance Flag |
0 | TCCA | ATGC | 0 |
1 | CCAA | TCTG | 1 |
2 | GTGC | CGAC | 0 |
3 | AACT | GGTC | 0 |
Figure 3.2
Chromosome # | Left Sequence | Right Sequence | Dominance Flag |
0 | CAAA | CAGC | 0 |
1 | GTTG | TACC | 0 |
2 | CACA | CATA | 1 |
3 | TCGG | GCGG | 1 |
Now that we know what our parents look like we can create their input that will be used to create an offspring. Each parent needs to provide a complete set of chromosomes. Reproduction swaps data points between the left and right sequences of the parents genome. This can most simply be achieved by randomly taking inputs from their left and right sequences.
Figure 3.4
Chromosome # | Reproduction Sequence A | Reproduction Sequence B |
0 | TCCA | CAGC |
1 | TCTG | TACC |
2 | CGAC | CACA |
3 | AACT | TCGG |
For the next step, we will assign the input chromosome sequences to left and right value sets and then determine which side of the chromosomal pair is dominant. The simplest way to do this is to assign random true/false values as a flag for each chromosome. This gives us the final entity outlined in figure 3.5.
Figure 3.5
Chromosome # | Left Sequence | Right Sequence | Dominance Flag |
0 | TCCA | CAGC | 0 |
1 | TCTG | TACC | 1 |
2 | CGAC | CACA | 1 |
3 | AACT | TCGG | 0 |
So our final generated offspring looks like:
“TCCATCTGCGACAACTCAGCTACCCACATCGG,0110”
But this has not been mutated at all so we are not quite done yet.
3.5 Mutation
Without mutation, the algorithm can never get chromosomal values that do not already exist within our potential breeding pool. From the perspective of a search algorithm, not being able to find some integral part of the correct answer is a critical flaw. Generally one of the big efficiency tuning parts of a genetic algorithm is adjusting the mutation rate. For simplicity I’m just going to use a 10% mutation rate to provide an example. Since we have 24 pairs (or characters) in our sequence we will take the floor of the total number we should mutate Floor(24*0.1) number of points to make changes to these can be selected randomly as can their new values. These changes result in:
“TCTATCTGCGACAACTCAGCTACCCACATCCG,0110”
It is worth noting that due to only the dominant chromosome being the one that is present, only the first alteration will be visible in the potential answer since the second one is on a recessive chromosome. When expressed, this creature would appear as:
“TCTATACCCACAAACT”
4 Fitness Functions and How We Make It All Useful
Okay, so far we have a really complicated way to structurally generate semi random character strings. This alone would probably have far worse efficiency than simply brute forcing a solution to our search problem. How we actually make this useful is by using a success score or other weighting mechanism to control which ancestors breed. There are a huge number of ways to do this and how this evaluatory function is designed will be largely based on what you are trying to find within your search space. For example, if you wanted to find a specific string, you could make each letter be a chromosome with a single data point that has values that range from 0 to 36 inclusively ((26 letters+10 symbols+1 null character)-1). Each letter that is in the correct position is worth two points in our scoring function and each letter that occurs in our string but is not in the correct position worth one point. Extra letters do not score extra points. From our first hypothetical we get a scoring record that looks like this:
Fig 4.1
# of generations | Resultant String | Score |
1 | sgsarbt | 3 |
10 | mhga&ver | 6 |
20 | gebmeva% | 6 |
100 | agem*vea | 8 |
200 | ygvme*ed | 8 |
1000 | #gdmaeev | 12 |
20000 | #gamedev | 16 |
As you can see in figure 4.1, the solution tends to improve more slowly as it gets closer to it’s search goal. This means finding exact values is not the best application of this type of search function, especially when a search space is small. However, this can be very useful in games when it comes to procedural content because that is a case of massive search spaces that can acceptably produce variable results. In short genetic algorithms hold great potential to make better procedural content for games.
If you want to hear a longer winded version of my thoughts on how we can use this to improve procedural content stay tuned for Part 2 where we will look at making phenotype expression rules for procedural content.
You can follow @cubictimeline on twitter or follow us on Facebook to keep up to date with all our development blogging as we get ready to announce our exciting new upcoming project.
Patrick is CEO and one of the programmers for Cubic Timeline Productions Inc. Cubic Timeline released Vulture in 2017 on Steam and are currently preparing to announce their next title.
Read more about:
Featured BlogsAbout the Author
You May Also Like