Evolving Pathfinding Algorithms Using Genetic Programming
If the human brain is a terribly complex computer that we may never understand, how can our in-game AI ever hope to be believable? Cubism author Rick Strom explores genetic programming, an approach that develops algorithms using the same evolutionary mechanism that nature has relied on all along.
Introduction
Pathfinding AIs tend to fall short in one of two ways: either they are so effective and efficient that they convey not so much intelligence as omniscience, or they are so inept that human observers can't help but feel frustration watching them. The first case is exemplified by an agent which ignores locally acceptable paths in favor of the ideal global path, thereby giving away the secret that it has knowledge of the entire map. The second case can be seen in a wide variety of stupid behavior, but my favorite example is the agent which travels along a river to avoid the relatively small penalty of crossing water, only to plow through an enemy camp and ultimately meet its doom. Anyone familiar with real-time strategy gaming has no doubt met this breed of pathfinder before.
The problem, assuming we choose not to endow our pathfinders with God-like knowledge of the Universe, is that we are trying to simulate the sort of decision making employed by a computer that we still don't understand - the human brain. Rather than trying to hard-code something this complex, perhaps we should be allowing our algorithms to develop over thousands and thousands (or millions and millions) of generations, using the same evolutionary mechanism that nature has relied on all along.
Genetic Programming
Genetic programming is a technique which uses the genetic operators of reproduction, mutation and crossover to find programs which solve a given problem. The goal is to develop abilities in a population of programs in the same way living organisms have developed theirs - namely, through evolution. Although at first the idea of computer programs reproducing sounds bizarre, it is less so when you realize that we are simply abstracting biological concepts to the point where we can sensibly apply them to programming.
Genetic programming, and genetic algorithms in general, are approximate methods for finding a solution to a given problem, meaning they aren't guaranteed to produce the correct solution, but rather a solution which has a high probability of being correct. If our problem is to find a reasonably intelligent and realistic pathfinding algorithm, this isn't a deal breaker, since we make no assumption about there being only one correct solution. In fact, anything that looks good to the human eye will do. Ultimately, then, we can set the rules for a population and its method of natural selection, and let it go. We stop the process when the most-fit algorithm looks as if it is behaving both wisely and realistically.
Rules and Requirements
Although applying genetic programming techniques to the pathfinding problem is fairly easy, there are several rules and requirements that must be considered before beginning. This article is based on a project called Hampton, named after the Hampton Court hedge maze. The project is available at http://www.stromcode.com/Hampton along with a technical write-up and source code. Although requirements will vary depending on the project, the Hampton project was developed with an RTS in mind; specifically, it evolves pathfinders good at maneuvering through square grids on which individual squares are either impassable, or passable with varying penalties.
The Map
For our purposes, maps are square grids 40x40 in size. Moving from one square in the grid to another constitutes a single move, and upon moving the agent acquires a penalty in the range 0-9 for that square. Additionally, squares can be impassable, and moves outside the map are impossible. If an agent attempts an illegal move, it immediately "dies" and effectively fails.
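To make the map model concrete, here is a minimal Python sketch under the assumptions above; the names random_map, is_legal and IMPASSABLE are illustrative, not taken from the Hampton source.

```python
import random

GRID_SIZE = 40
IMPASSABLE = None  # an impassable square carries no penalty value at all

def random_map(impassable_chance=0.2):
    """Build a random 40x40 grid: each square is impassable or has a 0-9 penalty."""
    return [[IMPASSABLE if random.random() < impassable_chance
             else random.randint(0, 9)
             for _ in range(GRID_SIZE)]
            for _ in range(GRID_SIZE)]

def is_legal(grid, x, y):
    """A move is legal only if it stays on the map and lands on a passable square."""
    return 0 <= x < GRID_SIZE and 0 <= y < GRID_SIZE and grid[y][x] is not IMPASSABLE
```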
Hampton allows you to create custom maps and can generate random maps (which we'll see when talking about adaptation).
Fitness
In order to evaluate the programs in a given generation, we need a way to score a program's performance. While the specific fitness function used will vary depending on the specific application, a simple method, and the one used by Hampton, is to run the program on the map and add the number of moves it makes to the penalties it accumulates along the way. Because we are looking for programs that ultimately reach their goal, we penalize programs that fail to do so by adding a substantial value - a "death penalty" if you will - to their final score (a value which will assuredly put the failing program's score outside of the range of scores a successful program might achieve).
Once this process is complete for all programs in the population, we can easily find the most fit program by sorting the programs according to fitness and selecting the one with the lowest score.
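As a rough illustration, such a fitness function might look like the sketch below; DEATH_PENALTY and the most_fit helper are assumed names rather than Hampton's actual code, and the moves/penalties/reached_goal inputs come from whatever runs the program (the interpreter, sketched later).

```python
DEATH_PENALTY = 1_000_000  # assumed constant; must dwarf any score a successful run can achieve

def fitness(moves, penalties, reached_goal):
    """Score one run: moves made plus penalties accumulated, plus the
    'death penalty' if the program failed to reach its goal."""
    return moves + penalties + (0 if reached_goal else DEATH_PENALTY)

def most_fit(population, scores):
    """After scoring, the winner is simply the program with the lowest score."""
    best = min(range(len(population)), key=lambda i: scores[i])
    return population[best]
```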
The Language
Of primary importance is the syntax of the language which GP uses to build candidate programs. LISP is common for genetic programming, although the right choice will ultimately depend on the application. Above all, the language should have a structure that lends itself to reproduction and mutation. Programs that can be represented as trees work well for this, as we will soon see.
For the Hampton project, I developed a specific language consisting of a set of terminals and functions. The functions are questions which the program can ask about its current situation, while the terminals are the moves it can make once it has asked the questions it deems necessary. Since we want our agent to behave realistically, we strive to give it only the information and moves a human player would have. For example, we limit its environmental knowledge to a few squares immediately ahead of it - to see more, it will have to physically move to another location and begin asking questions anew.
The set of functions and terminals is shown below. The function set developed gradually, as earlier function sets were determined to be too limiting. For example, the isGoal* functions were added when it was recognized that programs performed better with a sense of direction. Without these functions, agents would wander around in circles until they blindly hit their goal. The doBoth function was added to give programs memory, and will be discussed in detail when the topic of memory is addressed.
| Function | Description |
| --- | --- |
| isWaterAheadX | Detect water within X squares |
| isEnemyAhead | Detect enemy units one square ahead |
| isBlockedAheadX | Detect any roadblock within X squares |
| isSteeperAhead | TRUE if the penalty ahead is greater than the current penalty |
| isLessSteepAhead | TRUE if the penalty ahead is less than the current penalty |
| isGoalLeft | Test the relative direction of the goal |
| isGoalRight | Test the relative direction of the goal |
| isGoalAhead | Test the relative direction of the goal |
| isGoalBehind | Test the relative direction of the goal |
| doBoth | Execute both children |
Our set of terminals (moves) is much simpler. An agent can either move forward one square, or rotate itself left or right by one quarter turn. These nodes are represented as moveForward, turnLeft and turnRight.
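One plausible way to represent this language is as a small set of name strings plus a binary tree node, sketched below; the Node class is an assumption, and the concrete instantiations of the X-parameterized functions (isWaterAhead1, isWaterAhead2, and so on) are illustrative guesses.

```python
from dataclasses import dataclass
from typing import Optional

# The X-parameterized functions are shown with assumed ranges (1 and 2 squares).
FUNCTIONS = [
    "isWaterAhead1", "isWaterAhead2",
    "isEnemyAhead",
    "isBlockedAhead1", "isBlockedAhead2",
    "isSteeperAhead", "isLessSteepAhead",
    "isGoalLeft", "isGoalRight", "isGoalAhead", "isGoalBehind",
    "doBoth",
]
TERMINALS = ["moveForward", "turnLeft", "turnRight"]

@dataclass
class Node:
    name: str
    no: Optional["Node"] = None   # left child: taken when the answer is no
    yes: Optional["Node"] = None  # right child: taken when the answer is yes
```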
The Interpreter
Once our language is defined and the fitness function determined, we need to be able to quickly and easily run the program against the map and determine its score. This is where we see the benefit of requiring programs that can be represented as trees. Programs are run by traversing the tree until a terminal is reached, and then returning to the root node and repeating, until either the goal is reached or the program makes an illegal move and dies.
Because the function set is composed of only boolean functions, our trees are binary. When a function node is encountered, the system determines the answer to the question, and follows the right branch if the answer is yes, or the left branch if the answer is no. Once a terminal is read, the move is made and program execution continues again from the root.
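A sketch of that interpreter loop, reusing the Node, FUNCTIONS and TERMINALS definitions above; ask() and apply_move() are assumed stand-ins for the world queries and agent movement, the agent object is assumed to track its own position and accumulated penalties, and doBoth handling is deferred to the Memory section.

```python
def run(program, agent, max_moves=2000):
    """Walk the tree from the root to a terminal, make that move, and repeat,
    until the agent reaches its goal, dies on an illegal move, or runs out
    of moves. NB: this simplified loop assumes no doBoth nodes."""
    moves = 0
    while moves < max_moves:
        node = program
        while node.name not in TERMINALS:        # descend: questions first
            node = node.yes if ask(node.name, agent) else node.no
        if not apply_move(node.name, agent):     # illegal move: the agent dies
            return moves, agent.penalties, False
        moves += 1
        if agent.at_goal():
            return moves, agent.penalties, True
    return moves, agent.penalties, False
```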
Memory
The exception to the previous discussion on processing function nodes is the doBoth function, which was added to the function set in order to give programs memory. When the interpreter meets a doBoth node, both of its child nodes are executed. To prevent programs from executing in multiple threads, at least one of the children of a doBoth function node is required to be a terminal.
Effectively, this allows a program to inspect the landscape ahead of it, make a turn (or a move, if that seems reasonable) and ask questions about its new situation, without forgetting what was asked about its previous situation. If the justification for giving programs memory is not immediately clear, consider the situation where an agent analyzes a position and opts to turn left. If each of the remaining three directional orientations is worse than the first, the agent must either choose a worse direction to move forward, or enter a loop state. After all, without memory, once the agent returns to its original orientation, it has forgotten it has been there before and will again turn to a new orientation.
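Extending the earlier interpreter sketch, doBoth handling might look like the recursive form below, again using the assumed ask() and apply_move() helpers.

```python
def execute(node, agent):
    """Execute one subtree; returns False the moment any move is illegal."""
    if node.name in TERMINALS:
        return apply_move(node.name, agent)
    if node.name == "doBoth":
        # At least one child is required to be a terminal, so this recursion
        # cannot fan out into an explosion of parallel execution paths.
        return execute(node.no, agent) and execute(node.yes, agent)
    branch = node.yes if ask(node.name, agent) else node.no
    return execute(branch, agent)
```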
The effect of adding the doBoth function to the function set was impressive - programs began to actually demonstrate recognizable strategies. For example, programs without memory were prone to falling into traps such as long hallways, but with memory they quickly adopted wall-hugging strategies which allowed them to enter a hallway and find their way out of it. Memory also allows programs to return to previous locations and take a path they previously passed up. Without memory, returning to a previous location could only result in another loop condition.
There are other ways to give your programs memory. You could set up a small memory register, for example, and give the program some means of writing and reading from it. Or you could represent your programs as finite state machines, and give them a stack to use to store information about their environment (essentially creating a pushdown automaton).
It is hard to say what advantage one approach would have over another, but embedding memory into the program itself (via a function like doBoth) seems to be the simplest and most elegant solution.
Evolution
With our requirements stated, our language defined and our interpreter built, we are ready to start the evolutionary process.
In the beginning, we create a population of random programs, and by random I mean truly random. Trees are initially built by creating a root node and then randomly attaching child nodes to it, until - randomly - a terminal node is selected, at which point the branch terminates. As you would expect, these random programs aren't particularly good at anything, and early generations fail miserably at finding their goal.
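That growth process might be sketched as follows, reusing the FUNCTIONS, TERMINALS and Node definitions from earlier; TERMINAL_CHANCE and max_depth are assumed tuning knobs, and the doBoth child constraint from the Memory section is enforced during growth.

```python
import random

TERMINAL_CHANCE = 0.3  # assumed probability; governs how quickly branches terminate

def random_tree(depth=0, max_depth=12):
    """Grow a tree by randomly attaching children until a terminal is chosen."""
    if depth >= max_depth or random.random() < TERMINAL_CHANCE:
        return Node(random.choice(TERMINALS))
    name = random.choice(FUNCTIONS)
    if name == "doBoth":
        # Hampton requires at least one child of a doBoth node to be a terminal.
        return Node(name,
                    no=Node(random.choice(TERMINALS)),
                    yes=random_tree(depth + 1, max_depth))
    return Node(name,
                no=random_tree(depth + 1, max_depth),
                yes=random_tree(depth + 1, max_depth))
```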
Programs that actually are good at finding the goal emerge by combining the best programs in the population to create child programs. The idea is that children may inherit the best parts of their parents, and thus later generations become more fit than earlier ones. To do this, new generations are constructed using reproduction, crossover and mutation.
Reproduction is simply a means of ensuring that the most fit member(s) of the previous generation survive into the next. When constructing the new generation, we simply copy over one or two of the lowest scoring programs and allow them to continue to compete.
Crossover is the operation that actually takes its inspiration from biology. In crossover, we combine parts of two programs to create two new ones. Hampton does this by swapping random subtrees of the two lowest scoring (i.e. most-fit) programs. By doing this, we are hoping that two programs which each perform well in unique circumstances can combine to form a child program that performs well in both.
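A sketch of subtree-swapping crossover over the Node trees from earlier; the all_nodes helper is an assumed utility, and a full implementation would also re-check the doBoth child constraint after the swap.

```python
import copy
import random

def all_nodes(tree):
    """Flatten a tree into a list of its nodes for random selection."""
    nodes = [tree]
    for child in (tree.no, tree.yes):
        if child is not None:
            nodes.extend(all_nodes(child))
    return nodes

def crossover(a, b):
    """Swap a random subtree of one parent with a random subtree of the other."""
    a, b = copy.deepcopy(a), copy.deepcopy(b)
    na, nb = random.choice(all_nodes(a)), random.choice(all_nodes(b))
    # Swap the entire contents of the two chosen nodes, subtrees included.
    na.name, nb.name = nb.name, na.name
    na.no, nb.no = nb.no, na.no
    na.yes, nb.yes = nb.yes, na.yes
    return a, b
```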
Mutation, the final operator, helps to prevent our population from falling into a rut. Mutation is done by selecting a random node in the program tree, deleting it (and its subtree) and replacing it with the root node of a new random tree. This effectively introduces new genes into the pool. This is most important if we are running steady-state, meaning that in each generation programs not affected by the genetic operators remain in the population. If we are not running steady-state, then the population is repopulated with random programs each generation, flooding the gene pool with new material and making mutation less important.
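Mutation might then be sketched like this, reusing all_nodes and random_tree (plus the copy and random imports) from the listings above.

```python
def mutate(tree, max_depth=12):
    """Replace a random node (and its subtree) with a freshly grown random tree."""
    tree = copy.deepcopy(tree)
    target = random.choice(all_nodes(tree))
    fresh = random_tree(max_depth=max_depth)
    target.name, target.no, target.yes = fresh.name, fresh.no, fresh.yes
    return tree
```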
All of these operations occur with a certain probability, which should be adjusted until good results are found. High probability of crossover, low probability of mutation and near-certain reproduction seem to be the most effective in a non-steady-state system.
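Tying the operators together, one plausible generation step for a non-steady-state system is sketched below; the probabilities shown are illustrative guesses in the spirit of the text, not Hampton's actual values.

```python
import random

P_CROSSOVER, P_MUTATION = 0.9, 0.05  # assumed: crossover high, mutation low

def next_generation(population, scores, pop_size=100):
    """Build a new generation: elites survive, crossover breeds the best,
    fresh random programs fill out a non-steady-state population."""
    ranked = sorted(range(len(population)), key=lambda i: scores[i])
    best, second = population[ranked[0]], population[ranked[1]]
    new_pop = [best, second]                      # reproduction: near-certain
    while len(new_pop) < pop_size:
        if random.random() < P_CROSSOVER:
            child, _ = crossover(best, second)    # breed the two most-fit
        else:
            child = random_tree()                 # repopulation: fresh genes
        if random.random() < P_MUTATION:
            child = mutate(child)
        new_pop.append(child)
    return new_pop
```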
Adaptation
A final consideration is adaptation. As the project has been described, we have set up a system which will evolve programs that are well suited to a particular map, when in fact we probably want programs that will perform well no matter what map we throw at them. One way to do this is to cycle the map at regular intervals, causing programs that are too specific to fail when the map changes and drop out of the population. Over time, we should end up with programs that perform well from one map to the next - programs that survive drastic changes in the environment, just as successful species do in the animal kingdom.
One thing to keep in mind is that adaptation will probably be the most time-consuming part of the process. In a system where natural selection determines which genes in a population survive into the next generation, evolving a population that performs well against a static challenge can happen fairly rapidly. Evolving anything that can not only survive the current environment but also withstand possibly cataclysmic changes will demand much more time.
To allow for highly adaptable programs, Hampton can generate random maps and cycle them every few generations.
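A sketch of that outer loop, tying together the earlier listings; MAP_CYCLE and the evaluate() scoring helper (which would run each program on the current map and apply the fitness function sketched earlier) are assumptions.

```python
MAP_CYCLE = 10  # assumed interval, in generations

def evolve(generations=1000, pop_size=100):
    """Outer loop: cycle in a fresh random map every MAP_CYCLE generations,
    so overly map-specific programs fail and drop out of the population."""
    population = [random_tree() for _ in range(pop_size)]
    grid = random_map()
    for gen in range(generations):
        if gen % MAP_CYCLE == 0:
            grid = random_map()
        scores = [evaluate(p, grid) for p in population]  # assumed scoring helper
        population = next_generation(population, scores, pop_size)
    return population
```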
Results
We've been told that evolution is an extremely slow process, with small changes accumulating over hundreds of thousands of generations, but recent research suggests that these changes can occur much faster than we previously thought. So it might not come as much of a surprise that Hampton finds successful solution programs fairly quickly, even on very complex maps. Most-fit programs can reach their goal in as few as a hundred generations. This is stunning when you consider how little information and ability we give our programs, and that our initial programs (as well as programs generated on repopulation) are completely random, nonsense jumbles of function and terminal nodes.
The real measure of success, however, is aesthetic. Our evolved programs are successful if you yourself see intelligence in their strategies. If so, then we have taken a step towards creating more realistic AI, and a less frustrating experience for the gamer.
Final Thoughts
If you plan to extend Hampton or create your own genetic programming experiments, there are a few other subjects that might be interesting to consider.
First, you might want to consider program size, and the impact large programs will have on the success of your project. By some miracle, Hampton tends not to create huge, deep trees, although there is no mechanism in place to prevent it from doing so. On the positive side, a larger program tree has the potential to be much more complex than a shallower one - a tree which is 100 nodes deep will naturally be capable of much more advanced decision making than a tree which is 5 nodes deep. On the other hand, large trees will take longer to parse, will consume more memory, and may use up all their complexity making horrible decisions. Whether some sort of pruning process would be beneficial would make an interesting study.
Another question we could ask is whether or not we should "correct" a program that is inefficient or redundant. Although a lot of genetic programming projects tend to stick rigidly to the biological model, I see no reason why a little creative hacking couldn't be applied to help the process along. It is possible that reducing redundancy and general stupidity in the population's programs before performing crossover could result in much more rapid evolution and much more useful programs in the end. If further justification for messing with the basic idea of GP is required, we could consider whether the genotype (i.e. the makeup of the program) or the phenotype (what the program ultimately does) is more important to us.
I tend to think of ideas like genetic programming as jumping off points rather than sets of rigid rules which must be followed closely. Perhaps a combination of genetic programming, other techniques, and pure creativity can lead to a pathfinding algorithm that truly evokes awe in anyone who witnesses it. At the very least, we can hope for an algorithm which won't cause gamers to bang their heads into their desks.