Training Virtual Creatures with Reinforcement Learning and Genetic Algorithms

I have always been interested in virtual creatures, and I finally got a chance to make some of my own! In this video I explain the ideas behind my project, including artificial life, reinforcement learning, and genetic algorithms!

Caleb Compton, Blogger

January 6, 2021

13 Min Read

The following article is a reproduction. The original article, and nearly 150 more, can be found at RemptonGames.com

Transcript:

What’s up designers, and welcome back to Rempton Games. For those of you who have been clamoring for more AI content on this channel, strap in, because this one’s a doozy. Today I will be talking about a project that combines reinforcement learning, genetic algorithms, and artificial life, and was actually the basis for the report that got me a master’s degree in computer science. I will try to cover all of the important information, but there is no way I can cover everything in this video, so if you have any questions at the end please leave them in the comments and I’ll try to answer them in a follow-up video. Without further ado, let’s get started.

First, let’s go over what this project is. At its core this project is an experiment with artificial life, also known as Alife. Alife is basically a field of science that deals with simulated life-forms, and has research applications ranging from evolutionary biology to robotics. Artificial life is also a pretty common genre of video games, including The Sims, Spore, and even Tamagotchi, and is a topic I would like to talk a lot more about in future videos on this channel.

My personal interest in artificial life sprang out of my interest in monsters generally, and monster-based video games like Pokemon in particular. I liked these games, but always wished they could be more personal. Sure, these games may have hundreds of different creatures to choose from, but what if the player was able to get completely unique, customized creatures that evolved just for them? There were games like Spore where you could manually design your creatures, but what if the creature evolved naturally based on how you played the game? That concept was the inspiration behind this project.

Spoiler alert: it turns out I was biting off a lot more than I could chew with those aspirations, but it at least gave me a direction to work towards. After months of research, and some advice from my advisory committee, I decided on a project where instead of evolving a creature based on interactions with a user, it would instead adapt to achieve a specific goal in a virtual environment. Specifically, it would learn to walk towards a specific goal location in the Unity 3D game engine.

There are many different ways to achieve this goal, but the solution I chose had two main parts. The first part was the “brain” – my virtual character had to learn how to get from point A to point B. The second part was the “body” – as it walks, the creature’s body changes and evolves to get better at its task. These two systems run at the same time, and work together to get better and better at their task. Let’s take a closer look at each of these systems individually, and then we can see what happens when they work together.

Let’s start with the brain. To teach my creature how to walk I used a technique known as Reinforcement Learning, which I talked a little bit about in my Pokemon AI video that you should definitely check out if you haven’t already. However, in that project I sort of “fudged” the RL – I wasn’t using it the way it was supposed to be used. This project was different – Reinforcement Learning is a really interesting tool, and I made full use of it.

The idea behind reinforcement learning is actually very similar to the way that animals and even people learn. Suppose for example that you had a puppy. Really young puppies don’t know much – they run around, pee and poop wherever they want, and chew on everything in sight. They don’t know what is right and wrong, so it is up to the owner to teach them. If they chew something they aren’t supposed to they may be scolded, but if they obey a command they might be rewarded with pets and treats. This system of positive and negative reinforcement teaches the puppy how to behave over time.

Reinforcement learning teaches machines how to behave in basically the same way. You give the machine a certain task – in this case, moving to a target location. At first the program has no idea how to move towards the goal, so it basically just behaves randomly – like the puppy not knowing where to go to the bathroom. Every time the character does an action it is either rewarded for a positive action, or punished for a negative action. Over time the virtual creature learns to take actions that will lead it to its goal, and avoid actions that result in punishment.
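
To make that reward-and-punishment loop concrete, here is a rough sketch of what it could look like in code. This is not the exact code from my project – it assumes a setup built on Unity’s ML-Agents toolkit, and the class and variable names are just illustrative:

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using UnityEngine;

// Illustrative sketch only – not the exact code from my project.
// The agent earns reward for closing the distance to the goal,
// pays a tiny penalty every step, and gets a bonus for arriving.
public class WalkerAgent : Agent
{
    public Transform goal;        // assigned in the Unity inspector
    float previousDistance;

    public override void OnEpisodeBegin()
    {
        previousDistance = Vector3.Distance(transform.position, goal.position);
    }

    public override void OnActionReceived(ActionBuffers actions)
    {
        // ...apply joint torques from actions.ContinuousActions here...

        float distance = Vector3.Distance(transform.position, goal.position);
        AddReward(previousDistance - distance); // positive when moving closer
        AddReward(-0.001f);                     // small "hurry up" penalty
        previousDistance = distance;

        if (distance < 1f)
        {
            AddReward(1f);  // big reward for actually reaching the goal
            EndEpisode();
        }
    }
}
```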

Of course, a program is not the same as a puppy, so it needs a little bit more than just pets and treats. True, you need to tell the program whether it has achieved its goal, but you have to specify other things as well. If I told you to close your eyes and put one finger on the tip of your nose you would be able to do it, because you naturally know where all of your body parts are located, even when you can’t see them. A virtual creature, on the other hand, doesn’t even really know that it has limbs, much less how to move them in a coordinated manner. Therefore it is up to the programmer to provide information about what limbs the creature has, where they are located, how they are connected, how they move, etc. For example, you have to specify how the lower legs are connected to the upper legs, and how much each of their joints is allowed to move.
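
As an illustrative sketch (not my exact setup), here is roughly how you might hinge a lower leg to an upper leg in Unity and limit how far the knee is allowed to bend:

```csharp
using UnityEngine;

// Illustrative sketch: hinge a lower leg to an upper leg and
// clamp how far the "knee" joint is allowed to bend.
public static class LegBuilder
{
    public static void AttachLowerLeg(Rigidbody upperLeg, Rigidbody lowerLeg)
    {
        var joint = lowerLeg.gameObject.AddComponent<ConfigurableJoint>();
        joint.connectedBody = upperLeg;

        // Lock everything except rotation around one axis, like a knee.
        joint.xMotion = ConfigurableJointMotion.Locked;
        joint.yMotion = ConfigurableJointMotion.Locked;
        joint.zMotion = ConfigurableJointMotion.Locked;
        joint.angularXMotion = ConfigurableJointMotion.Limited;
        joint.angularYMotion = ConfigurableJointMotion.Locked;
        joint.angularZMotion = ConfigurableJointMotion.Locked;

        // The knee can swing between 0 and 90 degrees.
        joint.lowAngularXLimit  = new SoftJointLimit { limit = 0f };
        joint.highAngularXLimit = new SoftJointLimit { limit = 90f };
    }
}
```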

Similarly, animals have senses such as sight, hearing, touch, and smell that they can use to observe their environment. A virtual creature has no such senses, unless the programmer provides them. Basically, it is up to the programmer to specify what the character can learn about its environment. For this project the creature knew stuff like where the goal was, which direction it was facing, and how fast it was moving. All of this information is necessary for the creature to learn and get better over time – after all, it wouldn’t be very good at reaching a goal if it had no idea where the goal was!
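
Continuing the same hypothetical WalkerAgent sketch from before, those observations – where the goal is, which way the creature is facing, and how fast it is moving – might be fed to the agent like this, where each AddObservation call is one piece of information the creature is allowed to “sense”:

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Sensors;
using UnityEngine;

// Same hypothetical WalkerAgent as before, showing only the
// observation step – this is everything the creature can "sense".
public class WalkerAgent : Agent
{
    public Transform goal;   // assigned in the Unity inspector
    Rigidbody body;

    public override void Initialize()
    {
        body = GetComponent<Rigidbody>();
    }

    public override void CollectObservations(VectorSensor sensor)
    {
        // Where the goal is, relative to the creature.
        sensor.AddObservation(goal.position - transform.position);
        // Which direction the creature is facing.
        sensor.AddObservation(transform.forward);
        // How fast it is currently moving.
        sensor.AddObservation(body.velocity);
    }
}
```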

Another thing I specified for this project was a curriculum. A curriculum is a tool that changes the difficulty of the task over time – it is not strictly necessary, but it can help with reinforcement learning training. In this case, the main thing that I changed with my curriculum was how far away the goal was. Because the creatures basically learn by trial and error, if you start off with the goal really far away the creature is unlikely to ever reach it. Because of this, I start training with the goal pretty close, and then move it further away as the creatures learn over time.
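
In an ML-Agents-style setup, a curriculum like this is usually driven by a training parameter that grows over time. As a hedged sketch (the “goal_distance” parameter name is made up for illustration), the environment might read it at the start of each episode like this:

```csharp
using Unity.MLAgents;
using UnityEngine;

// Illustrative sketch – the "goal_distance" parameter name is made up.
// In ML-Agents, the training config can grow a parameter like this
// over time, and the environment reads it when each episode resets.
public class GoalPlacer : MonoBehaviour
{
    public Transform goal;
    public Transform creature;

    // Call this from the agent's OnEpisodeBegin().
    public void ResetGoal()
    {
        float distance = Academy.Instance.EnvironmentParameters
            .GetWithDefault("goal_distance", 2f); // starts close by default

        // Drop the goal somewhere on a circle of that radius.
        Vector2 dir = Random.insideUnitCircle.normalized * distance;
        goal.position = creature.position + new Vector3(dir.x, 0f, dir.y);
    }
}
```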

While specifying all of this information can be a lot of work, the cool thing about reinforcement learning is that once you start running the program you are completely hands off. It’s actually pretty magical – you just let the program do its thing, and watch as it begins to learn before your very eyes, often in ways that you never could have predicted.

Reinforcement learning provides the brain, but for this project I was also interested in evolving the creature’s body. In the real world animals evolve different shapes for different tasks, and I wanted my virtual creature to do the same thing. To do so, I once again took inspiration from the real world and used what is called a genetic algorithm.

A genetic algorithm is a type of computer program that mimics the evolution of actual living creatures, such as animals and plants. For a genetic algorithm to work, we cannot have just one virtual creature – we actually need a larger population. For this project we use a population size of 10 creatures. These creatures then train for a certain amount of time using reinforcement learning, which we discussed previously. After training for a while, each creature will have a score. Remember when I said that different actions will result in either punishments or rewards for the creature? If you combine all of those punishments and rewards you will get a total score for the creature, which tells you how well the creature did at training.

Creatures who got better scores will then go on to become the “parents” of the next group of creatures, while those who didn’t do very well will be removed from the population. In this way, the entire population of creatures will gradually get better over time.
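
As a simple illustration of that selection step (not my exact implementation), you can think of each creature’s genome as an array of body parameters and its fitness as its total accumulated reward; keeping the top half as parents might look like this:

```csharp
using System.Collections.Generic;
using System.Linq;

// Illustrative sketch of the selection step. A genome here is just
// an array of body parameters (limb sizes, joint ranges, and so on),
// and fitness is the creature's total accumulated reward.
public static class Selection
{
    public static List<float[]> SelectParents(
        List<(float[] genome, float fitness)> population)
    {
        int survivors = population.Count / 2;
        return population
            .OrderByDescending(p => p.fitness) // best scores first
            .Take(survivors)                   // the rest are removed
            .Select(p => p.genome)
            .ToList();
    }
}
```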

Copying useful traits from your parents is an important part of evolution, both in the real and virtual worlds, but it is only part of the puzzle. Another very important ingredient is mutation. If you just keep copying traits from previous generations, then you will never get anything truly new, and you might get stuck in a situation where all of your creatures are the same, and are not necessarily very well suited for their task. This is called a local optimum. To understand a local optimum, imagine you are outside in an area with lots of hills. You want to climb to the top of the tallest hill, so what do you do? The answer seems pretty obvious – just find the tallest hill you can see and start climbing. But now imagine that it is a very foggy day, and you can only see a few feet around you. Now you can’t even see the entire hill, so how are you supposed to get to the top?

The best you can do is look around you to try to see what is “uphill” and what is “downhill”, and start going “uphill”. If you do this, you will at least reach the top of a hill, even if it isn’t the tallest hill. However, once you get to the top you are basically stuck – everything around you is downhill. There may be a much taller hill somewhere else, but you have no way of knowing about it. This is basically what happens when a program gets stuck in a local optimum – there might be a better solution somewhere, but it has no way of knowing about it.
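
Here is a tiny toy version of that foggy hill climb – a greedy climber on a one-dimensional landscape with two hills. Starting near the short hill, it happily walks to the short hill’s peak and stops, never discovering the taller one:

```csharp
using System;

// Toy version of the foggy hill climb: the climber can only compare
// its immediate neighbors, so it stops at whichever peak it finds first.
public static class HillClimb
{
    // Two hills: a short one near x = 2 and a taller one near x = 8.
    static double Height(double x) =>
        Math.Exp(-(x - 2) * (x - 2)) + 2 * Math.Exp(-(x - 8) * (x - 8));

    public static double Climb(double x, double step = 0.1)
    {
        while (true)
        {
            if (Height(x + step) > Height(x)) x += step;      // uphill, go right
            else if (Height(x - step) > Height(x)) x -= step; // uphill, go left
            else return x; // stuck on a peak – maybe not the tallest one
        }
    }
}
// Climb(0.0) ends near x = 2, even though the taller hill is at x = 8.
```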

This is why you need mutation – it adds randomness, which makes it less likely for you to get stuck on a lower hill. In this project, mutation occurs when moving from one generation of creatures to the next. Usually a creature will just copy traits from its parents, but every now and then it will produce a completely new, random trait. If you put these two systems together you end up with a genetic algorithm that copies successful traits from previous generations, but has enough variety to hopefully avoid getting stuck in a local optimum trap.
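
A minimal sketch of that reproduction step might look like the following – each gene is copied from a random parent, except for the occasional mutation into a brand-new random value (the 5% mutation rate here is just an example, not the value from my project):

```csharp
using System;

// Illustrative sketch – the 5% mutation rate is just an example.
// Each gene is copied from a random parent, except when it mutates
// into a completely new random value.
public static class Reproduction
{
    static readonly Random rng = new Random();

    public static float[] MakeChild(float[] parentA, float[] parentB,
                                    double mutationRate = 0.05)
    {
        var child = new float[parentA.Length]; // parents assumed same length
        for (int i = 0; i < child.Length; i++)
        {
            if (rng.NextDouble() < mutationRate)
                child[i] = (float)rng.NextDouble(); // brand-new random trait
            else
                child[i] = rng.NextDouble() < 0.5 ? parentA[i] : parentB[i];
        }
        return child;
    }
}
```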

Now that we’ve established our two systems – reinforcement learning for teaching and genetic algorithms for evolution – all we need to do is actually run our program. This program was run in the Unity 3D game engine, and I tested several different configurations. I ran each configuration for 600 generations, and tracked their average performance on these charts. It’s not important to know what all the different configurations were – the two I would like to draw your attention to are these. The first was my best-performing model, and the second is the “control” model – it is exactly the same, except that it only included the reinforcement learning part, without the genetic algorithm.

The main reason I tested without the genetic algorithm was to see whether the genetic algorithm actually improved the performance. It was possible that having a single, fixed body shape would actually make it easier for the “brain” to learn, and that this might actually be better. Fortunately, this was not the case – the model with the genetic algorithm strongly outperformed the model without.

Now, let’s talk about results. The basic, unmutated model looked like this, and the best-performing model with mutation looked like this. They not only look very different, but also developed very different movement styles – the unmutated creatures mostly moved by sliding forward on their bellies like caterpillars, while the mutated model spun around in circles like a top.

While this experiment was a success in many ways, it also had a number of flaws. First, neither the creatures’ bodies nor the way they moved resembled any real-world animals. I think this is due to a lack of constraints – real-world animals actually have a lot more limitations on how they move than this virtual creature. For example, real animals tend to avoid spinning motions because they get dizzy, and because they like to look at what they are moving towards. In addition, they like to keep their bodies upright, and their heads relatively still. It’s possible that I would have gotten more lifelike results if I had added these constraints.

Second, I originally intended to test these creatures in a variety of situations, to see if they could overcome challenges such as obstacles, gaps, slopes, and even possibly water. Due to time constraints I had to scale back to testing in a pretty simple environment, but I may expand the possibilities in the future – probably after I finish my Emerald AI.

Finally, while this technique is interesting, it is definitely not the type of system you would want in a creature-based video game, so it didn’t really satisfy that goal. However, working on and doing research for this project has taught me a lot, and I definitely have ideas for handling unique creatures that I think would work much better in a video game environment, which I can talk more about in future videos.

That’s all I have for today. If you liked this video and want to see more AI, programming, or artificial life videos make sure you leave a like and subscribe so you don’t miss more videos like this in the future. If you want to see more you should definitely check out my other videos, including my previous AI video where I talk about the history of the sport of basketball. I also have nearly 150 articles on the Rempton Games blog, which you can check out at the link in the description down below. And join me next week for the next entry in my Evolution of Pokemon Designs series, where we are all the way up to generation 7! Until then, thank you so much for watching, and I’ll see you all next time.
