Sponsored By

Reverse engineering the binary data format for Star Wars: Yoda Stories

An in-depth technical article in which I revisit my childhood by reverse engineering one of my favorite games, the somewhat obscure Star Wars: Yoda Stories.

Zach Barth, Blogger

October 13, 2014

29 Min Read
Game Developer logo in a gray background | Game Developer

Zach Barth is the Creative Director at Zachtronics, the studio behind SpaceChem, Ironclad Tactics, and Infiniminer. If you like obscurely technical things, you'll love our upcoming game Infinifactory!

Background

I don't know why, but I've always gotten a kick out of reverse engineering data files for computer games. Although decompiling a game's code is a challenging task, data files are often much easier to figure out (as they contain lots of highly visible content like text and sprites) and let you mod the game if you're able to figure it out sufficiently.

In 2006, days after the Star Wars: Empire at War demo was released, I published some rudimentary tools for dumping and repacking the game's data files, including a simple mod that let you play as the Empire (which was disabled in the demo). There are barely any traces of it left on the internet, but I managed to score a free Petroglyph t-shirt at the time so I suppose that's something.

Many years before that I owned another Star Wars game called Star Wars: Yoda Stories. It appears to have been fairly obscure and poorly received, but that didn't stop me from playing the crap out of it. As a novice game programmer and die-hard Star Wars fan, I tried very hard to locate the game's resources so that I could make some sort of terrible Star Wars game of my own. Instead, all I managed to find were some sound effects and a small number of sprites that they distributed as icons, as part of a desktop theme.

Fast forward sixteen years to me looking through my ancient CD collection for some old games to play at a 1990's computer themed party. After popping in the CD I immediately spot what is clearly a data file, roughly four megabytes in size, just waiting for me to apply my overpriced college degree and crack it open. Better late than never!

File Structure (Difficulty: Padawan)

I suppose that the only program you need to reverse engineer something like this is a hex editor, although as we'll see later a decompiler, calculator, and strong working knowledge of the target program help as well. I'm a big fan of HxD, so we're going to use that.

If you want to play along at home, here's a link to the game's data file: YODESK.DTA

Time to open this file!

Well, that's definitely not text, or really anything remotely meant to be read by a human, but that's not exactly surprising either. I'm sure that sixteen years ago I opened this very same file in Notepad and quickly closed it, not remotely understanding what I was looking at.

Right off the bat we can see some four-letter ASCII symbols, which look to me like section identifiers. Scrolling further ahead seems to confirm this: SNDS, TILE, CHAR, PUZ2, and many more. The file ends with an ENDF section, which implies that the overall file structure is some kind of hierarchy or list of tagged sections.

The VERS identifier clearly starts a "version" section, which contains the following four bytes: 0x00, 0x02, 0x00, 0x00. My guess is that this is version 2.0 of the file format, as Yoda Stories was actually the successor to an Indiana Jones game that appears to use the same engine. It doesn't matter much, though, as this isn't a very interesting piece of data.

Next up is the STUP (setup?) section, which contains a lot of mysterious data:

There's clearly some kind of pattern here, but even with a thorough knowledge of the game it's not clear what it's for. The bigger question on my mind is: how do we skip it? Although it'd be possible to just assume it's a fixed length section and skip the data, that's probably not the case.

If we look back at the start of the section (the previous screenshot) we'll see that four suspicious bytes follow the STUP identifier: 0x00, 0x44, 0x01, 0x00. If we measure the rest of the data in the section after these four bytes, we'll find that it's exactly 0x00014400 bytes long. A coincidence? I think not!

These four bytes are clearly an unsigned, 32-bit integer that specifies the amount of data that make up the rest of the STUP section. If it looks like the bytes are backward, though, it's because they are: they're stored in "little-endian" order, where the less-significant bytes are stored first, a common convention for x86 and x86-64 processors. If we read this length value, we can then skip the rest of the section despite knowing nothing about the data that is stored within.

Manually reading through binary files, even one as small as 4MB, isn't a very productive way to make progress, so this is a good time to start writing a program that parses the file and reads and/or dumps out data as we figure out how it's encoded. My preferred programming language is C#, so I'm going to use that; assuming the file format isn't totally screwy, I should be able to get a lot of mileage about of the BinaryReader class and get a quick start. Here's the program for what we've figured out so far:

static void Main(string[] args)
{
    using (BinaryReader binaryReader = new BinaryReader(File.OpenRead("YODESK.DTA")))
    {
        bool keepReading = true;
        while (keepReading)
        {
            string section = new string(binaryReader.ReadChars(4));
            switch (section)
            {
                case "VERS":
                    uint version = binaryReader.ReadUInt32();
                    break;
                case "STUP":
                    uint length = binaryReader.ReadUInt32();
                    byte[] data = binaryReader.ReadBytes((int)length);
                    break;
                default:
                    throw new Exception("Unknown section: " + section);
            }
        }
    }
}

Unfortunately, this only works for the first two sections: as soon as we hit the third section, SNDS, it becomes clear that we need to handle all the cases that will be thrown at us. This ends up being is a pretty common aspect of reverse engineering the file format, as there are many instances of values that are one of many types that require us to understand each possible type that could be encountered. Fortunately, almost all of the sections in the file have a 32-bit unsigned length following the section identifier, which means we can reuse the code from the STUP section to skip over them.

static void Main(string[] args)
{
    using (BinaryReader binaryReader = new BinaryReader(File.OpenRead("YODESK.DTA")))
    {
        bool keepReading = true;
        while (keepReading)
        {
            string section = new string(binaryReader.ReadChars(4));
            switch (section)
            {
                case "VERS":
                    uint version = binaryReader.ReadUInt32();
                    break;
                case "STUP":
                case "SNDS":
                case "ZONE":
                case "TILE":
                case "PUZ2":
                case "CHAR":
                case "CHWP":
                case "CAUX":
                case "TNAM":
                    uint sectionLength = binaryReader.ReadUInt32();
                    byte[] sectionData = binaryReader.ReadBytes((int)sectionLength);
                    break;
                case "ENDF":
                    keepReading = false;
                    break;
            }
        }
    }
}

The ZONE section looks like it specifies a length immediately after (0x00010292), but if you jump forward that many bytes you'll end up in the middle of nowhere, which means we're probably interpreting things incorrectly. However, there are clearly more identifiers shortly after: IZON, IZAX, IZX2, IZX3, IZX4, and zero to many IACTs after that. This set of identifiers repeats again after that, and ends up repeating many times. This is where a bit of knowledge about the game comes in handy: one of the big "features" of Yoda Stories is that every time you play it generates a random map, but after playing a few times it's apparent that these maps are actually a set of fixed smaller maps attached together. The fact that the IACT sections contain text that look like scripts further reinforces the idea that each IZON actually corresponds to a sub-map:

If we can't trivially skip the ZONE section, it means we'll have to figure out more about its structure so that we can parse it and continue parsing the rest of the document. Although I eventually figured it out, this part was much trickier than the earlier sections, as there's data mixed in near the section identifiers which doesn't actually have anything to do with length.

There are many IZON sections, so there's clearly something in the data that helps get from one to the next; although it's possible that the program merely scans forward for the next "IZON" string, that'd be pretty sketchy and unreliable: who knows when you'd want to include "IZON" in an NPC's dialogue line! Measuring the length from the beginning of the first IZON to the next IZON yields a length of 0x064C, which is suspiciously close to the four bytes after the four bytes that followed the ZONE identifier (0x00000646). If we treat those four bytes as a length identifier and look at the 0x646 bytes that follow it, we can start to see a pattern emerge:

4 bytes: "ZONE"
2 bytes: ???? 
2 bytes: ????
4 bytes: length (X)
X bytes: zone data
2 bytes: ????
4 bytes: length (X)
X bytes: zone data
2 bytes: ????
4 bytes: length (X)
X bytes: zone data
...
2 bytes: ????
4 bytes: length (X)
X bytes: zone data
4 bytes: "PUZ2" (start of next section)

More investigation reveals more facts, some helpful, some less so:

  • The two bytes following the ZONE identifier correspond to an unsigned, 16-bit integer that specifies the number of zones.

  • The two bytes before each zone length specifier only ever have values 0x0001, 0x0002, 0x0003, and 0x0005.

  • The first two bytes of the zone data are some sort of ID number for the map, starting at 0x0000 and increasing by one each time.

If we update our program to read the ZONE section using our newly discovered information, we can now read through the entire file.

static void Main(string[] args)
{
    using (BinaryReader binaryReader = new BinaryReader(File.OpenRead("YODESK.DTA")))
    {
        bool keepReading = true;
        while (keepReading)
        {
            string section = new string(binaryReader.ReadChars(4));
            switch (section)
            {
                case "VERS":
                    uint version = binaryReader.ReadUInt32();
                    break;
                case "STUP":
                case "SNDS":
                case "TILE":
                case "PUZ2":
                case "CHAR":
                case "CHWP":
                case "CAUX":
                case "TNAM":
                    uint sectionLength = binaryReader.ReadUInt32();
                    byte[] sectionData = binaryReader.ReadBytes((int)sectionLength);
                    break;
                case "ZONE":
                    ushort count = binaryReader.ReadUInt16();
                    for (int i = 0; i < count; i++)
                    {
                        ushort unknown = binaryReader.ReadUInt16();
                        uint zoneLength = binaryReader.ReadUInt32();
                        byte[] zoneData = binaryReader.ReadBytes((int)zoneLength);
                    }
                    break;
                case "ENDF":
                    keepReading = false;
                    break;
                default:
                    throw new Exception("Unknown section: " + section);
            }
        }
    }
}

With this code we are now able to read through the entire file, without hitting any exceptions that would indicate we've made a bad assumption about the overall structure of the file format. Although we're not extracting anything useful yet, we're absolutely on our way.

Tiles (Difficulty: Jedi Knight)

Now that we're parsing the overall structure of the file, it's time to dig in and start dumping what we came here for: the game's gloriously weird but exhaustive tile set.

If we scroll forward into to the middle of the TILE section, it becomes immediately apparent that there's something going on here: bitmap images! The game's images are 32x32 pixel sprites, so if a 32-byte column width starts showing patterns like these there's a good chance that this is a non-compressed bitmap image format with one byte per pixel.

If we jump back to the start of the TILE section it's easy to spot a 1024-byte (32x32) run of data that happens to map into ASCII character space, which is presumably sprite data, prefixed by four bytes. If we look at the range of values for those four bytes over the entire set of data, we can see that they're relatively random but repeat often, and look a lot like an array of bit flags.

4 bytes: flags
1024 bytes: image data
4 bytes: flags
1024 bytes: image data
...
4 bytes: flags
1024 bytes: image data

Maybe the bit flags describe properties of the tiles? Who cares! We're close to extracting images! All we need to do is extend our program to dump out the presumed image data into actual image files and...

case "TILE":
{
    Directory.CreateDirectory(@"Tiles");
    uint tileSectionLength = binaryReader.ReadUInt32();
    for (int i = 0; i < tileSectionLength / 0x404; i++)
    {
        uint unknown = binaryReader.ReadUInt32();
        Bitmap tile = new Bitmap(32, 32);
        for (int j = 0; j < 0x400; j++)
        {
            byte pixelData = binaryReader.ReadByte();
            Color pixelColor = Color.FromArgb(pixelData, pixelData, pixelData);
            tile.SetPixel(j % 32, j / 32, pixelColor);
        }
        
        tile.Save(string.Format(@"Tiles\{0}.png", i));
    }
    break;
}

Victory! Kind of...

The game is obviously in color, so merely emitting the pixel value for each RGB component is not going to cut it; we'd never get anything other than grayscale. Wikipedia says that a common 8-bit true color scheme is 0bRRRGGGBB, so let's try that instead:

case "TILE":
{
    Directory.CreateDirectory(@"Tiles");
    uint tileSectionLength = binaryReader.ReadUInt32();
    for (int i = 0; i < tileSectionLength / 0x404; i++)
    {
        uint unknown = binaryReader.ReadUInt32();
        Bitmap tile = new Bitmap(32, 32);
        for (int j = 0; j < 0x400; j++)
        {
            byte pixelData = binaryReader.ReadByte();
            byte r = (byte)((pixelData & 0xE0) << 0);
            byte g = (byte)((pixelData & 0x1C) << 3);
            byte b = (byte)((pixelData & 0x03) << 6);
            Color pixelColor = Color.FromArgb(r, g, b);
            tile.SetPixel(j % 32, j / 32, pixelColor);
        }
        
        tile.Save(string.Format(@"Tiles\{0}.png", i));
    }
    break;
}

This is clearly also not the right pixel format, so it's time to stop guessing and start analyzing the numbers directly. Let's grab a screenshot from the game and get some examples of what values map to what colors in the final sprite:

Those shades of blue are interesting: they're very close, but have quite a few distinct shades. If we grab the pixel values from an image manipulation program, we get the following values:

0x1F    0x00 / 0x00 / 0x00
0x92    0x0B / 0x53 / 0xFB
0x93    0x00 / 0x00 / 0xFB
0x94    0x00 / 0x00 / 0xCB
0x95    0x00 / 0x00 / 0x9F
0x96    0x00 / 0x00 / 0x6F

I was overly optimistic that there was some kind of bit mapping from the pixel value to the color components, but seeing 0x1F somehow map to black (all zeroes) points strongly to the contrary. If decreasing the pixel value from 0x93 to 0x92 magically adds two bytes worth of information to the color, I think it's time to resign ourselves to the inevitable: there's a color palette somewhere, and we have no clue where it is.

Although it would be possible to grab a bunch of screenshots and automate the process of mapping pixel values, there really ought to be a better way; the data for the palette needs to exist somewhere. If I were programming this I'd probably just store it as an array of colors where the index into the array is the pixel data value. On a whim, let's search the file for one of the colors used in the R2D2 sprite, 0x0B53FB...

Nope, nothing.

BGR is another common pixel format, so maybe 0xFB530B will work?

Nope.

There is one place we haven't looked, though: the game's executable. We store a lot of gameplay information as code constants in our games, so maybe they did it here as well. Searching for 0x0B53FB in the binary yields no results, but when we search for the BGR value...

Sweet Jesus, it's a color palette! The next four bytes are 0xFB000000, which is color 0x93 in BGRA, so there's a good chance we're onto something. If we look for the data in a disassembler, we can glean a little bit more into what's really going on:

It's hard to tell from this photo, but here's our palette data, right where we'd expect to find a compile-time constant array in a decompiled program. Knowing that this is color 0x92, we can work backward to find the start of the palette and copy it into our program so that we can export images with the proper colors:

private static readonly byte[] PaletteData = new byte[] 

    0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
    0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
    0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0xFF,0xFF,0x8B,0x00,0xC3,0xCF,0x4B,0x00,
    0x8B,0xA3,0x1B,0x00,0x57,0x77,0x00,0x00,0x8B,0xA3,0x1B,0x00,0xC3,0xCF,0x4B,0x00,
    0xFB,0xFB,0xFB,0x00,0xEB,0xE7,0xE7,0x00,0xDB,0xD3,0xD3,0x00,0xCB,0xC3,0xC3,0x00,
    0xBB,0xB3,0xB3,0x00,0xAB,0xA3,0xA3,0x00,0x9B,0x8F,0x8F,0x00,0x8B,0x7F,0x7F,0x00,
    0x7B,0x6F,0x6F,0x00,0x67,0x5B,0x5B,0x00,0x57,0x4B,0x4B,0x00,0x47,0x3B,0x3B,0x00,
    0x33,0x2B,0x2B,0x00,0x23,0x1B,0x1B,0x00,0x13,0x0F,0x0F,0x00,0x00,0x00,0x00,0x00,
    0x00,0xC7,0x43,0x00,0x00,0xB7,0x43,0x00,0x00,0xAB,0x3F,0x00,0x00,0x9F,0x3F,0x00,
    0x00,0x93,0x3F,0x00,0x00,0x87,0x3B,0x00,0x00,0x7B,0x37,0x00,0x00,0x6F,0x33,0x00,
    0x00,0x63,0x33,0x00,0x00,0x53,0x2B,0x00,0x00,0x47,0x27,0x00,0x00,0x3B,0x23,0x00,
    0x00,0x2F,0x1B,0x00,0x00,0x23,0x13,0x00,0x00,0x17,0x0F,0x00,0x00,0x0B,0x07,0x00,
    0x4B,0x7B,0xBB,0x00,0x43,0x73,0xB3,0x00,0x43,0x6B,0xAB,0x00,0x3B,0x63,0xA3,0x00,
    0x3B,0x63,0x9B,0x00,0x33,0x5B,0x93,0x00,0x33,0x5B,0x8B,0x00,0x2B,0x53,0x83,0x00,
    0x2B,0x4B,0x73,0x00,0x23,0x4B,0x6B,0x00,0x23,0x43,0x5F,0x00,0x1B,0x3B,0x53,0x00,
    0x1B,0x37,0x47,0x00,0x1B,0x33,0x43,0x00,0x13,0x2B,0x3B,0x00,0x0B,0x23,0x2B,0x00,
    0xD7,0xFF,0xFF,0x00,0xBB,0xEF,0xEF,0x00,0xA3,0xDF,0xDF,0x00,0x8B,0xCF,0xCF,0x00,
    0x77,0xC3,0xC3,0x00,0x63,0xB3,0xB3,0x00,0x53,0xA3,0xA3,0x00,0x43,0x93,0x93,0x00,
    0x33,0x87,0x87,0x00,0x27,0x77,0x77,0x00,0x1B,0x67,0x67,0x00,0x13,0x5B,0x5B,0x00,
    0x0B,0x4B,0x4B,0x00,0x07,0x3B,0x3B,0x00,0x00,0x2B,0x2B,0x00,0x00,0x1F,0x1F,0x00,
    0xDB,0xEB,0xFB,0x00,0xD3,0xE3,0xFB,0x00,0xC3,0xDB,0xFB,0x00,0xBB,0xD3,0xFB,0x00,
    0xB3,0xCB,0xFB,0x00,0xA3,0xC3,0xFB,0x00,0x9B,0xBB,0xFB,0x00,0x8F,0xB7,0xFB,0x00,
    0x83,0xB3,0xF7,0x00,0x73,0xA7,0xFB,0x00,0x63,0x9B,0xFB,0x00,0x5B,0x93,0xF3,0x00,
    0x5B,0x8B,0xEB,0x00,0x53,0x8B,0xDB,0x00,0x53,0x83,0xD3,0x00,0x4B,0x7B,0xCB,0x00,
    0x9B,0xC7,0xFF,0x00,0x8F,0xB7,0xF7,0x00,0x87,0xB3,0xEF,0x00,0x7F,0xA7,0xF3,0x00,
    0x73,0x9F,0xEF,0x00,0x53,0x83,0xCF,0x00,0x3B,0x6B,0xB3,0x00,0x2F,0x5B,0xA3,0x00,
    0x23,0x4F,0x93,0x00,0x1B,0x43,0x83,0x00,0x13,0x3B,0x77,0x00,0x0B,0x2F,0x67,0x00,
    0x07,0x27,0x57,0x00,0x00,0x1B,0x47,0x00,0x00,0x13,0x37,0x00,0x00,0x0F,0x2B,0x00,
    0xFB,0xFB,0xE7,0x00,0xF3,0xF3,0xD3,0x00,0xEB,0xE7,0xC7,0x00,0xE3,0xDF,0xB7,0x00,
    0xDB,0xD7,0xA7,0x00,0xD3,0xCF,0x97,0x00,0xCB,0xC7,0x8B,0x00,0xC3,0xBB,0x7F,0x00,
    0xBB,0xB3,0x73,0x00,0xAF,0xA7,0x63,0x00,0x9B,0x93,0x47,0x00,0x87,0x7B,0x33,0x00,
    0x6F,0x67,0x1F,0x00,0x5B,0x53,0x0F,0x00,0x47,0x43,0x00,0x00,0x37,0x33,0x00,0x00,
    0xFF,0xF7,0xF7,0x00,0xEF,0xDF,0xDF,0x00,0xDF,0xC7,0xC7,0x00,0xCF,0xB3,0xB3,0x00,
    0xBF,0x9F,0x9F,0x00,0xB3,0x8B,0x8B,0x00,0xA3,0x7B,0x7B,0x00,0x93,0x6B,0x6B,0x00,
    0x83,0x57,0x57,0x00,0x73,0x4B,0x4B,0x00,0x67,0x3B,0x3B,0x00,0x57,0x2F,0x2F,0x00,
    0x47,0x27,0x27,0x00,0x37,0x1B,0x1B,0x00,0x27,0x13,0x13,0x00,0x1B,0x0B,0x0B,0x00,
    0xF7,0xB3,0x37,0x00,0xE7,0x93,0x07,0x00,0xFB,0x53,0x0B,0x00,0xFB,0x00,0x00,0x00,
    0xCB,0x00,0x00,0x00,0x9F,0x00,0x00,0x00,0x6F,0x00,0x00,0x00,0x43,0x00,0x00,0x00,
    0xBF,0xBB,0xFB,0x00,0x8F,0x8B,0xFB,0x00,0x5F,0x5B,0xFB,0x00,0x93,0xBB,0xFF,0x00,
    0x5F,0x97,0xF7,0x00,0x3B,0x7B,0xEF,0x00,0x23,0x63,0xC3,0x00,0x13,0x53,0xB3,0x00,
    0x00,0x00,0xFF,0x00,0x00,0x00,0xEF,0x00,0x00,0x00,0xE3,0x00,0x00,0x00,0xD3,0x00,
    0x00,0x00,0xC3,0x00,0x00,0x00,0xB7,0x00,0x00,0x00,0xA7,0x00,0x00,0x00,0x9B,0x00,
    0x00,0x00,0x8B,0x00,0x00,0x00,0x7F,0x00,0x00,0x00,0x6F,0x00,0x00,0x00,0x63,0x00,
    0x00,0x00,0x53,0x00,0x00,0x00,0x47,0x00,0x00,0x00,0x37,0x00,0x00,0x00,0x2B,0x00,
    0x00,0xFF,0xFF,0x00,0x00,0xE3,0xF7,0x00,0x00,0xCF,0xF3,0x00,0x00,0xB7,0xEF,0x00,
    0x00,0xA3,0xEB,0x00,0x00,0x8B,0xE7,0x00,0x00,0x77,0xDF,0x00,0x00,0x63,0xDB,0x00,
    0x00,0x4F,0xD7,0x00,0x00,0x3F,0xD3,0x00,0x00,0x2F,0xCF,0x00,0x97,0xFF,0xFF,0x00,
    0x83,0xDF,0xEF,0x00,0x73,0xC3,0xDF,0x00,0x5F,0xA7,0xCF,0x00,0x53,0x8B,0xC3,0x00,
    0x2B,0x2B,0x00,0x00,0x23,0x23,0x00,0x00,0x1B,0x1B,0x00,0x00,0x13,0x13,0x00,0x00,
    0xFF,0x0B,0x00,0x00,0xFF,0x00,0x4B,0x00,0xFF,0x00,0xA3,0x00,0xFF,0x00,0xFF,0x00,
    0x00,0xFF,0x00,0x00,0x00,0x4B,0x00,0x00,0xFF,0xFF,0x00,0x00,0xFF,0x33,0x2F,0x00,
    0x00,0x00,0xFF,0x00,0x00,0x1F,0x97,0x00,0xDF,0x00,0xFF,0x00,0x73,0x00,0x77,0x00,
    0x6B,0x7B,0xC3,0x00,0x57,0x57,0xAB,0x00,0x57,0x47,0x93,0x00,0x53,0x37,0x7F,0x00,
    0x4F,0x27,0x67,0x00,0x47,0x1B,0x4F,0x00,0x3B,0x13,0x3B,0x00,0x27,0x77,0x77,0x00,
    0x23,0x73,0x73,0x00,0x1F,0x6F,0x6F,0x00,0x1B,0x6B,0x6B,0x00,0x1B,0x67,0x67,0x00,
    0x1B,0x6B,0x6B,0x00,0x1F,0x6F,0x6F,0x00,0x23,0x73,0x73,0x00,0x27,0x77,0x77,0x00,
    0xFF,0xFF,0xEF,0x00,0xF7,0xF7,0xDB,0x00,0xF3,0xEF,0xCB,0x00,0xEF,0xEB,0xBB,0x00,
    0xF3,0xEF,0xCB,0x00,0xE7,0x93,0x07,0x00,0xE7,0x97,0x0F,0x00,0xEB,0x9F,0x17,0x00,
    0xEF,0xA3,0x23,0x00,0xF3,0xAB,0x2B,0x00,0xF7,0xB3,0x37,0x00,0xEF,0xA7,0x27,0x00,
    0xEB,0x9F,0x1B,0x00,0xE7,0x97,0x0F,0x00,0x0B,0xCB,0xFB,0x00,0x0B,0xA3,0xFB,0x00,
    0x0B,0x73,0xFB,0x00,0x0B,0x4B,0xFB,0x00,0x0B,0x23,0xFB,0x00,0x0B,0x73,0xFB,0x00,
    0x00,0x13,0x93,0x00,0x00,0x0B,0xD3,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
    0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
    0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0xFF,0xFF,0xFF,0x00,
};

case "TILE":
{
    Directory.CreateDirectory(@"Tiles");
    uint tileSectionLength = binaryReader.ReadUInt32();
    for (int i = 0; i < tileSectionLength / 0x404; i++)
    {
        uint unknown = binaryReader.ReadUInt32();
        Bitmap tile = new Bitmap(32, 32);
        for (int j = 0; j < 0x400; j++)
        {
            byte pixelData = binaryReader.ReadByte();
            byte r = PaletteData[pixelData * 4 + 2];
            byte g = PaletteData[pixelData * 4 + 1];
            byte b = PaletteData[pixelData * 4 + 0];
            Color pixelColor = pixelData == 0 ? Color.Transparent : Color.FromArgb(r, g, b);
            tile.SetPixel(j % 32, j / 32, pixelColor);
        }
        
        tile.Save(string.Format(@"Tiles\{0}.png", i));
    }
    break;
}

It worked! Over 2000 weird but adorable sprites have been extracted. It's worth pointing out that color 0x00 is actually transparent, something that's obvious from merely looking at the raw pixel data.

Although it's far less interesting than dumping the images, it's not hard to decode the flags if we export them as part of the image filename and look at them all in aggregate. A bit of deduction and a lot of staring at tiny pictures and their corresponding numbers reveals the following:

TILE TYPES:
bit0 = game object
bit1 = non-colliding, draws behind the player
bit2 = colliding, middle layer
bit3 = push/pull block
bit4 = non-colliding, draws above the player (for tall objects that the player can walk behind)
bit5 = mini map tile
bit6 = weapon
bit7 = item
bit8 = character

FLAGS FOR WEAPONS:
bit16 = light blaster
bit17 = heavy blaster OR thermal detonator (????)
bit18 = lightsaber
bit19 = the force

FLAGS FOR ITEMS:
bit16 = keycard
bit17 = puzzle item
bit18 = puzzle item (rare???)
bit19 = puzzle item (key item???)
bit20 = locator
bit22 = health pack

FLAGS FOR CHARACTERS:
bit16 = player
bit17 = enemy
bit18 = friendly

FLAGS FOR OTHER TILES:
bit16 = door

FLAGS FOR MINI-MAP TILES:
bit17 = specific mini map tile (home)
bit18 = specific mini map tile (puzzle, unsolved)
bit19 = specific mini map tile (puzzle solved) 
bit20 = specific mini map tile (gateway, unsolved)
bit21 = specific mini map tile (gateway, solved)
bit22 = specific mini map tile (up wall, locked)
bit23 = specific mini map tile (down wall, locked)
bit24 = specific mini map tile (left wall, locked)
bit25 = specific mini map tile (right wall, locked)
bit26 = specific mini map tile (up wall, unlocked)
bit27 = specific mini map tile (down wall, unlocked)
bit28 = specific mini map tile (left wall, unlocked)
bit29 = specific mini map tile (right wall, unlocked)
bit30 = specific mini map tile (objective)

Notably, bits 16 and above are reused with different meanings depending on the values of bits 0 through 8, which is a little confusing at first if you're expecting a particular bit to have only one meaning.

Maps (Difficulty: Jedi Knight)

Although at this point we've technically gotten what we came here for, the zone maps seem like an interesting thing to figure out as well. If we look back at the zone representation in the hex editor, we can see that there are a number of sub-sections for zone entries, all of which we'll need to learn more about in order to parse zones rather than skipping over them:

Although it's hard to see without loading up the file yourself in a hex editor, the lengths of the IZON sub-sections are all over the place. The first four bytes after IZON look like they might be the length, but if we follow that it'll actually throw us somewhere either short of or beyond the IZAX sub-section identifier:

A lot of times this process feels like playing a stubborn puzzle game (I should know, I make them) but eventually it clicks: there's a little bit of a fixed header that specifies, among other things, the size of the map (either 9x9 or 18x18), followed by 6 bytes per grid square, followed by a 16-bit integer that specifies the number of 12 byte structs that follow.

The other sub-sections are equally cryptic: IZAX specifies its length plus six, as do IZX2 and IZX3, while IZX4 has a fixed length and IACT, which can appear many times, directly specifies its length. Put that all together and we get the following:

4 bytes: "IZON"
4 bytes: unknown
2 bytes: map width (W)
2 bytes: map height (H)
1 byte: map flags (unknown meanings)
5 bytes: unused (same values for every map)
1 byte: planet (0x01 = desert, 0x02 = snow, 0x03 = forest, 0x05 = swamp) 
1 byte: unused (same values for every map)
W*H*6 bytes: map data
2 bytes: object info entry count (X)
X*12 bytes: object info data
4 bytes: "IZAX"
2 bytes: length (X)
X-6 bytes: IZAX data
4 bytes: "IZX2"
2 bytes: length (X)
X-6 bytes: IZX2data
4 bytes: "IZX3"
2 bytes: length (X)
X-6 bytes: IZX3 data
4 bytes: "IZX4"
8 bytes: IZX4 data
4 bytes: "IACT"
4 bytes: length (X)
X bytes: action data
4 bytes: "IACT"
4 bytes: length (X)
X bytes: action data
...
4 bytes: "IACT"
4 bytes: length (X)
X bytes: action data

Knowing that there are six bytes per grid square, there's a good chance they correspond to what goes in the grid square when the map is loaded. Simply looking at the first square of the first map (0x00, 0x00, 0xFF, 0xFF, 0x01, 0x00) gives me the strong impression that it's three 16-bit integers, each representing a tile placed in that location (since there are well over 256 tiles). Extending the program to blit together the extracted tiles using this information yields the following:

Maps! It looks like our guesses were spot on. It turns out that there are over 600 maps in the game, each with their own scripting and layout, which makes me appreciate how much effort went into such a simple looking game.

The variable length "object info" at the end of the IZON sub-section is interesting as well, and seems to indicate additional properties about the tiles placed on the map. Looking at the data as a giant binary blob in the hex editor isn't very helpful, but when we tabulate it as 12-byte entries and compare it against a map image some patterns start to emerge:

TYPE  N/A   X     Y     N/A   ARG
09 00 00 00 04 00 04 00 01 00 01 00
09 00 00 00 09 00 09 00 01 00 05 00
09 00 00 00 0F 00 09 00 01 00 09 00
0F 00 00 00 09 00 0D 00 01 00 5D 00
06 00 00 00 0A 00 03 00 01 00 AE 04
06 00 00 00 0A 00 04 00 01 00 AE 04

If we process each entry as six 16-bit integers, the first appears to be a type flag, while the third and fourth appear to be map coordinates. Although the second and fifth end up always having the same values (0x0000 and 0x0001), the last one is all over the place, which makes it look an awful lot like an argument. When we start looking at what kind of tiles the coordinates point to in the exported map images, we can see that every entry of type 0x09 points to a door, where the argument is the ID of the map the door leads to in-game.

Figuring out what all the different entry types correspond to is a bit of a grind, but by adapting our program to pause for interaction every time it discovers a not yet deciphered entry type we can systematically inspect every entry type in the data file. A bit of deduction and guessing yields the following:

TYPE ID    DESCRIPTION                         ARGUMENT
0x00       trigger location    
0x01       spawn location    
0x02       force-related location    
0x03       vehicle to secondary map            map that vehicle takes you to
0x04       vehicle to primary map              map that vehicle takes you to
0x05       object that gives you the locator    
0x06       crate item                          item contained within
0x07       puzzle NPC                          tile ID for relevant character
0x08       crate weapon                        weapon contained within
0x09       door (in)                           map that door connects to
0x0A       door (out)    
0x0B       unused    
0x0C       lock    
0x0D       teleporter    
0x0E       x-wing from dagobah                 map that x-wing takes you to
0x0F       x-wing to dagobah                   map that x-wing takes you to

Scripts (Difficulty: Jedi Master)

After figuring out the tile format and map format, a logical next step is to figure out the scripting language that powers the map-specific gameplay logic of the game. It's full of ASCII, including what look like commands, so surely it can't be that hard... right?

Unfortunately, there's a lot going on here that's relatively difficult to decode. Unlike tiles and maps, which were visually obvious when we figured them out, scripts involve variables and gameplay status that isn't as easily visible or verifiable. If I was extremely dedicated to reverse engineering this game I imagine that it'd be feasible to load up the levels and match the observed gameplay behavior to the data in the scripts, but that's a little bit further than I'm willing to go. Before we wrap this up, however, we really ought to do something with everything we've learned...

Making a Mod

The IZON sub-section for map number 95 starts at offset 0x266512 in the data file. We can skip 20 bytes to get to the map data, and then skip 6*(18*9+12) more bytes to get to location (12, 11). If we then drop the values for tiles 1985 and 1986 into the "middle" value of each grid square using our hex editor and overwrite the game's data file, we get the following:

I wasn't sure why there was a sports car in the game, but sehugg from Hacker News pointed out that it's actually the car from Corvette Summer, a 1978 movie starring Mark Hamill (who played Luke Skywalker). It turns out that one of the writers of Corvette Summer, Hal Barwood, was actually the designer and writer for Yoda Stories! What an Easter Egg!

Conclusion

There's clearly a lot more that could be done here with enough patience; after reverse engineering the majority of the game's data file, it'd be surprisingly simple to rebuild the game from scratch (it's stuck in a shockingly small window) or build tools to export, edit, and repack the game's content in order to create mods. I'm not sure why you'd want to do either of those things, but they're certainly possibilities. At a minimum, I now have access to a totally rad and exhaustive set of weird looking Star Wars sprites, and I hope that you've learned something interesting about reverse engineering data file formats for old computer games!

Read more about:

Featured Blogs

About the Author

Zach Barth

Blogger

Zach Barth runs the independent game studio Zachtronics Industries and worked on SpaceChem as the lead designer and producer.

Daily news, dev blogs, and stories from Game Developer straight to your inbox

You May Also Like