
Sponsored Feature: Common Performance Issues in Game Programming 2


June 18, 2008


Author: by Staff

In the sixth Microsoft-sponsored article on Gamasutra's XNA-themed microsite, Microsoft lead software design engineer Becky Heineman, an XNA Developer Connect staffer and Interplay co-founder, gives practical tips on avoiding a particular performance-killer when making games.

As Heineman points out, "90% of the time is spent in 10% of the code, so make that 10% the fastest code it can be." She describes the "Load-Hit-Store" snag when developing for Xbox 360, and goes on to give examples of how to correct it:

"Ask any Xbox 360 performance engineer about Load-Hit-Store and they usually go into a tirade. The sequence of a memory read operation (The Load), the assignment of the value to a register (The Hit), and the actual writing of the value into a register (The Store) is usually hidden away in stages of the pipeline so these operations cause no stalls. However, if the memory location being read was one recently written to by a previous write operation, it can take as many as 40 cycles before the 'Store' operation can complete."

Walking through an assembly example, she continues:

"The first instruction writes a 32-bit floating-point value into memory, and the following instruction reads it back. What's interesting is that the load instruction isn't where the stall occurs; it's the 'oris' instruction. That instruction can't complete until the 'store' into r9 finishes, and it's waiting for the L1 cache to update.

"What's going on? The first instruction stores the data and marks the L1 cache as 'dirty'. It takes about 40 cycles for the data to be written into the L1 cache and become available for the CPU to use. During this window of time, an instruction requests that data from the cache and then 'hits' r9 for a 'store'. Since the last instruction can't execute until the store is complete, you've got a stall."
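This summary does not reproduce Heineman's assembly listing, so the following is only a minimal C++ sketch of the general pattern she describes: a value written to memory and read back almost immediately. The function names `sum_through_pointer` and `sum_in_register` are hypothetical and do not come from the feature. Whether the first version actually stalls depends on the compiler and the target CPU.

```cpp
#include <cstddef>

// Hypothetical example, not taken from the feature.
// The running total lives behind a pointer. Because `total` could alias an
// element of `values`, the compiler may have to store *total and load it
// again on every iteration: a write followed closely by a read of the same
// address, which is the situation the article calls Load-Hit-Store.
void sum_through_pointer(const float* values, std::size_t count, float* total) {
    *total = 0.0f;
    for (std::size_t i = 0; i < count; ++i) {
        *total += values[i];  // load *total, add, store *total
    }
}

// Same result, but the running total is a local variable that can stay in a
// register for the whole loop. Memory is written once, after the loop.
void sum_in_register(const float* values, std::size_t count, float* total) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < count; ++i) {
        sum += values[i];
    }
    *total = sum;  // single store at the end
}
```

Both functions produce the same total when `total` does not point into `values`. The second simply avoids the repeated round trip through memory inside the loop, which is the kind of small change the feature's "10% of the code" advice is aimed at.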
After demonstrating a number of manifestations of Load-Hit-Store, Heineman sums up by explaining that, while not strictly essential, some knowledge about how the hardware works can lead to much smoother software:

"It takes only a little discipline to write clean code, but it's also easy to create code that can inadvertently introduce performance bottlenecks. Using Microsoft tools like PIX will help you track down some of these, but the best way to avoid bottlenecks is to be aware of how they can exist so that they aren't written into the code in the first place.

"A good understanding of the underlying hardware is not crucial to modern game programming from a high level. However, with a solid foundation of how CPUs work as well as how they interact with the memory subsystems, programmers can write software that maximizes performance."

You can now read the full feature on the subject, with more from Heineman on how the Xbox 360's PowerPC processor is structured and how that relates to performance issues, as well as examples of Load-Hit-Store issues that look like they shouldn't exist.
