In this reprinted <a href="http://altdevblogaday.com/">#altdevblogaday</a> in-depth piece, Gamer Camp's Alex Darby explains how to set up Visual Studio to look at optimized assembly code generated for simple code snippets.
It's that time again where I have managed to find a few spare hours to squoze out an article for the Low Level Curriculum. This is the eighth post in this series, which is not in any way significant except that I like the number 8. As well as being a power of two, it is also the maximum number of unarmed people who can simultaneously get close enough to attack you (according to a martial arts book I once read).

This post covers how to set up Visual Studio to allow you to easily look at the optimized assembly code generated for simple code snippets like the ones we deal with in this series. If you wonder why I feel this is worth a post of its own, here's the reason: optimizing compilers are good, and given code with constants as input and no external output (like the snippets I give as examples in this series) the compiler will generally optimize the code away to nothing, which I find makes it pretty hard to look at. This should prove immensely useful, both to refer back to, and for your own experimentation.

Here are the backlinks for preceding articles in the series in case you want to refer back to any of them (warning: the first few are quite long):
Assumptions

Strictly speaking, dear reader, I am making tons of assumptions about you as I write this – that you read English, that you like to program etc., but we'll be here all day if I try to list those so let's stick to the ones that might be immediately inconvenient if they were incorrect.

I will be assuming that you have access to some sub-species of Visual Studio 2010 on a Windows PC, and that you are familiar with using it to do all the everyday basics like change build configurations, open files, edit, compile, run, and debug C/C++.

Creating a project

Open Visual Studio and from the menu choose "File -> New -> Project…". Once the new project wizard window opens (see below):
go to the tree view on the left of the window and select "Other Languages -> Visual C++"
in the main pane select "Win32 Console Application"
give it a name in the Name edit box
browse for a location of your choosing on your PC
click OK to create the project
Once you have clicked OK just click "Finish" on the next stage of the wizard – in case you're wondering, the options available when you click next don't matter for our purposes (and un-checking the "Precompiled header" check box makes no difference, it still generates a console app that uses a precompiled header…).

Changing the Project Properties

The next step is to use the menu to select "Project -> <YourProjectName> Properties", which will bring up the properties dialog for the project. When the properties dialog appears (see image below):
select "All Configurations" from the Configuration drop list
select "Configuration Properties -> General" in the tree view at the left of the window
in the main pane change "Whole Program Optimization" to "No Whole Program Optimization".
Next (see image below):
in the tree view, navigate to "C/C++ -> Code Generation"
in the main pane, change "Basic Runtime Checks" to "Default" (i.e. off)
Finally (see image below):
in the tree view, go to "C/C++ -> Output Files"
in the main pane change "Assembler Output" to "Assembly With Source Code (/FAs)"
once you've done that click "OK"
Now, when you compile, the Visual Studio compiler will generate an .asm file as well as an .exe file. This file will contain the intermediate assembly code generated by the compiler, with the source code inserted into it inline as comments.

You could alternatively choose the "Assembly, Machine Code and Source (/FAcs)" option if you like – this will generate a .cod file that contains the machine code as well as the asm and source. I prefer the regular .asm because it's less visually noisy and the assembler mnemonics are all aligned on the same column, so that's what I'll assume you're using if you're following the article, but the .cod file is fine.

So, what did we do there?

Well, first we turned off link time code generation. Amongst other things, this will prevent the linker stripping the .asm generated for functions that are compiled but not called anywhere.

Secondly, we turned off the basic runtime checks (which are already off in Release). These checks make the generated function prologues and epilogues do significant amounts of (basically unnecessary) extra work, causing a worst case 5x slowdown (see this post by Bruce Dawson on his personal blog for an in depth explanation).

Finally, we asked the compiler not to throw away the assembly code it generates for our program. This data is produced by the compilation process whenever you compile but is usually thrown away; we're just asking Visual Studio to write it into an .asm file so we can take a look at it.

Since we made these changes for "All Configurations", we will have access to .asm files containing the assembly code generated by both the Debug and Release build configurations.

Let's try it out

So in the spirit of discovery, let's try it out (for the sake of familiarity) with a language feature we looked at last time – the conditional operator:
#include "stdafx.h"

int ConditionalTest( bool bFlag, int iOnTrue, int iOnFalse )
{
    return ( bFlag ? iOnTrue : iOnFalse );
}

int main(int argc, char* argv[])
{
    int a = 1, b = 2;
    bool bFlag = false;
    int c = ConditionalTest( bFlag, a, b );
    return 0;
}
The question you have in your head at this moment should be "why have we put the code into a function?". Rest assured that this will become apparent soon enough.

Now we have to build the code and look in the .asm files generated to see what the compiler has been up to… First build the Debug build configuration – this should already be selected in the solution configuration drop-down (at the top of your Visual Studio window unless you've moved it). Next build the Release configuration.

Now we need to open the .asm files. Unless you have changed project settings that I didn't tell you to change, these will be in the following paths:
<path where you put the project>/Debug/<projectName>.asm
<path where you put the project>/Release/<projectName>.asm
.asm files

I'm not going to go into any significant detail about how .asm files are laid out here; if you want to find out more, here's a link to the Microsoft documentation for their assembler. The main thing you should note is that we can find the C/C++ functions in the .asm file by looking for their names; and that – once we find them – the mixture of source code and assembly code looks basically the same as it does in the disassembly view of Visual Studio in the debugger.

main()

Let's look at main() first. This is where I explain why the code snippet we wanted to look at was put in a function. I can tell you're excited. Here's main() from the Debug .asm (I've reformatted it slightly to make it take up less vertical space):
_TEXT SEGMENT
_c$ = -16 ; size = 4
_bFlag$ = -9 ; size = 1
_b$ = -8 ; size = 4
_a$ = -4 ; size = 4
_argc$ = 8 ; size = 4
_argv$ = 12 ; size = 4
_main PROC ; COMDAT
; 9 : {
    push ebp
    mov ebp, esp
    sub esp, 80 ; 00000050H
    push ebx
    push esi
    push edi
; 10 : int a = 1, b = 2;
    mov DWORD PTR _a$[ebp], 1
    mov DWORD PTR _b$[ebp], 2
; 11 : bool bFlag = false;
    mov BYTE PTR _bFlag$[ebp], 0
; 12 : int c = ConditionalTest( bFlag, a, b );
    mov eax, DWORD PTR _b$[ebp]
    push eax
    mov ecx, DWORD PTR _a$[ebp]
    push ecx
    movzx edx, BYTE PTR _bFlag$[ebp]
    push edx
    call ?ConditionalTest@@YAH_NHH@Z ; ConditionalTest
    add esp, 12 ; 0000000cH
    mov DWORD PTR _c$[ebp], eax
; 13 : return 0;
    xor eax, eax
; 14 : }
    pop edi
    pop esi
    pop ebx
    mov esp, ebp
    pop ebp
    ret 0
_main ENDP
_TEXT ENDS
As long as you've read the previous posts, this should mostly look pretty familiar. It breaks down as follows:
lines 1-8: these lines define the offsets of the various Stack variables from [ebp] within main()'s Stack Frame
lines 10-15: function prologue of main()
lines 17-20: initialize the Stack variables
lines 22-30: push the parameters to ConditionalTest() onto the Stack, call it, and assign its return value
line 32: sets up main()'s return value
lines 34-38: function epilogue of main()
line 39: return from main()
Nothing unexpected there really, the only new thing to take in is the declarations of the Stack variable offsets from [ebp]. I feel these tend to make the assembly code easier to follow than the code in the disassembly window in the Visual Studio debugger. And, for comparison, here's main() for the Release .asm:
_TEXT SEGMENT
_argc$ = 8 ; size = 4
_argv$ = 12 ; size = 4
_main PROC ; COMDAT
; 10 : int a = 1, b = 2;
; 11 : bool bFlag = false;
; 12 : int c = ConditionalTest( bFlag, a, b );
; 13 : return 0;
    xor eax, eax
; 14 : }
    ret 0
_main ENDP
_TEXT ENDS
The astute amongst you will have noticed that the Release assembly code is significantly smaller than the Debug. In fact, it's clearly doing nothing at all other than returning 0. Good optimizing! High five!

As I alluded to earlier, the optimizing compiler is great at spotting code that evaluates to a compile time constant and will happily replace any code it can with the equivalent constant.

So that's why we put the code snippet in a function

It should hopefully be relatively clear by this point why we might have put the code snippet into a function, and then asked the linker not to remove code for functions that aren't called. Even if it can optimize away calls to a function, the compiler can't optimize away the function before link time because some code outside of the object file it exists in might call it. Incidentally, the same effect usually keeps variables defined at global scope from being optimized away before linkage. I'm going to call this Schrödinger linkage (catchy, right?).

If we want our simple code snippet to stay around after optimizing we only need to make sure that it takes advantage of Schrödinger linkage to cheat the optimizer. If the compiler can't tell whether the function will be called, then it certainly can't tell what the values of its parameters will be during one of these potential calls, or what its return value might be used for, and so it can't optimize away any code that relies on those inputs or contributes to the output either.

The upshot of this is that if we put our code snippet in a function, make sure that it uses the function parameters as inputs, and that its output is returned from the function, then it should survive optimization.

It's really a testament to all the compiler programmers over the years that it takes so much effort to get at the optimized assembly code generated by a simple code snippet – compiler programmers we salute you!
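To make that concrete, here is a minimal sketch in the spirit of the article's snippets. The file name, both function names, and the command line in the comment are assumptions made up for illustration (they are not from the project set up above), and exactly what the optimizer emits will vary with compiler version and settings.

```cpp
// snippet.cpp (hypothetical file name, not part of the article's project).
// Assumed command line for a quick look at the optimized listing:
//   cl /c /O2 /FAs snippet.cpp

// Inputs arrive as parameters and the result leaves via the return value.
// The function has external linkage, so the compiler cannot know who calls
// it or with what values, and has to emit real code for the additions.
int AddThree( int iA, int iB, int iC )
{
    return iA + iB + iC;
}

// Every input here is a compile time constant, so an optimizing build is
// free to fold the whole body down to the equivalent of "return 6".
int AddThreeConstants()
{
    int iA = 1, iB = 2, iC = 3;
    return iA + iB + iC;
}
```

Comparing the two functions in the Release .asm should show the difference: the first is expected to keep its arithmetic, the second is likely to be reduced to loading a constant.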
ConditionalTest()

So, here's the Debug .asm for ConditionalTest() (ignoring the prologue / epilogue):
; 5 : return( bFlag ? iOnTrue : iOnFalse );
    movzx eax, BYTE PTR _bFlag$[ebp]
    test eax, eax
    je SHORT $LN3@Conditiona
    mov ecx, DWORD PTR _iOnTrue$[ebp]
    mov DWORD PTR tv66[ebp], ecx
    jmp SHORT $LN4@Conditiona
$LN3@Conditiona:
    mov edx, DWORD PTR _iOnFalse$[ebp]
    mov DWORD PTR tv66[ebp], edx
$LN4@Conditiona:
    mov eax, DWORD PTR tv66[ebp]
; 6 : }
As you should be able to see, this is doing basically the same thing as the code we looked at in the Debug disassembly in the previous article:
branching based on the result of testing the value of bFlag (the mnemonic test does a bitwise logical AND of its operands and sets the CPU flags without storing the result)
both branches set a Stack variable at an offset of tv66 from [ebp]
and both branches then execute the last line which copies the content of that address into eax
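If it helps to see that shape without the mnemonics, the Debug control flow can be written back out as C++ roughly like this. It is only an approximation for illustration: the function name and iTemp are invented here, with iTemp standing in for the compiler-generated temporary tv66.

```cpp
// Hypothetical C++ rendering of the shape of the Debug assembly.
// iTemp plays the role of the compiler-generated Stack temporary tv66.
int ConditionalTestDebugShape( bool bFlag, int iOnTrue, int iOnFalse )
{
    int iTemp;              // tv66: a temporary living on the Stack

    if( bFlag )             // movzx / test / je
    {
        iTemp = iOnTrue;    // true branch stores iOnTrue into the temporary
    }
    else
    {
        iTemp = iOnFalse;   // false branch stores iOnFalse into the temporary
    }

    return iTemp;           // final mov copies the temporary into eax
}
```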
Again, the assembly code is arguably easier to follow than the corresponding disassembly because the jmp mnemonic jumps to labels visibly defined in the code, whereas in the disassembly view in Visual Studio you generally have to cross reference the operand to jmp with the memory addresses in the disassembly view to see where it's jumping to…

Let's compare this with the Release assembly (again not showing the function prologue or epilogue):
; 5 : return( bFlag ? iOnTrue : iOnFalse );
    cmp BYTE PTR _bFlag$[ebp], 0
    mov eax, DWORD PTR _iOnTrue$[ebp]
    jne SHORT $LN4@Conditiona
    mov eax, DWORD PTR _iOnFalse$[ebp]
$LN4@Conditiona:
; 6 : }
You will note that the work of this function is now done in 4 instructions as opposed to 9 in the Debug:
it compares the value of bFlag against 0
unconditionally moves the value of iOnTrue into eax
if the value of bFlag was not equal to 0 (i.e. it was true) it jumps past the next instruction…
…otherwise it moves the value of iOnFalse into eax
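Written back out as C++, those four steps look roughly like the sketch below. The function name and iResult are made up for this illustration (iResult stands in for eax), and this is a reading of the listing rather than anything the compiler produced.

```cpp
// Hypothetical C++ rendering of the shape of the Release assembly.
// iResult stands in for the eax register.
int ConditionalTestReleaseShape( bool bFlag, int iOnTrue, int iOnFalse )
{
    int iResult = iOnTrue;   // mov eax, iOnTrue (always executed)

    if( !bFlag )             // cmp / jne: skipped entirely when bFlag is true
    {
        iResult = iOnFalse;  // mov eax, iOnFalse
    }

    return iResult;          // the result is already sitting in eax
}
```

The notable difference from the Debug shape is that there is no Stack temporary and only one conditional jump: the true case is assumed up front and overwritten only when bFlag is false.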
As I've stated before I'm not an assembly code programmer and I'm not an optimization expert. Consequently, I'm not going to offer my opinion on the significance of the ordering of the instructions in this Release assembly code. I am, however, prepared to go out on a limb and say it's a pretty safe bet that the Release version with 4 instructions is going to execute significantly faster than the Debug version with 9.

So, why such a big difference between Debug and Release for something that when debugging at source level is a single-step? Essentially this is because the unoptimized assembly code generated by the compiler must be amenable to single-step debugging at the source level:
it almost always does the exact logical equivalent of what the high level code asked it to do and, specifically, in the same order
it also has to frequently write values from CPU registers back into memory so that the debugger can show them updating
Summary

What's the main point I'd like you to take away from this article? Optimizing compilers are feisty! You have to know how to stop them optimizing away your isolated C/C++ code snippets if you want to easily be able to see the optimized assembly code they generate.

This article shows a simple boilerplate way to short-circuit the Visual Studio optimizing compiler – mileage will vary on other platforms. There are other strategies to stop the optimizer optimizing away your code, but they basically all come down to utilizing the Schrödinger linkage effect; in general:
use global variables, function parameters, or function call results as inputs to the code
use global variables, function return values, or function call parameters as outputs from the code
if you're not using Visual Studio's compiler you may also need to turn off inlining
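As a rough sketch of those three strategies together: the snippet below uses global variables for input and output and marks the helper as not-to-be-inlined. All of the names (NO_INLINE, g_iInputA, g_iInputB, g_iOutput, Multiply, SnippetUnderTest) are invented for this example. __declspec(noinline) and __attribute__((noinline)) are the MSVC and GCC / Clang spellings respectively; whether you need them at all depends on your compiler and settings.

```cpp
// NO_INLINE is a made-up macro name for this sketch; it wraps the
// compiler-specific "do not inline this function" markers.
#if defined(_MSC_VER)
    #define NO_INLINE __declspec(noinline)
#else
    #define NO_INLINE __attribute__((noinline))
#endif

// Globals with external linkage: other object files might read or write
// them, so the compiler generally cannot treat their values as known
// constants when it compiles the functions below.
int g_iInputA = 3;
int g_iInputB = 4;
int g_iOutput = 0;

// Function parameters as inputs, return value as output.
NO_INLINE int Multiply( int iA, int iB )
{
    return iA * iB;
}

// Global variables as inputs and as the output.
void SnippetUnderTest()
{
    g_iOutput = Multiply( g_iInputA, g_iInputB );
}
```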
A final extreme method I have been told about is to insert nop instructions via inline assembly around / within the code you want to isolate. Note that you should use this approach with caution, as it interferes directly with the optimizer and can easily affect the output to the point where it is no longer representative.

Epilogue

So, I hope you found this interesting – I certainly expect you will find it useful :)

The next article (as promised last time!) is about looping, which is another reason why it seemed like a good time to cover getting at optimized assembly code for simple C/C++ snippets. I will be referring back to this in future articles in situations where looking at the optimized assembly code is particularly relevant.

If you're wondering what you should look at first to see how Debug and Release code differ, and want to get practice at beating the optimizer, I'd suggest starting with something straightforward like adding a few numbers together.

Lastly, but by no means leastly, thanks to Rich, Ted, and Bruce for their input and proof reading.

[This piece was reprinted from #AltDevBlogADay, a shared blog initiative started by @mike_acton devoted to giving game developers of all disciplines a place to motivate each other to write regularly about their personal game development passions.]