C/C++ low level curriculum: Looking at optimized assembly

In this reprinted #altdevblogaday (http://altdevblogaday.com/) in-depth piece, Gamer Camp's Alex Darby explains how to set up Visual Studio to look at optimized assembly code generated for simple code snippets.

Alex Darby, Blogger

May 9, 2012

19 Min Read
It's that time again where I have managed to find a few spare hours to squoze out an article for the Low Level Curriculum. This is the eighth post in this series, which is not in any way significant except that I like the number 8. As well as being a power of two, it is also the maximum number of unarmed people who can simultaneously get close enough to attack you (according to a martial arts book I once read).

This post covers how to set up Visual Studio so that you can easily look at the optimized assembly code generated for simple code snippets like the ones we deal with in this series. If you wonder why I feel this is worth a post of its own, here's the reason: optimizing compilers are good, and given code with constants as input and no external output (like the snippets I give as examples in this series) the compiler will generally optimize the code away to nothing, which I find makes it pretty hard to look at. This should prove immensely useful, both to refer back to and for your own experimentation.

Here are the backlinks for preceding articles in the series in case you want to refer back to any of them (warning: the first few are quite long):

Assumptions

Strictly speaking, dear reader, I am making tons of assumptions about you as I write this – that you read English, that you like to program, etc. – but we'll be here all day if I try to list those, so let's stick to the ones that might be immediately inconvenient if they were incorrect. I will be assuming that you have access to some sub-species of Visual Studio 2010 on a Windows PC, and that you are familiar with using it to do all the everyday basics like change build configurations, open files, edit, compile, run, and debug C/C++.

Creating a project

Open Visual Studio and from the menu choose "File -> New -> Project…". Once the new project wizard window opens:

  • go to the tree view on the left of the window and select "Other Languages -> Visual C++"

  • in the main pane select "Win32 Console Application"

  • give it a name in the Name edit box

  • browse for a location of your choosing on your PC

  • click OK to create the project

Once you have clicked OK just click "Finish" on the next stage of the wizard – in case you're wondering, the options available when you click "Next" don't matter for our purposes (and un-checking the "Precompiled header" check box makes no difference, it still generates a console app that uses a precompiled header…).

Changing the Project Properties

The next step is to use the menu to select "Project -> <YourProjectName> Properties", which will bring up the properties dialog for the project. When the properties dialog appears:

  • select "All Configurations" from the Configuration drop list

  • select "Configuration Properties -> General" in the tree view at the left of the window

  • in the main pane change "Whole Program Optimization" to "No Whole Program Optimization".

Next, turn off the basic runtime checks:

  • in the tree view, navigate to "C/C++ -> Code Generation"

  • in the main pane, change "Basic Runtime Checks" to "Default" (i.e. off)

Finally, turn on the assembler output:

  • in the tree view, go to "C/C++ -> Output Files"

  • in the main pane change "Assembler Output" to "Assembly With Source Code (/FAs)"

  • once you've done that click "OK"

Now, when you compile, the Visual Studio compiler will generate an .asm file as well as an .exe file. This file will contain the intermediate assembly code generated by the compiler, with the source code inserted into it inline as comments.

You could alternatively choose the "Assembly, Machine Code and Source (/FAcs)" option if you like – this will generate a .cod file that contains the machine code as well as the asm and source. I prefer the regular .asm because it's less visually noisy and the assembler mnemonics are all aligned on the same column, so that's what I'll assume you're using if you're following the article, but the .cod file is fine.

So, what did we do there?

Well, first we turned off link time code generation. Amongst other things, this will prevent the linker stripping the assembly code generated for functions that are compiled but not called anywhere.

Secondly, we turned off the basic runtime checks (which are already off in Release). These checks make the generated function prologues and epilogues do significant amounts of (basically unnecessary) extra work, causing a worst case 5x slowdown (see this post by Bruce Dawson on his personal blog for an in depth explanation).

Finally, we asked the compiler not to throw away the assembly code it generates for our program. This data is produced by the compilation process whenever you compile but is usually thrown away; we're just asking Visual Studio to write it into an .asm file so we can take a look at it.

Since we made these changes for "All Configurations", we will have access to .asm files containing the assembly code generated by both the Debug and Release build configurations.

Let's try it out

So in the spirit of discovery, let's try it out (for the sake of familiarity) with a language feature we looked at last time – the conditional operator:

#include "stdafx.h"

int ConditionalTest( bool bFlag, int iOnTrue, int iOnFalse )
{
    return ( bFlag ? iOnTrue : iOnFalse );
}

int main(int argc, char* argv[])
{
    int a = 1, b = 2;
    bool bFlag = false;
    int c = ConditionalTest( bFlag, a, b );
    return 0;
}

The question you have in your head at this moment should be "why have we put the code into a function?". Rest assured that this will become apparent soon enough.

Now we have to build the code and look in the .asm files generated to see what the compiler has been up to…

First build the Debug build configuration – this should already be selected in the solution configuration drop-down (at the top of your Visual Studio window unless you've moved it). Next build the Release configuration.

Now we need to open the .asm files. Unless you have changed project settings beyond the ones described above, these will be in the following paths:

<path where you put the project>/Debug/<projectName>.asm

<path where you put the project>/Release/<projectName>.asm

.asm files

I'm not going to go into any significant detail about how .asm files are laid out here; if you want to find out more, the Microsoft documentation for their assembler covers it. The main thing you should note is that we can find the C/C++ functions in the .asm file by looking for their names, and that – once we find them – the mixture of source code and assembly code looks basically the same as it does in the disassembly view of Visual Studio in the debugger.

main()

Let's look at main() first. This is where I explain why the code snippet we wanted to look at was put in a function. I can tell you're excited. Here's main() from the Debug .asm (I've reformatted it slightly to make it take up less vertical space):

_TEXT    SEGMENT
_c$ = -16                        ; size = 4
_bFlag$ = -9                        ; size = 1
_b$ = -8                        ; size = 4
_a$ = -4                        ; size = 4
_argc$ = 8                        ; size = 4
_argv$ = 12                        ; size = 4
_main    PROC                        ; COMDAT
; 9    : {
    push    ebp
    mov    ebp, esp
    sub    esp, 80                    ; 00000050H
    push    ebx
    push    esi
    push    edi
; 10   :     int a = 1, b = 2;
    mov    DWORD PTR _a$[ebp], 1
    mov    DWORD PTR _b$[ebp], 2
; 11   :     bool bFlag = false;
    mov    BYTE PTR _bFlag$[ebp], 0
; 12   :     int c = ConditionalTest( bFlag, a, b );
    mov    eax, DWORD PTR _b$[ebp]
    push    eax
    mov    ecx, DWORD PTR _a$[ebp]
    push    ecx
    movzx    edx, BYTE PTR _bFlag$[ebp]
    push    edx
    call    ?ConditionalTest@@YAH_NHH@Z        ; ConditionalTest
    add    esp, 12                    ; 0000000cH
    mov    DWORD PTR _c$[ebp], eax
; 13   :     return 0;
    xor    eax, eax
; 14   : }
    pop    edi
    pop    esi
    pop    ebx
    mov    esp, ebp
    pop    ebp
    ret    0
_main    ENDP
_TEXT    ENDS

As long as you've read the previous posts, this should mostly look pretty familiar. It breaks down as follows:

  • lines 1-8: these lines define the offsets of the various Stack variables from [ebp] within main()'s Stack Frame

  • lines 10-15: function prologue of main()

  • lines 17-20: initialize the Stack variables

  • lines 22-30: push the parameters to ConditionalTest() onto the Stack, call it, and assign its return value to c

  • line 32: sets up main()'s return value

  • lines 34-38: function epilogue of main()

  • line 39: return from main()
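The "add esp, 12" that follows the call can be checked by hand. Here is a minimal sketch of that arithmetic, assuming the 32-bit x86 build used in this article and the default __cdecl calling convention (where the caller removes the arguments it pushed). The constant names are made up for illustration:

```cpp
// Hypothetical names; assumes a 32-bit x86 build where every push
// moves esp by 4 bytes and the caller cleans up its own arguments.
const int kStackSlotBytes = 4;   // size of one pushed value
const int kArgumentCount  = 3;   // bFlag, iOnTrue, iOnFalse

// bFlag is widened to 32 bits by movzx before it is pushed, so it
// occupies a full slot just like the two ints.
const int kCallerCleanupBytes = kStackSlotBytes * kArgumentCount;

static_assert( kCallerCleanupBytes == 12, "matches 'add esp, 12' in the listing" );
```

The static_assert is only there so the compiler checks the sum; it generates no code.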

Nothing unexpected there really; the only new thing to take in is the declarations of the Stack variable offsets from [ebp]. I feel these tend to make the assembly code easier to follow than the code in the disassembly window in the Visual Studio debugger.

And, for comparison, here's main() for the Release .asm:

_TEXT    SEGMENT
_argc$ = 8                        ; size = 4
_argv$ = 12                        ; size = 4
_main    PROC                        ; COMDAT
; 10   :     int a = 1, b = 2;
; 11   :     bool bFlag = false;
; 12   :     int c = ConditionalTest( bFlag, a, b );
; 13   :     return 0;
    xor    eax, eax
; 14   : }
    ret    0
_main    ENDP
_TEXT    ENDS

The astute amongst you will have noticed that the Release assembly code is significantly smaller than the Debug. In fact, it's clearly doing nothing at all other than returning 0. Good optimizing! High five! As I alluded to earlier, the optimizing compiler is great at spotting code that evaluates to a compile time constant and will happily replace any code it can with the equivalent constant.

So that's why we put the code snippet in a function

It should hopefully be relatively clear by this point why we put the code snippet into a function, and then asked the linker not to remove code for functions that aren't called. Even if it can optimize away calls to a function, the compiler can't optimize away the function itself before link time, because some code outside of the object file it exists in might call it. Incidentally, the same effect usually keeps variables defined at global scope from being optimized away before linkage. I'm going to call this Schrödinger linkage (catchy, right?).

If we want our simple code snippet to stay around after optimizing, we only need to make sure that it takes advantage of Schrödinger linkage to cheat the optimizer. If the compiler can't tell whether the function will be called, then it certainly can't tell what the values of its parameters will be during one of these potential calls, or what its return value might be used for, and so it can't optimize away any code that relies on those inputs or contributes to the output either.

The upshot of this is that if we put our code snippet in a function, make sure that it uses the function parameters as inputs, and that its output is returned from the function, then it should survive optimization.

It's really a testament to all the compiler programmers over the years that it takes so much effort to get at the optimized assembly code generated by a simple code snippet – compiler programmers, we salute you!
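The recipe of "parameters in, return value out" can be sketched as follows. This is only an illustration: both function names are made up, and what the optimizer actually emits will depend on your compiler version and settings.

```cpp
// Inputs are compile time constants and the result goes nowhere, so an
// optimizing build is free to reduce the body of this function to nothing.
void AddConstantsLocally()
{
    int a = 1, b = 2, c = 3;
    int sum = a + b + c;
    (void)sum;   // only silences the unused-variable warning
}

// Inputs arrive as parameters and the result is returned. The compiler
// can't know the values, or what a caller in another file might do with
// the result, so the additions have to appear in the generated code.
int AddThree( int iA, int iB, int iC )
{
    return iA + iB + iC;
}
```

Neither function needs to be called from main(). With the project settings described earlier, both should still show up in the .asm files, and comparing their Release versions is a quick way to see the difference for yourself.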
ConditionalTest()

So, here's the Debug .asm for ConditionalTest() (ignoring the prologue / epilogue):

; 5    :     return( bFlag ? iOnTrue : iOnFalse );
    movzx    eax, BYTE PTR _bFlag$[ebp]
    test    eax, eax
    je    SHORT $LN3@Conditiona
    mov    ecx, DWORD PTR _iOnTrue$[ebp]
    mov    DWORD PTR tv66[ebp], ecx
    jmp    SHORT $LN4@Conditiona
$LN3@Conditiona:
    mov    edx, DWORD PTR _iOnFalse$[ebp]
    mov    DWORD PTR tv66[ebp], edx
$LN4@Conditiona:
    mov    eax, DWORD PTR tv66[ebp]
; 6    : }

As you should be able to see, this is doing basically the same thing as the code we looked at in the Debug disassembly in the previous article:

  • branching based on the result of testing the value of bFlag (the mnemonic test does a bitwise logical AND of its operands and sets the CPU flags without storing the result)

  • both branches set a Stack variable at an offset of tv66 from [ebp]

  • and both branches then execute the last line which copies the content of that address into eax
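If it helps, the steps above can be paraphrased back into C++. This is a hand-written sketch of the shape of the Debug code, not compiler output; the function name is made up, and the local variable temp stands in for the compiler-generated Stack slot tv66:

```cpp
// Hypothetical paraphrase of the Debug instruction sequence.
int ConditionalTestDebugShape( bool bFlag, int iOnTrue, int iOnFalse )
{
    int temp;              // plays the role of tv66
    if( bFlag )            // movzx / test / je
    {
        temp = iOnTrue;    // mov ecx, iOnTrue then mov tv66, ecx
    }
    else
    {
        temp = iOnFalse;   // mov edx, iOnFalse then mov tv66, edx
    }
    return temp;           // mov eax, tv66
}
```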

Again, the assembly code is arguably easier to follow than the corresponding disassembly because the jmp mnemonic jumps to labels visibly defined in the code, whereas in the disassembly view in Visual Studio you generally have to cross reference the operand to jmp with the memory addresses in the disassembly view to see where it's jumping to…

Let's compare this with the Release assembler (again not showing the function prologue or epilogue):

; 5    :     return( bFlag ? iOnTrue : iOnFalse );
    cmp    BYTE PTR _bFlag$[ebp], 0
    mov    eax, DWORD PTR _iOnTrue$[ebp]
    jne    SHORT $LN4@Conditiona
    mov    eax, DWORD PTR _iOnFalse$[ebp]
$LN4@Conditiona:
; 6    : }

You will note that the work of this function is now done in 4 instructions as opposed to 9 in the Debug:

  • it compares the value of bFlag against 0

  • unconditionally moves the value of iOnTrue into eax

  • if the value of bFlag was not equal to 0 (i.e. it was true) it jumps past the next instruction…

  • …otherwise this moves the value of iOnFalse into eax
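The same four steps can be written back out as C++ to make the control flow easier to see. Again this is a hand-written sketch with a made-up function name rather than anything the compiler produced; the local variable result plays the role of eax:

```cpp
// Hypothetical paraphrase of the Release instruction sequence.
int ConditionalTestReleaseShape( bool bFlag, int iOnTrue, int iOnFalse )
{
    int result = iOnTrue;    // mov eax, iOnTrue (always executed)
    if( !bFlag )             // cmp bFlag, 0 then jne skips the overwrite when bFlag is true
    {
        result = iOnFalse;   // mov eax, iOnFalse
    }
    return result;           // the value is already sitting in eax
}
```

Where the Debug version always takes one of two branches and goes through a temporary in memory, this shape has a single conditional jump and keeps the value in a register throughout.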

As I've stated before, I'm not an assembly code programmer and I'm not an optimization expert. Consequently, I'm not going to offer my opinion on the significance of the ordering of the instructions in this Release assembly code. I am, however, prepared to go out on a limb and say it's a pretty safe bet that the Release version with 4 instructions is going to execute significantly faster than the Debug version with 9.

So, why such a big difference between Debug and Release for something that when debugging at source level is a single-step? Essentially this is because the unoptimized assembly code generated by the compiler must be amenable to single-step debugging at the source level:

  • it almost always does the exact logical equivalent of what the high level code asked it to do and, specifically, in the same order

  • it also has to frequently write values from CPU registers back into memory so that the debugger can show them updating

Summary

What's the main point I'd like you to take away from this article? Optimizing compilers are feisty! You have to know how to stop them optimizing away your isolated C/C++ code snippets if you want to easily be able to see the optimized assembly code they generate.

This article shows a simple boilerplate way to short-circuit the Visual Studio optimizing compiler – mileage will vary on other platforms. There are other strategies to stop the optimizer optimizing away your code, but they basically all come down to utilizing the Schrödinger linkage effect; in general:

  • use global variables, function parameters, or function call results as inputs to the code

  • use global variables, function return values, or function call parameters as outputs from the code

  • if you're not using Visual Studio's compiler you may also need to turn off inlining
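As a rough sketch of those three points together, here is a snippet that takes its inputs from global variables, writes its output to a global variable, and asks the compiler not to inline the function under test. All of the names are made up for illustration; __declspec(noinline) is the Visual Studio spelling and __attribute__((noinline)) is the GCC / Clang one:

```cpp
// Hypothetical macro that picks the compiler-specific way of saying
// "do not inline this function".
#if defined(_MSC_VER)
    #define SNIPPET_NOINLINE __declspec(noinline)
#else
    #define SNIPPET_NOINLINE __attribute__((noinline))
#endif

// Globals with external linkage: code in another file could read or
// write them, so the compiler can't treat their values as constants.
int g_iInputA = 3;
int g_iInputB = 4;
int g_iOutput = 0;

SNIPPET_NOINLINE int MultiplyInputs( int iA, int iB )
{
    return iA * iB;
}

void RunSnippet()
{
    // global variables as inputs, a global variable as the output
    g_iOutput = MultiplyInputs( g_iInputA, g_iInputB );
}
```

Whether you need the noinline marker at all depends on your compiler and settings, so treat it as something to try if the function body vanishes from the generated assembly.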

A final extreme method I have been told about is to insert nop instructions via inline assembly around / within the code you want to isolate. Note that you should use this approach with caution, as it interferes directly with the optimizer and can easily affect the output to the point where it is no longer representative.

Epilogue

So, I hope you found this interesting – I certainly expect you will find it useful :) The next article (as promised last time!) is about looping, which is another reason why it seemed like a good time to cover getting at optimized assembly code for simple C/C++ snippets. I will be referring back to this in future articles in situations where looking at the optimized assembly code is particularly relevant.

If you're wondering what you should look at first to see how Debug and Release code differ, and want to get practice at beating the optimizer, I'd suggest starting with something straightforward like adding a few numbers together.

Lastly, but by no means leastly, thanks to Rich, Ted, and Bruce for their input and proof reading.

[This piece was reprinted from #AltDevBlogADay, a shared blog initiative started by @mike_acton devoted to giving game developers of all disciplines a place to motivate each other to write regularly about their personal game development passions.]
