Sponsored By

Debugging ConcurrencyDebugging Concurrency

Paquet, the Core Technology Manager for Driv3r and Stuntman creator Reflections, discusses the tricky and increasingly important issue of debugging code on multi-core processors, commenting that, despite the performance advantages: "Unfortunately, with multithreaded applications come a new class of bugs which challenge even the most experienced programmers."

Philippe Paquet, Blogger

June 6, 2005

12 Min Read
Game Developer logo in a gray background | Game Developer

1. Introduction

In July 2000 Apple brought dual-processing to its PowerMac line and symmetric multiprocessing (SMP) became officially mainstream. Less than 4 years later, Intel introduced a feature called "Hyperthreading" to their IA-32 processor line and simultaneous multithreading (SMT) became mainstream too. Before the end of the year, we will see Intel, AMD and Microsoft introducing multi-core processors to the masses. Multithreaded applications are now a fact of life and probably the only way toward maximum performance. Unfortunately, with multithreaded applications come a new class of bugs which challenge even the most experienced programmers.

As a simplification, I will only talk about threads within a single system and I will limit resources to memory. Keep in mind that any of the conditions described below can occur in distributed systems and can involve any part of the hardware, including but not limited to the GPU and the network connections.

2. Race Condition or Race Hazard

A race condition, sometimes called a race hazard, occurs when data can be accessed by two or more different threads simultaneously and there is no mechanism to control temporal ordering of access. Depending on which thread accesses the shared data first, the behavior of the program change.

In the following example program, the printed value of g_iResult depends on which thread will execute his assignment first:

  • If ThreadOne assigns 50 to g_iResult before ThreadTwo is able to assign 150 to g_iResult, then g_iResult will contain 150 at the time it is printed.

  • If ThreadTwo assigns 150 to g_iResult before ThreadOne is able to assign 50 to g_iResult, then g_iResult will contain 50 at the time it is printed.

If ThreadOne and ThreadTwo were executed separately, we would have a constant result. The unpredictability is caused by the race between the two threads.

//
// Race.cpp
//
// Example of a race condition
//
//

#include <windows.h>
#include <stdio.h>
#include <process.h>

int g_iResult = 100;
bool g_bThreadOneFinished = false;
bool g_bThreadTwoFinished = false;

void ThreadOne(void*)
{
  // Wait some random amount of time
  Sleep(rand());

  // Set the result
  g_iResult = 50;

  // Finished
  g_bThreadOneFinished = true ;
   _endthread();
}

void ThreadTwo(void*)
{
   // Wait some random amount of time
  Sleep(rand());

  // Set the result
  g_iResult = 150;

  // Finished
  g_bThreadTwoFinished = true ;
  _endthread();
}

int main()
{
  // Start the threads
  _beginthread(ThreadOne, 0, NULL);
  _beginthread(ThreadTwo, 0, NULL);

  // Wait for the threads to finish
  while (( false == g_bThreadOneFinished)
  || ( false == g_bThreadTwoFinished))
  {
    Sleep(1);
  }

  // Print the result
  printf("Result: %i\n", g_iResult);
}

At a high level, the most common symptom of a race condition is the non-determinacy of the program behavior. At a lower level, it is the unpredictability of values stored in variables shared between multiple threads.

Race conditions are notoriously hard to debug and test for because they often occur in highly unlikely situations. Symptoms sometimes just disappear when a debugger is attached or when trace messages are added. This phenomenon is known as HeisenBug (from Heisenberg's Uncertainty Principle in quantum physics).

As source-level debuggers usually don't help much (they directly affect program timings), your best bet is to use trace messages. Add trace messages around critical resources usage, log everything and check whether the order of events is what you are expecting.

Architecturally, a commonly used solution is to give one unique thread exclusive access of a “would be shared” resource and have every other thread accessing that “would be shared” resource using inter-thread communication.

3. Deadlock and Livelock

A deadlock is a condition that arises when two or more threads are waiting for circularly locked resources. In multithreaded programs, deadlocks are not unusual as lock mechanisms are a fundamental part of parallel programming.

A livelock is very similar to a deadlock. The difference being that, while in a deadlock two or more threads are stalled waiting for each other, they will not be stalled in a livelock and will constantly be trying to acquire various resources and dynamically lock each other.

Sometimes, a deadlock can occur within the same thread. That condition is called a self deadlock. Even if a self deadlock can present the same symptom as a deadlock, it is actually not a true deadlock.

The following example shows a classic deadlock case. Consider the sequence of actions on the two threads:

• ThreadOne acquires g_hMutexOne
• ThreadTwo acquires g_hMutexTwo
• ThreadOne blocks attempting to acquire g_hMutexTwo
• ThreadTwo blocks attempting to acquire g_hMutexOne

ThreadOne and ThreadTwo end up waiting indefinitely on a distinct mutex that the other thread owns. Unless some way to break this deadlock is put in place, the program will freeze.

//
// Deadlock.cpp
//
// Example of a deadlock
//
//

#include <windows.h>
#include <stdio.h>
#include <process.h>

HANDLE g_hMutexOne;
HANDLE g_hMutexTwo;
Bool g_bThreadOneFinished = false;
bool g_bThreadTwoFinished = false;

void ThreadOne( void *)
{
  // Get first mutex
  printf("ThreadOne ask for g_hMutexOne\n");
  WaitForSingleObject(g_hMutexOne, INFINITE);
  printf("ThreadOne gets g_hMutexOne\n");

  // Wait some time, so the second thread can get the second mutex
  Sleep(100);

  // Try to get the second mutex. We will wait indefinetly here as
  // the second mutex is already owned by ThreadTwo
  printf("ThreadOne ask for g_hMutexTwo\n");
  WaitForSingleObject(g_hMutexTwo, INFINITE);
  printf("ThreadOne gets g_hMutexTwo\n");

  // Release the two mutex
  ReleaseMutex(g_hMutexTwo);
  ReleaseMutex(g_hMutexOne);

  // Finished
  g_bThreadOneFinished = true ;
  _endthread();
}

void ThreadTwo(void*)
{
  // Get the second mutex
  printf("ThreadTwo ask for g_hMutexTwo\n");
  WaitForSingleObject(g_hMutexTwo, INFINITE);
  printf("ThreadTwo gets g_hMutexTwo\n");

  // Wait some time, so the first thread can get the first mutex
  Sleep(100);

  // Try to get the first mutex. We will wait indefinetly here as
  // the first mutex is already owned by ThreadOne
  printf("ThreadTwo ask for g_hMutexOne\n");
  WaitForSingleObject(g_hMutexOne, INFINITE);
  printf("ThreadTwo gets g_hMutexOne\n");

  // Release the two mutex
  ReleaseMutex(g_hMutexOne);
  ReleaseMutex(g_hMutexTwo);

  // Finished
  g_bThreadTwoFinished = true;
  _endthread();
}

int main()
{
  // Create the two mutex
  g_hMutexOne = CreateMutex(NULL, FALSE, NULL);
  g_hMutexTwo = CreateMutex(NULL, FALSE, NULL);

  // Start the threads
  _beginthread(ThreadOne, 0, NULL);
  _beginthread(ThreadTwo, 0, NULL);

  // Wait for the threads to finish
  while (( false == g_bThreadOneFinished)
  || ( false == g_bThreadTwoFinished))
  {
    Sleep(1);
  }

  // Free the two mutex
  CloseHandle(g_hMutexTwo);
  CloseHandle(g_hMutexOne);
}

The most common symptom for a deadlock is a freeze or a crash.

Fortunately, the source of a deadlock is usually easy to find. In order for a deadlock to occur, at least one thread must own a lock and at least one thread must be trying to acquire the lock.

Attaching a source-level debugger and breaking the program will tell you right away which locks and which threads are involved. Knowing which locks and threads are involved may not be enough and stepping through parallel programs is extremely tedious.

Here again, you will have to rely on trace messages. Add trace messages either in or around each lock mechanism and log everything. Then, check whether the locking order is what you are expecting.

Architecturally, there are solutions to deadlock:

  • Only allow threads to own one lock at a time.

  • Force threads to lock all the resources they will need before starting them up

It may not be possible to use any of these solutions every time as they impose quite heavy constraints but it is always worth considering them.

4. Mismatched communication

Mismatched communication occurs when there is a send is not matched with a corresponding receive or a receive is not matched with a corresponding send. It can happen because corresponding behavior were not planned properly or when they are planned properly because the order of the messages is not the expected one. Mismatched communication often escalade to a deadlock.

Mismatched communication happens at different levels: thread, process, CPU, network. It is a problem in any place where communication occurs between different software components, whatever the API or the communication protocol is.

In the following example, we use a very primitive mechanism to pass messages between threads. In addition of the main thread, we have two other threads: ThreadOne and ThreadTwo. The main thread pass to ThreadOne the following message “ThreadOne: OK”. ThreadOne which is waiting for this message pick it up and send to ThreadTwo the following message: “ThreadTwo: OK”. Unfortunately, because ThreadTwo is expecting “ThreadTwo: NOTOK” instead of “ThreadTwo: OK”, ThreadTwo will never pick the message and will wait indefinitely.

//
// Mistmatched.cpp
//
// Show mismatched communication
//
//

#include <windows.h>
#include <stdio.h>
#include <process.h>

HANDLE   g_hMutex;
char      g_achMessage[64];
bool      g_bThreadOneFinished = false ;
bool      g_bThreadTwoFinished = false ;

void ThreadOne( void *)
{
   do
   {
      // Wait some time
      Sleep(1);

      // Get access to the message
      WaitForSingleObject(g_hMutex, INFINITE);

      // If we get an OK message, send an OK message to ThreadTwo
      if (0 == strcmp(g_achMessage, "ThreadOne: OK"))
      {
         printf("ThreadOne received a message\n");
         printf("ThreadOne send a message to ThreadTwo\n");
         strcpy(g_achMessage, "ThreadTwo: OK");
         g_bThreadOneFinished = true ;
      }

      // Free access to the message
      ReleaseMutex(g_hMutex);
   }
   while ( false == g_bThreadOneFinished);

   // Clean up
   _endthread();
}

void ThreadTwo(void*)
{
   do
   {
      // Wait some time
      Sleep(1);

      // Get access to the message
      WaitForSingleObject(g_hMutex, INFINITE);

      // If we get an OK message, finish the thread.
      // Unfortunatly, the message we are waiting for
      // is not the right one
      if (0 == strcmp(g_achMessage, "ThreadTwo: NOTOK"))
      {
         printf("ThreadTwo received a message\n");
         g_bThreadTwoFinished = true ;
      }

      // Free access to the message
      ReleaseMutex(g_hMutex);
   }
   while ( false == g_bThreadTwoFinished);

   // Clean up
   _endthread();
}

int main()
{
   // Initialize the message
   strcpy(g_achMessage, "");

   // Create the mutex
   g_hMutex = CreateMutex(NULL, FALSE, NULL);

   // Start the threads
   _beginthread(ThreadOne, 0, NULL);
   _beginthread(ThreadTwo, 0, NULL);

   // Send a message to ThreadOne
   printf("Main send a message to ThreadOne\n");
   WaitForSingleObject(g_hMutex, INFINITE);
   strcpy(g_achMessage, "ThreadOne: OK");
   ReleaseMutex(g_hMutex);

   // Wait for the threads to finish
   while (( false == g_bThreadOneFinished)
   || ( false == g_bThreadTwoFinished))
   {
   Sleep(1);
   }

   // Free the mutex
   CloseHandle(g_hMutex);
}

The most common symptom for mismatched communication is a freeze.

Once again, attaching a source-level debugger and breaking the program gives you some information but usually not enough to understand the real cause of the problem. Stepping through the program will be extremely tedious and may change communication timings.

Here again, you will have to rely on trace messages. There are several methods you can use to ease your debugging:

The first method is to build in your code facilities to dump all the messages you will be passing between threads, process, etc… Being able to see all the messages will give you a good insight of what is really happening. You will be able to know which messages have been sent, in which order they have been sent and if they have been successfully received.

The second method is to use message queues. By using message queues, you will be able to see very easily what the current state of the system is. Using different kind of queues will help a lot. In a typical message passing API, three kind of queues will be used: pending sends, pending receives, unexpected messages.

5. Best practices

The first and most important thing is to plan debugging from the very beginning. Don't think that you can escape concurrency bugs. Don't wait to see a concurrency bug appearing in your code. You need to plan how you are going to debug concurrency in your application before writing your first line of code.

Use one and only one way to pass messages around. By doing so, you will have only one place to look at when a mismatched communication condition or a deadlock occurs. If a message passing API is available, don't reinvent the wheel and use it.

“One thread to rule them all." Plan your application to run with an n number of threads. This way, you will be able to pass your application through a phase of serial debugging before have to go through the phase of parallel debugging. As it is much easier to debug a serial application, you should choose the path of least resistance and at first get rid of all the bugs found when your application runs as a serial application. When your application is free of “classic” bugs, you will be in a better position to attack concurrency bugs.

_____________________________________________________


Read more about:

Features

About the Author

Philippe Paquet

Blogger

Philippe Paquet is Technical Director for The Collective, Inc. He is an industry veteran who started programming in the early eighties on the Apple II. Before joining The Collective, Inc., Philippe was Core Technology Manager for Reflections Interactive Ltd. and Technical Director for Atari. His most recent titles include Driv3r and Stuntman. He can be reached at [email protected].

Daily news, dev blogs, and stories from Game Developer straight to your inbox

You May Also Like