Dynamic Game Audio Ambience: Bringing Prototype's New York City to Life

In a fascinating, in-depth audio article, Radical's Morgan explains the detail that went into creating the complex ambient sound for the troubled cityscape in action game Prototype.

Scott Morgan, Blogger

June 4, 2009


Prototype is a third-person, original property developed by Radical Entertainment and published by Activision Blizzard. It is an intense, open-world, free-roaming, action adventure game set in a version of New York City that gradually descends into a virtual hell by way of an outbreak of a deadly viral infection.

The game features action driven by a three-way war among the main player character, Alex Mercer; the military, trying to squash the outbreak; and a growing force of infected people and creatures.

Goals for the Ambience

The direction for ambience in Prototype was to create a living, breathing New York City; a city that felt believable and alive, adaptive and dynamic. Instead of aiming to break down the city by zones alone, the desire was to base the ambience on the objects within the game world and their relative densities, populations and emotional states.

Based on the overall game design, our intention with the ambient audio was not to create a block-by-block recreation of the city, nor was it to represent any specific neighborhood or region with detailed, accurate sound.

Instead, we were after an overall feel of New York and the basic sensation that the city was itself a character that was alive and dynamic, transforming with the player's movements throughout the environment and the progression of the story.

The Manhattan (Recording) Project

Early in pre-production we made the decision to travel to New York City to document ambient sound. Our sound designer, Cory Hawthorne, and I, equipped with M-Audio MicroTrack 24/96 recorders and custom-built headset mics/preamps from Sonic Studios, combed the streets, shorelines and parks of Manhattan, recording as much audio as we could over the course of a week and a half.

Our aim was twofold: to document New York audio for reference and to collect useful, high quality audio which we could then use to build our game's ambiences from the ground up. We returned from New York with about 20 hours of raw recordings.

We recorded everything from "quiet" courtyards to the noisy center of Times Square. We recorded from 40 storeys up on a rooftop and underground in the subways.

We recorded Central and Battery Parks as well as the bustling financial district and hectic Canal Street shopping district. We recorded in the rain, during the day and the night.

Although much of this audio did make it into the game, the game's ambiences are complete reconstructions and often include multiple layers of our original recordings with extra sounds added from our libraries.

Surprisingly, some of the most useful recordings from New York were those recorded at the greatest distance from individual people and cars. Rooftops, back alleys, parks, and more all proved as useful as the busy street corners and pedestrian-heavy centers, mostly due to the method of implementation -- which I'll describe later.

Although our recording set-ups were stereo, some of the ambiences in the game are actually quadraphonic. We decided that light-weight, low-profile stereo equipment was actually more desirable than any kind of elaborate four-channel mic/recorder setup.

The quad ambiences in the game ended up being amalgamations of two or more separate sets of stereo recordings from the same environments. Although you lose any realistic positioning with this style of recording/playback, it has the advantage of sounding denser, which was often desirable for our game.

Although many of Manhattan's neighborhoods and boroughs have distinct and unique ambiences, what we discovered after several days of recording is that Manhattan has a constant drone that underscores everything in the city. You can hear it in the parks, the subways and the busy streets -- it is like a resonant note that plays continuously in the background, 24/7.

Some of our quieter recordings reveal this keynote drone so we primarily used those recordings to form the basic, four-channel building block of Prototype's New York City ambience.

Three Tiers

Conceptually, we decided to divide the ambience into three tiers, or perspectives. The quadraphonic bed track, taken from the more distant perspective recordings of Manhattan, became one of our main "background" ambiences. Others included a Central Park background as well as a rooftop background which procedurally fades in and out based on the position of the listener.

Sitting on top of the background ambiences were the grouped layers, or what we like to call "midground" ambience. Midground ambience is entirely composed of ambient sounds made from groups of objects in the world. Pedestrians, vehicles and infected creatures became the main grouped layers of ambience as they were all central to Prototype's open-world gameplay.

The last tier is what we would call foreground ambience. Foreground ambience is composed of sounds that originate from a single object in the game world. Quite simply, these are individual lines of ambient dialogue, individual vehicle honks, engines, tire skids, etc. or individual creature sounds that play based on the state of the object, determined for the most part by the AI.

The main advantage of this tiered approach is the blending you can achieve from foreground to background, which acts to provide a kind of aural depth of field. Because you get individual reactions and ambient sounds from objects in the immediate foreground, the midground and background layers blend in to provide a sense of depth to the audio. This way, the individual sounds don't stand out as awkwardly loud or prominent in the mix because of the blended grouped content underneath.

Another advantage to this approach is that you can be frugal with the use of voices for ambient foreground sounds thanks to the support of the midground, grouped sounds. This allowed us to set maximums on individual pedestrian voices, vehicle engines and other foreground objects. In a game which features hundreds of these objects on the screen at any given time, this proved very important for reserving voices for other, more important sounds like the main character's powers, combat sounds, prop damage states, etc.

18 Channels

For reasons of disc-streaming efficiency, we decided to create an interleaved, multichannel ambience file that loops in the game and dynamically mixes according to the densities of certain objects in the vicinity of the listener.

The reason for the interleaved, 18-channel file was purely to limit the number of seeks on the disc the system would have to make in order to stream background and midground ambiences simultaneously.

Pedestrians, traffic and infected enemies all travelled in groups, so each of these elements had their own set of layers in the 18-channel file. All foreground ambiences were conventionally preloaded into RAM with the characters and objects they belonged to.
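To make the idea of one interleaved stream concrete, here is a minimal Python sketch of how a frame of 18 interleaved samples could be sliced back into named layers. The channel assignment in `LAYOUT` is purely hypothetical (the article does not list the real one); it is chosen only so that the widths add up to 18, and the names `channel_map` and `split_frame` are illustrative.

```python
# Hypothetical channel layout: the article does not give the real assignment.
# Widths are picked only so that they sum to 18.
LAYOUT = [
    ("city_bed", 4),         # quadraphonic city background
    ("rooftop_bed", 2),
    ("park_bed", 2),
    ("ped_panic_small", 1),  # smaller groups are mono
    ("ped_panic_large", 2),  # larger groups are stereo
    ("infected_small", 1),
    ("infected_large", 2),
    ("traffic_idle", 2),
    ("traffic_panic", 2),
]


def channel_map(layout):
    """Return ({layer: (first_channel, width)}, total_channels)."""
    mapping, offset = {}, 0
    for name, width in layout:
        mapping[name] = (offset, width)
        offset += width
    return mapping, offset


def split_frame(frame, mapping):
    """Slice one interleaved frame (one sample per channel) into layers."""
    return {name: frame[start:start + width]
            for name, (start, width) in mapping.items()}


CHANNELS, TOTAL = channel_map(LAYOUT)
```

Because every layer lives in the same frame, a single sequential read from disc delivers all layers at once, and the mixer only has to apply a gain per layer.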

When panic ensues in the world (as it is prone to do in Prototype) the panic layers are turned up depending on the densities of pedestrians in the area. If there are only a few pedestrians, they will only respond individually. If there is a group of 10 or so, the first, low-density midground layer will fade up. If there are even more, a midground crowd layer with increased density will fade up, giving more of a crowd effect.

The same technology is applied to infected hordes and roughly the same to vehicle traffic. Traffic also has an "idle" state which fades up when the number of cars is high enough to warrant it. A traffic panic layer fades up when cars begin to panic.
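The density-driven fades can be sketched as a pair of clamped ramps. The following Python is only an illustration: the "group of 10 or so" figure comes from the description above, while every other threshold, and the names `ramp` and `crowd_layer_gains`, are assumptions.

```python
def ramp(x, lo, hi):
    """Linear 0..1 ramp between lo and hi, clamped outside that range."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    return min(1.0, max(0.0, (x - lo) / (hi - lo)))


def crowd_layer_gains(panicking_count,
                      small_start=5, small_full=10,    # assumed thresholds
                      large_start=20, large_full=40):  # assumed thresholds
    """Map a count of nearby panicking pedestrians to midground layer gains.

    Below small_start both layers stay silent, so only individual
    foreground reactions are heard.
    """
    return {
        "small": ramp(panicking_count, small_start, small_full),
        "large": ramp(panicking_count, large_start, large_full),
    }
```

For example, `crowd_layer_gains(3)` leaves both layers silent, while `crowd_layer_gains(30)` has the small layer fully up and the large layer halfway in.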

Running behind all of this is the basic, four-channel city ambience, which cross-fades with a rooftop ambience based on the listener's height, and a Central Park ambience for when inside the park.

Because all streams are running simultaneously, these cross-fades are all position-based, not trigger-based. When standing on the edge of the park and the city the player hears 50% park and 50% city rather than having to cross a trigger volume to trigger a preset cross-fade.
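A position-based cross-fade of this kind can be sketched in a few lines of Python. This is an assumption-laden illustration rather than the shipped code: it supposes the game can supply a signed distance to the park boundary (negative inside the park), and the blend width and the linear curve are hypothetical choices.

```python
def park_city_mix(signed_distance_m, blend_width_m=40.0):
    """Cross-fade park and city beds from the listener's position.

    signed_distance_m: distance to the park boundary, negative inside the
    park and positive in the city (an assumed input).
    blend_width_m: width of the transition band (an assumed tuning value).
    """
    t = 0.5 + signed_distance_m / blend_width_m
    city = min(1.0, max(0.0, t))
    return {"park": 1.0 - city, "city": city}
```

Standing exactly on the boundary gives `{"park": 0.5, "city": 0.5}`, matching the 50/50 case described above; no trigger volume is involved, so the mix follows the listener continuously.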

We decided late in production that, for the system to sound really convincing, the midground groups for pedestrian and infected crowds needed to be divided into two sizes.

Because the sizes of the groups vary dramatically in the game from small to really large, and because volume alone does not represent density very well (a crowd sound of 40 people faded to a quarter the amplitude does not sound like a crowd of 10 people), two layers of crowd sounds, one smaller and one larger, proved the way to go.

For reasons of economy, we elected to have the smaller groups be mono, which get positioned in the quad matrix based on the averaged location of the group's members. The larger groups are stereo and get split left and right but still weighted in each quad speaker depending on the counts in that quadrant.

In terms of placement, the code was not only balancing the overall levels of each layer based on densities of objects in the world but also placing the weight on the volume per channel depending on the averaged position of objects.

This adds a sense of orientation to the crowds and gives the listener a sense of the direction of the groups. This can be particularly useful for locating infected hordes in the game as they potentially pose a threat to the player character.
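As a rough illustration of weighting a layer per quad speaker by where the group members are, the Python sketch below counts objects per quadrant around the listener. It assumes positions are already expressed relative to the listener (x to the right, y forward) and that a cut-off radius is supplied; those conventions and the function names are hypothetical.

```python
import math

SPEAKERS = ("front_left", "front_right", "rear_left", "rear_right")


def quadrant(x, y):
    """Name the quad speaker for a listener-relative position."""
    side = "left" if x < 0 else "right"
    depth = "front" if y >= 0 else "rear"
    return f"{depth}_{side}"


def quad_weights(positions, radius):
    """Return (per-speaker weights, object count) for objects within radius.

    Weights are the fraction of counted objects in each quadrant, so they
    sum to 1 whenever at least one object is in range.
    """
    counts = dict.fromkeys(SPEAKERS, 0)
    for x, y in positions:
        if math.hypot(x, y) <= radius:
            counts[quadrant(x, y)] += 1
    total = sum(counts.values())
    if total == 0:
        return dict.fromkeys(SPEAKERS, 0.0), 0
    return {name: n / total for name, n in counts.items()}, total
```

The returned count can drive the overall layer level, while the weights bias that level toward the speakers where most of the group actually is.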

Grouped Content Creation: The "Swarm Player"

While the main background layers were composed primarily of custom recordings from New York itself, the midground layers were complete constructions. Because we already had a multitude of individual recordings of actors screaming and panicking, I decided to build a custom "walla generator" patch using Cycling '74's Max/MSP.

The patch is very simple. It takes an input folder of sounds and provides some basic options to control the timing of events with a random gap variation range, the number of channels to distribute to, a pitch range, a volume range and an overall EQ. It can also host a VST plug-in for reverb or any other desired external effect.

Through editing the timer, the density of the grouped content could then be tuned. The output of the patch was recorded and used directly as one of the layers of the 18-channel ambience, either as a stereo pass for the larger groups or a mono layer for the smaller groups. In total, there were about 400 individual reaction files used as the source for the crowd panic layers.
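The original patch was built in Max/MSP and is not reproduced here. As a loose sketch of the same idea, the Python below only generates the event schedule (which file plays when, on which channel, at what pitch and gain); it does no audio rendering, and all parameter names and default ranges are assumptions.

```python
import random


def schedule_walla(files, duration_s, mean_gap_s, gap_jitter_s, channels=2,
                   pitch_range=(0.9, 1.1), gain_range=(0.5, 1.0), seed=0):
    """Build a randomized list of playback events for a crowd layer.

    A shorter mean_gap_s yields a denser crowd. Pitch is a playback-rate
    multiplier and gain is linear; both ranges are illustrative defaults.
    """
    rng = random.Random(seed)  # seeded so a pass can be reproduced
    events, t = [], 0.0
    while True:
        gap = mean_gap_s + rng.uniform(-gap_jitter_s, gap_jitter_s)
        t += max(0.01, gap)    # never let time stand still
        if t >= duration_s:
            break
        events.append({
            "time_s": t,
            "file": rng.choice(files),
            "channel": rng.randrange(channels),
            "pitch": rng.uniform(*pitch_range),
            "gain": rng.uniform(*gain_range),
        })
    return events
```

Rendering one schedule with `channels=1` and a long gap, and another with `channels=2` and a short gap, would correspond to the mono small-group and stereo large-group passes described above.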

This same process was used for the infected mobs. In this case, the input files were the infected creature sounds created by our sound designer, Cory Hawthorne, mixed with some additional death screams and panic from some of the actors to provide a sense of a mob gone mad with infection.

More reverb and EQ were applied to the distant groups than to the smaller groups. This was in addition to the run-time, procedural reverb, which was applied during the game to all three tiers of ambience.

Runtime Reverb and Filtering

All the ambience was sent through a procedural reverb system. This is true not only of the midground layers but also of the background. Through a system of ray casting, the physical space of the listener was analyzed in real time, and the reverb parameters set to align with the size of the space that the listener was in.

While entering a tunnel in Central Park, for example, the system detects an enclosed space of a certain size and dynamically sets the reverb parameters. The sound of the park's birds and other ambient sounds is passed through the bigger reverb to give the illusion that the sounds are no longer arriving directly to the listener, but are reflected first, mimicking what would happen in the real world.
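One simple way to turn ray-cast results into reverb settings is sketched below in Python. It is not the game's algorithm: the input format (one hit distance per ray, `None` for a miss), the "fraction of rays that hit" measure of enclosure, and the linear distance-to-decay mapping are all assumptions made for illustration.

```python
def reverb_from_rays(hit_distances_m, max_range_m=50.0):
    """Estimate reverb settings from rays cast around the listener.

    hit_distances_m: one entry per ray; None means nothing was hit.
    Returns a wet level (0 = open air, 1 = fully enclosed) and a decay
    time that grows with the average distance to the surfaces hit.
    """
    if not hit_distances_m:
        raise ValueError("need at least one ray")
    hits = [d for d in hit_distances_m if d is not None and d <= max_range_m]
    if not hits:
        return {"wet": 0.0, "decay_s": 0.0}
    enclosure = len(hits) / len(hit_distances_m)
    mean_dist = sum(hits) / len(hits)
    decay_s = 0.2 + 0.05 * mean_dist  # assumed mapping, not a measured one
    return {"wet": enclosure, "decay_s": decay_s}
```

In a tunnel every ray hits something close by, so the wet level rises to 1; in the open park most rays miss and the ambience stays nearly dry.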

Similarly, there is a procedural filter roll-off applied to sounds when the player/listener moves up in the world. When climbing rooftops, the sounds from the ground level (crowds, traffic, etc.) are first run through a low-pass filter to remove the high frequencies, then cross-faded with the rooftop ambience to give a seamless distance fall-off and transition between the vertical "zones" in the game.
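The height-driven roll-off and cross-fade can be illustrated with a short Python sketch. The heights, cutoff frequencies and the log-scale interpolation are assumed values chosen for the example, not figures from the game.

```python
def height_mix(height_m, fade_start_m=10.0, fade_end_m=60.0,
               cutoff_ground_hz=20000.0, cutoff_roof_hz=800.0):
    """Derive street/rooftop gains and a street low-pass cutoff from height.

    All default values are illustrative assumptions.
    """
    span = fade_end_m - fade_start_m
    t = min(1.0, max(0.0, (height_m - fade_start_m) / span))
    # Interpolate the cutoff on a log scale so the sweep sounds even.
    cutoff = cutoff_ground_hz * (cutoff_roof_hz / cutoff_ground_hz) ** t
    return {
        "street_gain": 1.0 - t,
        "rooftop_gain": t,
        "street_cutoff_hz": cutoff,
    }
```

As the listener climbs, the street layers lose their high frequencies first and then give way to the rooftop bed, which avoids an audible seam between the vertical zones.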

The same system of filtering is applied to groups of pedestrians or infected in the distance. If an infected zone is heavily populated but not immediately close to the listener, the amplitude of the infected group layer may be turned up but filtered so that it sounds populated but distant.

Tuning and Balancing

Using our proprietary, in-house audio tool AudioBuilder, designers and coders are able to collaborate on building custom interfaces for tuning all aspects of audio in the game. This is done by way of a graph/patch interface that works similarly to Reaktor or Max/MSP.

Our ambience graph/patch was the collaborative effort of one of our audio programmers, Steven Scherrelies, and myself. Steven deserves a lot of credit for his diligence in designing and implementing the code and the UI of the system.

The ambience interface went through many iterations before arriving at the final system used in Prototype. The end result is a system that is heavily catered to Prototype's open-world, free-roaming nature.

Some aspects of the interface are set by the sound designer in real time, such as overall volume, cross-fade curves and max volumes of any individual streams. A variety of parameters are also exposed to allow tuning of roll-off times, smoothing factors, channel leaks, etc. to allow for subtle "massaging" of the resulting sound.

Other parameters in the patch are controlled at runtime by the audio code such as positional weighting of the sounds in the quadraphonic matrix and cross-fade amounts between zones. The audio code takes input from the AI and other game systems to determine how many objects are in a given quadrant of a sphere around the listener and calculate what the volume level should be per channel.

The same raycasting used for the procedural reverb is used for the ambience to determine how big the spheres should be by detecting walls and surfaces. This is included to prevent crowds from being heard through walls or other obstructions.


The system is very reactive. It has no memory or sense of direction. It responds to the input from the AI and other game systems immediately, with no discretion. Because of this, value smoothing is crucial to the end result being perceived as transparent and fluid. The smoothing algorithm is essentially that of a low-pass filter, the basic parameters of which are exposed in the UI of the patch for tuning purposes.
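A common way to implement this kind of value smoothing is a one-pole low-pass filter applied to each control value. The Python sketch below shows that general technique; the class name, the time-constant parameter and the frame-rate-independent form are assumptions, not details of the AudioBuilder patch.

```python
import math


class SmoothedValue:
    """One-pole low-pass smoother for a control value such as a layer gain."""

    def __init__(self, initial=0.0, time_constant_s=0.5):
        if time_constant_s <= 0:
            raise ValueError("time_constant_s must be positive")
        self.value = initial
        self.time_constant_s = time_constant_s

    def update(self, target, dt_s):
        """Move toward target; dt_s is the time since the last update."""
        # alpha is near 0 for tiny steps and approaches 1 for long ones,
        # so the response is the same at any update rate.
        alpha = 1.0 - math.exp(-dt_s / self.time_constant_s)
        self.value += alpha * (target - self.value)
        return self.value
```

Calling `update(target, dt)` once per frame with the raw gain from the density calculation turns abrupt jumps in object counts into gradual fades; a longer time constant gives a smoother but slower response.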

The overall output of the ambience system is then bussed to our mixer system, which allows overall control of ambience levels within the game's main mix state. This mixer-based control allows the fading of ambience to occur during cinematics, or the filtering of the ambient sound during special game modes like sensory powers.

Conclusions

This system works fairly well, with a couple of noticeable exceptions. First, pedestrians don't have an "idle" state. This could have contributed a lot, but we were already pushing the limit with the number of channels, and making an appropriate-sounding idle crowd layer proved more difficult than expected.

Another issue was that crowds were not reactive enough. Because we resorted to fading the layers, crowds never "burst" with fear; they only grew slowly into fear.

Also, interleaving the ambiences means that all elements of the ambience are linked in time -- so if a car honk occurs two minutes into the file and a scream of a pedestrian three seconds later, this pattern will be repeated every time those channels are both turned up at that stage of the loop.

This leads to predictability, which is never a great attribute with ambience. Lastly, this system does not handle interiors very well so interiors were dealt with by using completely different four-channel ambiences and no object grouping.

Advantages of this system include runtime performance; the interleaved, 18-channel file minimizes disk seeks considerably over running the elements of the ambience as independent streams.

A secondary advantage to the interleaving is that zones can cross-fade procedurally, rather than based on a trigger volume event. This means the player or listener's position in the world can be used to determine the cross-fade amount of, say, the park and the city, rather than a zone boundary which triggers a preset cross-fade between two independent streams.

On the whole, the biggest advantage is the dynamics of the system. As cars, people and infected creatures come and go in the game space, so does the sound. This contributes to an ever-present sense that the game world is alive, fluid and bustling, as a true representation of New York City should be.

This system is best-suited to an open-world game which includes high densities of objects and characters and can change dramatically and quickly at any given time.

If we were to tackle the same issues again, I would record custom walla with large groups of actors in an outdoor space, and record a wider variety of New York sounds at greater distances from concentrations of cars and people. This would improve the clarity and division of background and midground ambiences and increase the overall quality of the content. This would also allow increased emotional range of the crowds themselves and a true sense of the city as a character in the game.

About the Author

Scott Morgan is a Sound Director at Radical Entertainment in Vancouver, B.C., Canada. Scott has worked on numerous games as a Sound Designer and has been the Sound Director for Radical's Incredible Hulk: Ultimate Destruction as well as the highly anticipated game Prototype. Scott studied music and communications in the 1990s at Simon Fraser University in Burnaby, B.C., where he focused on computer music composition, film sound and acoustic ecology. Aside from his work on game audio, Scott releases ambient music under the pseudonym loscil.
