
Unity Audio Import Optimisation - getting more BAM for your RAM

A guide to using Unity's audio import settings to help improve game performance.

Zander Hulme, Blogger

January 7, 2019


Edit: gave more nuanced advice on ADPCM, added some details that were missing, re-phrased some unclear wording, and fixed a mistake in the Venn diagram that erroneously listed Uncompressed on Disk as being a low-RAM option. Also added link to PDF version of the recommended settings tables.

Unity's audio import settings are not widely understood in their entirety, and at the time of writing, I have been unable to find any comprehensive guides to their use. Unity's documentation does a pretty good job of describing what its audio import settings do, but I would like to break these descriptions down for a wider audience, and give some more detail on how to use these settings to get better performance out of your game.

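This document is divided into five parts:

1. How audio affects performance
2. Understanding import settings
3. Suggested settings for PC/console
4. Suggested settings for mobile
5. Warnings and Notes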

Optimising your Unity audio import settings is one of the easiest wins for optimisation, and depending on the scale of your project, it should take you less than an hour to achieve significant improvements in load times, RAM use, and other areas. I hope this guide will be of use to you. Info is current as of Unity version 2018.3.

 

1. How audio affects performance

Audio data is big. For many games, sound data can take the lion’s share of disk space (the drive/cartridge/optical disc/etc that the game data is stored on) and RAM (the working memory of the system). As if this weren’t bad enough, it can also put a strain on the CPU - especially if you’re using DSP (runtime audio processing) effects - and significantly increase load times.

For three of these areas (disk space, RAM use, CPU use) optimisation is a triangular tug of war, like the good-cheap-quick problem. Depending on which area is causing your game the most trouble, you can gain efficiency in one area by sacrificing it in another. For example: if your raw audio is taking up too much RAM, you can store it compressed as Vorbis instead – this saves RAM space at the cost of the CPU, since it takes processing power to decode the compressed file when it is accessed. Below is a diagram of different settings and how they affect these three areas:

[Figure: Venn diagram of compressed audio resource use (fixed)]

Please note: This diagram tells us nothing about the bandwidth of data being accessed from the disk/RAM.

In reality, it’s a little more nuanced than that, but this should give you a general idea of how these issues are interconnected. To understand how to implement these settings (and tackle problems like extended load times), we need to look at each of the audio import settings in detail.

 

2. Understanding import settings

When you select an AudioClip in the Unity editor, you are greeted by this panel in the inspector window:

[Figure: AudioClip inspector screenshot]

Below is a list of the audio settings from top to bottom, and what they do:

Force to Mono

  • Yes: if the AudioClip is in stereo (or has any other number of channels), downmixes all channels into mono.

  • No: does not modify channel count.

As an audio designer, I never force to mono, because I create sounds for purpose. But if you are using stock assets and want to enable this setting, be careful to check that the mono-summed file doesn't sound thin and weird because of phase interactions between the left and right channels. You can preview the processed sound by pressing the Play button on the bottom-right of the inspector window - if you are getting phasing issues, you may want to pull the sound apart in an audio editor and export just the left or right channel as a mono sound.

 

Normalize (only applicable if Force to Mono is enabled)

  • Yes: readjusts the gain of the AudioClip so that the now-mono sound is the same volume as the original stereo file.

  • No: does not readjust the gain.

It is usually advisable to enable normalization if you're using Force to Mono. A loud stereo file when summed to mono can be even louder, going above maximum amplitude and causing hard digital clipping, which is usually undesirable.
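To illustrate why summing to mono can clip or sound thin, here is a minimal Python sketch. It is not Unity's actual downmix or normalisation algorithm: the naive sample-by-sample sum, the peak-based normalisation, and all function names are assumptions made for illustration only.

```python
def sum_to_mono(left, right):
    """Naively sum two channels sample by sample (no gain compensation)."""
    return [l + r for l, r in zip(left, right)]


def clip(samples, limit=1.0):
    """Hard-clip samples to the range [-limit, limit]."""
    return [max(-limit, min(limit, s)) for s in samples]


def peak_normalise(samples, target_peak=1.0):
    """Scale samples so the largest absolute value equals target_peak."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)
    return [s * target_peak / peak for s in samples]


# A loud stereo signal whose channels are nearly identical (illustrative values).
left = [0.0, 0.8, -0.9, 0.7]
right = [0.0, 0.7, -0.9, 0.6]

summed = sum_to_mono(left, right)    # exceeds full scale (magnitude above 1.0)
clipped = clip(summed)               # what hard digital clipping would leave
normalised = peak_normalise(summed)  # gain reduced so the peak sits at full scale

# Channels that are out of phase cancel when summed: the "thin and weird" case.
cancelled = sum_to_mono([0.5, -0.5], [-0.5, 0.5])
```

In the sketch, the summed signal peaks well above 1.0, so clipping flattens its loudest samples, while reducing the gain keeps the waveform's shape intact. The out-of-phase example sums to silence, which is the extreme form of the phase problem described under Force to Mono.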

 

Load in Background/Preload Audio Data

These settings have direct effects on each other, so I'm presenting them together:

| Load in Background | Preload Audio Data | Outcome |
|---|---|---|
| Enabled | Enabled | When the scene is loaded, AudioClips with this setting begin loading but do not stall the main thread. If they have not all finished loading by the time the scene has loaded, they will continue to load in the background as the scene plays. If a sound that hasn't loaded yet is triggered, it will behave the same as if it had Preload disabled (see the next row). |
| Enabled | Disabled | When the sound is triggered for the first time, it will load in the background and play as soon as it is ready. If the file is large, this will cause a noticeable delay between triggering and playing, but this is not an issue for subsequent plays of the sound. |
| Disabled | Enabled | The audio is loaded while the scene is loaded. The scene will not start until all sounds with this setting are loaded into memory. |
| Disabled | Disabled | When the sound is triggered for the first time, it uses the main thread to load itself into memory - if the file is large, this will cause a frame hitch, but this is not an issue for subsequent plays of the sound. I would only recommend this for very small files, and even then, make sure to measure the impact this has on performance in the profiler, and consider whether a large number of these sounds might possibly be triggered at once, multiplying the hit to performance. |

 

Ambisonic

  • Check this box if the clip has ambisonic-encoded audio. Ambisonic audio is useful for VR, AR, and 360 video etc, but it’s not really germane to this guide.

 

Platform-specific settings

  • These tabs let you specify a default setting and platform-specific settings for the settings below. Some platforms have compression formats that aren’t available on other platforms, and some might just have different hardware that requires you to optimise differently. See Notes below for more on platform-specific compression formats.

  • Make sure to check the platform-specific settings, even if you want everything to use the general settings – Unity may set some platform-specific settings for you automatically. For example, iOS builds may default to “specify sample rate: 22kHz”, which can cause aliasing (a sound glitch introduced by downsampling incorrectly).

 

Load type

  • Decompress on Load: audio is stored on disk in the Compression Format specified, but is decompressed and loaded raw into RAM as PCM. This takes up a lot of RAM and increases load time a bit, but is very cheap in terms of CPU and very fast to access.

  • Compressed in Memory: audio is stored both on disk and in RAM in the Compression Format specified. This takes less RAM and less load time, but costs CPU when the sound is played, as it needs to be decompressed on the fly each time it is played.

  • Streaming: audio is read from the disk and decoded on the fly as it plays, so only a small buffer is held in RAM rather than the whole clip. This uses some disk throughput and CPU, but on PC/console, it does not have a huge impact on performance so long as there are no more than two sounds streaming simultaneously. On mobile (particularly lower-end and old devices) streaming more than one stereo audio file simultaneously starts taking a heavy toll on CPU use (see Warnings below).
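To get a feel for the RAM trade-off between the three load types, here is a rough Python estimate. It is only a sketch under stated assumptions: the function names, the 16-bit 44.1kHz stereo defaults, and the compression ratio you pass in are illustrative, and Unity's real memory use includes overheads that are not modelled here.

```python
def pcm_bytes(duration_s, sample_rate=44100, channels=2, bytes_per_sample=2):
    """Size of raw PCM audio: seconds x samples per second x channels x bytes per sample."""
    return int(duration_s * sample_rate * channels * bytes_per_sample)


def estimated_ram_bytes(duration_s, load_type, compression_ratio=1.0,
                        stream_buffer_bytes=0, **pcm_kwargs):
    """Rough RAM estimate for one clip under each load type.

    compression_ratio is an assumed figure (e.g. 10.0 for 10:1) and
    stream_buffer_bytes is whatever buffer size you assume for streaming.
    """
    raw = pcm_bytes(duration_s, **pcm_kwargs)
    if load_type == "decompress_on_load":
        return raw                           # held in RAM as raw PCM
    if load_type == "compressed_in_memory":
        return int(raw / compression_ratio)  # held in RAM still compressed
    if load_type == "streaming":
        return stream_buffer_bytes           # whole clip is never resident
    raise ValueError(f"unknown load type: {load_type}")


# One minute of 16-bit stereo audio at 44.1kHz.
raw_size = estimated_ram_bytes(60, "decompress_on_load")
vorbis_size = estimated_ram_bytes(60, "compressed_in_memory", compression_ratio=10.0)
adpcm_size = estimated_ram_bytes(60, "compressed_in_memory", compression_ratio=3.5)
```

With these assumptions, one minute of stereo audio costs roughly 10.6 MB when decompressed on load, against roughly 1 MB when kept in memory at a 10:1 ratio.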

 

Compression Format

  • PCM: straight-up raw audio data, fully uncompressed and takes up loads of disk space and RAM, but playing it is basically free because it does not need to be decompressed.

  • ADPCM: very old compression format with a compression ratio of 3.5:1. Pretty cheap to compress/decompress compared to Vorbis or other compression formats, but introduces digital noise artifacts into the sound, so should only be used for noisy sounds in which this would not be noticed. If you are unsure whether ADPCM is right for a particular sound, preview the sound in both PCM and ADPCM formats – if you hear a difference, I recommend going for PCM instead.

  • Vorbis: compressed format which is compatible with most major platforms. Can handle quite high compression ratios whilst maintaining decent sound quality, but is somewhat expensive to compress and decompress on the fly.

I’ve only listed the editor default formats here; see Notes below for platform-specific types and more on compression formats.

Here is a quick comparison of the CPU use of different formats on my PC in the Unity editor:

| Compression format | CPU usage with 1 voice | CPU usage with 6 voices |
|---|---|---|
| PCM | ~0.05% | ~0.3% |
| ADPCM (compressed in memory) | ~0.2% | ~1.0% |
| Vorbis (compressed in memory) | ~0.5% | ~3.2% |

 

 

Quality (does not apply to PCM/ADPCM)

  • 70-100%: Practically indistinguishable from full quality PCM to all but audiophiles with expensive audio equipment.

  • 1-69%: Varying degrees of quality, the lower values introducing a lot of gross noise artifacts, removing dynamics, and making the audio sound flat and lifeless. You can use the preview button in the inspector pane to hear how noticeable the drop in quality is for that particular sound.

These quality settings assume that the sound will be played at 100% speed, so the encoder cuts out some upper frequencies which are usually outside the range of hearing, but which would be shifted down into hearing range if the sound is played back at a lower speed. If you plan on playing back a sound at a low pitch/slow speed, consider encoding as PCM instead.

Lower and lower quality levels give diminishing returns:

| Vorbis quality | % of original size | Compression ratio |
|---|---|---|
| 100 | ~20% | ~5:1 |
| 75 | ~10% | ~10:1 |
| 50 | ~7% | ~14:1 |
| 25 | ~4% | ~25:1 |
| 1 | ~2% | ~50:1 |
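The diminishing returns are easy to see by plugging the figures from the table above into a quick Python estimate. This is only a sketch: the percentages are the approximate ones listed in the table, real results depend on the audio content, and the function names are made up for illustration.

```python
# Approximate fraction of the original PCM size at each Vorbis quality level,
# copied from the table above.
VORBIS_SIZE_FRACTION = {100: 0.20, 75: 0.10, 50: 0.07, 25: 0.04, 1: 0.02}


def estimated_vorbis_bytes(pcm_size_bytes, quality):
    """Estimate the encoded size for one of the quality levels in the table."""
    if quality not in VORBIS_SIZE_FRACTION:
        raise ValueError(f"no measurement for quality {quality}")
    return round(pcm_size_bytes * VORBIS_SIZE_FRACTION[quality])


def saving_between(pcm_size_bytes, higher_quality, lower_quality):
    """Bytes saved by dropping from one quality level to a lower one."""
    return (estimated_vorbis_bytes(pcm_size_bytes, higher_quality)
            - estimated_vorbis_bytes(pcm_size_bytes, lower_quality))


pcm_size = 10_000_000                            # a 10 MB uncompressed clip
first_step = saving_between(pcm_size, 100, 75)   # dropping from quality 100 to 75
last_step = saving_between(pcm_size, 25, 1)      # dropping from quality 25 to 1
```

For a 10 MB clip, going from quality 100 to 75 saves about 1 MB, while going all the way from 25 to 1 saves only about 0.2 MB, at a much greater cost in sound quality.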

 

 

Sample Rate Setting

  • Preserve: Just uses the sample rate at which the sound was made.

  • Optimise: Unity analyses the audio for the highest frequency that it contains, then uses the Nyquist theorem to determine the lowest sample rate that can be used without losing any of those frequencies. E.g. if the highest frequency that the sound contains is 10kHz, the sample rate can be lowered to 20kHz without any loss of sound content. This setting can only be used with PCM/ADPCM.

  • Override: If you want, you can manually apply a new sample rate to the AudioClip. I generally would not recommend this unless you know what you're doing.
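The Nyquist calculation behind the Optimise option can be sketched in a few lines of Python. This shows only the arithmetic: how Unity analyses the audio and which sample rates it actually chooses from are not covered by this guide, so the list of standard rates and the function names below are assumptions.

```python
# Hypothetical list of candidate sample rates; Unity's real choices may differ.
STANDARD_RATES_HZ = (8000, 11025, 22050, 32000, 44100, 48000, 96000)


def nyquist_minimum_rate(highest_frequency_hz):
    """Lowest sample rate that still captures the given highest frequency."""
    return 2 * highest_frequency_hz


def lowest_usable_standard_rate(highest_frequency_hz, rates=STANDARD_RATES_HZ):
    """Pick the smallest candidate rate that satisfies the Nyquist minimum."""
    minimum = nyquist_minimum_rate(highest_frequency_hz)
    for rate in sorted(rates):
        if rate >= minimum:
            return rate
    raise ValueError("no candidate rate is high enough")


minimum_rate = nyquist_minimum_rate(10_000)           # the 10kHz example above
standard_rate = lowest_usable_standard_rate(10_000)   # nearest candidate at or above it
```

For the 10kHz example, the minimum works out to 20kHz, and the nearest rate in the assumed list is 22.05kHz, which is half the data of a 44.1kHz original.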

 

3. Suggested settings for PC/console

| Type of sound | Load in background | Load type | Preload audio data | Compression format | Quality | Sample rate setting |
|---|---|---|---|---|---|---|
| Dialogue | Y | Compressed in memory | Y | Vorbis | 70 | Preserve |
| Environmental long loops | n/a | Streaming | n/a | Vorbis | 70 | Preserve |
| Environmental one-shots | Y | Decompress on load | Y | Vorbis | 70 | Preserve |
| Foley | N | Compressed in memory | Y | PCM | n/a | Optimise |
| Footsteps | N | Compressed in memory | Y | PCM | n/a | Optimise |
| Music (long pieces) | n/a | Streaming | n/a | Vorbis | 85 | Preserve |
| Music (stingers) | Y | Compressed in memory | Y | Vorbis | 85 | Preserve |
| Non-dialogue vocalisations | Y | Decompress on load | Y | Vorbis | 70 | Preserve |
| Special FX (short) | N | Compressed in memory | Y | PCM | n/a | Optimise |
| Special FX (long) | N | Decompress on load | Y | Vorbis | 70 | Preserve |
| UI sounds (long) | Y | Decompress on load | Y | Vorbis | 70 | Preserve |
| UI sounds (short) | N | Compressed in memory | Y | PCM | n/a | Optimise |

These suggestions should work well for games with up to about 10,000 AudioClips. I'm suggesting Decompress on Load for most sounds, which means they will be stored as raw audio data in RAM. If the total combined file size of your uncompressed audio files is greater than your RAM limitation, you may want to change your longer files to Compressed in Memory – but be aware that this will introduce a small CPU overhead each time one of these sounds is triggered. A PDF version of these tables can be downloaded here.
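One way to decide which files to switch is to start with the largest and stop as soon as the total fits. The Python sketch below does just that; it is not a Unity tool, and the clip names, sizes, RAM budget, and 10:1 compression ratio are all made-up illustrative values.

```python
def choose_clips_to_compress(clips, ram_budget_bytes, compression_ratio=10.0):
    """Switch the largest clips to Compressed in Memory until the total fits.

    clips maps a clip name to its uncompressed size in bytes.
    Returns (names switched, estimated total RAM afterwards). If the budget
    cannot be met, every clip is switched and the total still exceeds it.
    """
    total = sum(clips.values())
    switched = []
    largest_first = sorted(clips.items(), key=lambda item: item[1], reverse=True)
    for name, size in largest_first:
        if total <= ram_budget_bytes:
            break
        total -= size - size / compression_ratio  # RAM saved by compressing this clip
        switched.append(name)
    return switched, total


# Hypothetical project: sizes are uncompressed bytes.
clips = {"ambience": 40_000_000, "dialogue_01": 8_000_000, "footstep": 200_000}
switched, total_after = choose_clips_to_compress(clips, ram_budget_bytes=20_000_000)
```

Here, compressing only the long ambience file brings the total from 48.2 MB to about 12.2 MB, so the shorter sounds can stay uncompressed and cheap to play.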

 

4. Suggested settings for mobile

| Type of sound | Load in background | Load type | Preload audio data | Compression format | Quality | Sample rate setting |
|---|---|---|---|---|---|---|
| Dialogue | Y | Compressed in memory | Y | Vorbis/MP3 | 50 | Preserve |
| Environmental long loops | Y | Compressed in memory | Y | Vorbis | 35 | Preserve |
| Environmental one-shots | Y | Decompress on load | Y | Vorbis/MP3 | 50 | Preserve |
| Foley | N | Compressed in memory | Y | PCM/ADPCM* | n/a | Preserve |
| Footsteps | N | Compressed in memory | Y | PCM/ADPCM* | n/a | Optimise |
| Music (long pieces) | n/a | Streaming | n/a | Vorbis | 70 | Preserve |
| Music (stingers) | Y | Compressed in memory | Y | Vorbis/MP3 | 70 | Preserve |
| Non-dialogue vocalisations | Y | Decompress on load | Y | Vorbis/MP3 | 50 | Preserve |
| Special FX (short) | N | Compressed in memory | Y | PCM/ADPCM* | n/a | Optimise |
| Special FX (long) | N | Decompress on load | Y | Vorbis/MP3 | 50 | Preserve |
| UI sounds (long) | Y | Decompress on load | Y | Vorbis/MP3 | 50 | Preserve |
| UI sounds (short) | N | Compressed in memory | Y | PCM/ADPCM* | n/a | Optimise |

 

*See the description of the ADPCM compression format above in the Understanding import settings section if you are unsure about whether to use PCM or ADPCM. If you are not desperate to save disk space, I would advise erring on the side of PCM.

These suggestions should work well for most mobile games, or at least serve as a jumping-off point. If you think the settings above might not be right for your game, have a look at the full descriptions of settings above. A PDF version of these tables can be downloaded here.

 

5. Warnings and Notes

Warnings

  • Streaming multiple audio files at once is relatively light on CPU use for PC and consoles, but can present a big problem on mobile - especially low-end or old devices. Below are two charts of measurements I took using Unity's profiler, comparing the effect of streaming multiple audio files on various Samsung Galaxy phones and my own PC. The first chart shows 1-12 simultaneous streaming audio sources, the second is a zoom-in on sources 1-3.

[Chart: CPU usage from 1 to 12 streaming sources]

[Chart: zoomed in on 1 to 3 streaming sources]

  • If a sound is set to Decompress on Load, using Vorbis compression will only make it 1/10th of the size on the disk, not the RAM, which still stores the raw PCM data. If you set the sound to Compressed in Memory, you do get the RAM saving, but at the cost of CPU to decompress it on the fly.

  • Every load type except for Streaming will by default load audio data into RAM and leave it there until the scene is unloaded. If you are running your whole game in one big scene, this could fill up all of your available RAM. To make matters worse, manually removing AudioClips from RAM is hugely inefficient and can cause a frame hitch, as can leaving it up to garbage collection. If you have a lot of audio data, you may want to optimise for lower RAM use by sacrificing in other areas, utilising Unity asset bundles. The Preload Audio Data setting has no bearing on this either, as it only determines when the data is loaded into RAM, not what happens to it afterwards.

  • If your target platform supports the MP3 format, be aware that it does not automatically loop seamlessly, so I do not recommend using MP3 compression for atmospheric or music loops. Because of the way MP3s are encoded, silent padding is often added to the end of the file to make the overall number of samples perfectly divisible by “frames” of 1,152 samples. There are ways to create seamless loops with MP3s - but that’s for another guide.

  • Disabling Preload Audio Data and enabling Load in Background will make a large file play late, but will not cause a CPU spike. This is because it takes time to be loaded but is also not stalling the main thread.

  • Disabling Preload Audio Data and disabling Load in Background will cause a large file to stall the main thread when the sound is called for the first time. However, this is not an issue if you are using FMOD, which runs decoding on a separate thread.
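The MP3 frame padding mentioned in the warnings above is simple arithmetic, sketched here in Python. It models only the rounding up to whole 1,152-sample frames; real encoders may add further silence that this sketch ignores, and the function names are illustrative.

```python
MP3_FRAME_SAMPLES = 1152  # samples per MP3 frame, as described above


def padding_samples(sample_count, frame_size=MP3_FRAME_SAMPLES):
    """Silent samples needed to round a clip up to a whole number of frames."""
    return (-sample_count) % frame_size


def padding_seconds(sample_count, sample_rate=44100):
    """The same padding expressed as a duration at the given sample rate."""
    return padding_samples(sample_count) / sample_rate


loop_samples = 10 * 44100                 # a 10-second loop at 44.1kHz
extra = padding_samples(loop_samples)     # silent samples appended to the loop
gap = padding_seconds(loop_samples)       # length of the resulting gap in seconds
```

A 10-second loop at 44.1kHz is 441,000 samples, which is not a multiple of 1,152, so 216 samples of silence (about 5 milliseconds) end up at the loop point. That is short, but it is enough to produce an audible hiccup each time the loop repeats.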

 

Notes

Compression formats:

  • Always make sure that the files you are about to import into Unity are in an uncompressed format like WAVE (.wav) or AIFF (.aiff). Many compressed formats are lossy, which means that information is lost when they are encoded. If you import a compressed file such as an MP3 or Vorbis into Unity, Unity will first decode it into an uncompressed format, then re-encode it into whichever format you select – even if it's the same format that you started with. This can introduce more compression artifacts, which are generally undesirable.

  • There is a great write-up on various audio compression formats, their pros and cons, and which platforms support them in the documentation for Audiokinetic’s Wwise middleware.

  • If you're using FMOD, you have access to its FADPCM format, which is a vast quality improvement upon the old ADPCM format. However, this is not built into standalone Unity.

  • You may want to use MP3 instead of Vorbis on iPhone, for example, as it has a hardware MP3 decoder which saves your CPU from having to deal with decompressing MP3 files stored in RAM or on the disk. But beware that this may not be appropriate for looping sounds (see Warnings above), and that it can only decode one MP3 at a time. If multiple MP3s need decoding simultaneously, these will be done in software just like Vorbis. This shouldn't present any major issues, but it is worth noting that MP3 decoding is slightly more CPU-intensive than Vorbis decoding.

  • If your target platform is PlayStation 4, the ATRAC9 format offers a fairly high compression ratio with less CPU overhead than Vorbis or MP3.

  • Likewise, for Xbox One, Microsoft’s XMA format will serve well instead of Vorbis or MP3. Microsoft recommend a compression ratio of between 8:1 and 15:1 for the best performance and quality.

Misc:

  • In the Suggested Settings tables, I've listed some sounds with PCM as their compression type (actually uncompressed) and Compressed in Memory as their load type. Using Decompress on Load would also be fine, I'm just pointing out not to set these sounds to Streaming.

  • Some middleware solutions such as FMOD and Wwise have their own way of handling audio importing, making these Unity settings redundant (unless you also want to run some sounds without using the middleware, for some reason).

  • The sound types in the Suggested Settings tables include the category “Foley” which may be cause for argument, since it’s a film term with a very specific meaning, and by rights probably shouldn’t be used in games at all. However, I think it’s the most fitting term for the various sounds associated with the physical interactions between characters and objects in a game.

  • Although various target platforms have native Sample Rates, Unity plays around with mixing and matching sample rates all the time. What this means is that whenever a sound is played at a speed other than 100%, or if a sound is imported using Optimise Sample Rate, then the console/device has to do its own sample rate interpolation on the fly, to make everything come out at its native sample rate. This is usually a negligibly cheap operation, even on mobile, but could potentially pose a problem for some platforms, depending on how they handle it.
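To give an idea of what sample rate interpolation involves, here is a minimal Python sketch using linear interpolation. This is an assumption for illustration only: the guide does not say which interpolation method any given platform uses, and real resamplers are usually more sophisticated.

```python
def resample_linear(samples, source_rate, target_rate):
    """Resample a mono signal by linearly interpolating between neighbouring samples."""
    if not samples:
        return []
    out_len = int(len(samples) * target_rate / source_rate)
    step = source_rate / target_rate          # source samples advanced per output sample
    last = len(samples) - 1
    out = []
    for i in range(out_len):
        pos = i * step                        # fractional position in the source
        idx = min(int(pos), last)
        frac = pos - idx
        nxt = samples[min(idx + 1, last)]     # hold the final sample at the end
        out.append(samples[idx] * (1 - frac) + nxt * frac)
    return out


# Doubling the sample rate inserts a midpoint between each pair of samples.
upsampled = resample_linear([0.0, 1.0, 0.0, -1.0], source_rate=2, target_rate=4)
```

Playing a sound at a speed other than 100% is the same operation with a different ratio: for example, half speed can be treated as resampling to twice the number of samples and playing them at the original rate.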

 

This guide has been compiled using the best of my current knowledge, but if you feel like there is something missing, incorrect, or not as well explained as it could be, let me know in the comments. I’m very grateful to Aaron Brown, Anne-Marie Weber, Brett Paterson, Chris Webb, Dan Treble, Frederik Max, Jeff van Dyck, Josh Sanderson, Kirk Winner, Maize Wallin, and Maris Tammik for sharing their knowledge and giving honest feedback on this guide, and to Kieran Lord for seeing my optimisation notes and saying "you should publish that!"
