
Prototyping Dialogue with Google Text-to-Speech

Prototyping scenes that rely on recorded voices is a challenging and time-consuming task. That's why we've automated the process using Google Text-to-Speech.

James Simpson, Blogger

February 17, 2022


For a narrative-driven game like Arctic Awakening, it's really important for our team at GoldFire Studios to quickly get a feel for the flow and pacing of a scene. We do this some time before we actually head into the recording studio with our voice actors, so we need some kind of placeholder to fill in for the real recorded dialogue.

Developers have a few options for placeholder dialogue, including timed subtitles and scratch audio recorded by programmers (the aural equivalent of "programmer art"). We tried both of these early on in development before settling on a workflow using Google Cloud Text-to-Speech, which has proved a significant time saver and given great results. Better still, our use case has fit within the product's free tier.

The Text-to-Speech product is a cloud API that returns passable-sounding voice clips for the text strings you provide. Besides the content itself, you can specify a language code, one of several named voice presets, and a gender. The language code also selects the accent, for instance American (en-US), British (en-GB), Indian (en-IN), or Australian (en-AU) English.
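To make those options concrete, here is a minimal sketch that builds the request object the Node.js client's `synthesizeSpeech` call accepts, without contacting the API. The helper name `buildRequest` and the `ACCENTS` table are assumptions made up for illustration; the field names (`input`, `voice`, `audioConfig`, `languageCode`, `name`, `ssmlGender`) follow the public API.

```javascript
// Hypothetical lookup from a friendly accent name to a language code.
const ACCENTS = {
  american: 'en-US',
  british: 'en-GB',
  indian: 'en-IN',
  australian: 'en-AU',
};

// Hypothetical helper: builds a Text-to-Speech request body without calling the API.
function buildRequest(text, {accent = 'american', gender, voiceName} = {}) {
  const languageCode = ACCENTS[accent];
  if (!languageCode) {
    throw new Error(`Unknown accent: ${accent}`);
  }

  const voice = {languageCode};
  // Gender and a specific voice preset are both optional selection hints.
  if (gender) voice.ssmlGender = gender;
  if (voiceName) voice.name = voiceName;

  return {
    input: {text},
    voice,
    audioConfig: {audioEncoding: 'LINEAR16'},
  };
}

console.log(buildRequest('Mayday, mayday!', {accent: 'british', gender: 'FEMALE'}));
```

Keeping the request construction in a plain function like this makes it easy to check voice assignments before spending any API quota.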

We already had a database with the data we needed to get started (the line itself and the character who said it), so plugging that into Google's API was relatively quick and painless. Here's a snippet of code from our dialogue management platform, StoryDB (which I'll talk more about in a later post). It's just a simple Node.js web server:

const fs = require('fs');
const textToSpeech = require('@google-cloud/text-to-speech');

const textToSpeechClient = new textToSpeech.TextToSpeechClient();

// Configuration for which voice goes with which character.
// List of voices available here: https://cloud.google.com/text-to-speech/docs/voices
// Note: the API's field name for gender is ssmlGender.
const voices = {
  Alfie: {languageCode: 'en-US', name: 'en-US-Standard-I', ssmlGender: 'MALE'},
  Kai: {languageCode: 'en-US', name: 'en-US-Wavenet-B', ssmlGender: 'MALE'},
  Donovan: {languageCode: 'en-US', name: 'en-US-Wavenet-J', ssmlGender: 'MALE'},
  ATC: {languageCode: 'en-US', name: 'en-US-Standard-G', ssmlGender: 'FEMALE'},
  default: {languageCode: 'en-US', name: 'en-US-Wavenet-F', ssmlGender: 'FEMALE'},
};

// This runs inside a request handler: req, res, projectId, ids,
// checkProjectAccess and getLines come from the surrounding server code.
checkProjectAccess(req.session.uid)
  .then(() => fs.promises.mkdir(`static/clips/${projectId}`, {recursive: true}))
  .then(() => getLines(ids))
  .then(async(ls) => {
    const generateLine = async(l) => {
      const input = {text: l.caption};
      const voice = voices[l.character] || voices.default;
      const audioConfig = {audioEncoding: 'LINEAR16', speakingRate: 1.25};

      // Perform the text-to-speech request and write the audio content to file.
      const [response] = await textToSpeechClient.synthesizeSpeech({input, voice, audioConfig});
      await fs.promises.writeFile(`static/clips/${projectId}/${l.id}.wav`, response.audioContent, 'binary');
    };

    // allSettled lets the remaining lines finish even if one request fails.
    await Promise.allSettled(ls.map(generateLine));

    res.end();
  })
  .catch((err) => {
    // Access check, directory creation or the database lookup failed.
    console.error(err);
    res.statusCode = 500;
    res.end();
  });
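One thing to keep in mind with the snippet above: `Promise.allSettled` never rejects, so a line whose request fails is silently skipped. As a rough sketch of one way to surface those failures, the helper below pairs each settled result with its line id. The names `generateAll` and `fakeGenerateLine` are hypothetical, and the stand-in generator only simulates a failure rather than calling the API.

```javascript
// Hypothetical helper: run a generator over every line and report which ones failed.
async function generateAll(lines, generateLine) {
  const results = await Promise.allSettled(lines.map((l) => generateLine(l)));

  const failed = [];
  results.forEach((result, i) => {
    if (result.status === 'rejected') {
      const reason = result.reason instanceof Error
        ? result.reason.message
        : String(result.reason);
      failed.push({id: lines[i].id, reason});
    }
  });

  return {generated: lines.length - failed.length, failed};
}

// Stand-in for the real generateLine: rejects when a line has no caption.
const fakeGenerateLine = async (l) => {
  if (!l.caption) throw new Error('empty caption');
};

generateAll(
  [{id: 1, caption: 'Hello'}, {id: 2, caption: ''}],
  fakeGenerateLine
).then((summary) => console.log(summary));
```

A summary like this could be returned in the response body so the person triggering generation knows which lines to retry.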

And here's a sample of what gets generated: (audio clip embedded in the original post)

From there, we jump into our game engine (Unity in this case, but this can work with any engine) and run a script which automates importing the line metadata and the audio clips from an API endpoint we set up. At that point, the clips and subtitles are ready to be used in a scene! Once the actual lines are recorded by voice actors, we just swap out the files and re-import, with the lines already implemented in-game.
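The endpoint itself isn't shown here. As a minimal sketch, assuming line records with the `id`, `character`, and `caption` fields used in the snippet earlier and clips saved under `static/clips/<projectId>/<id>.wav`, the metadata an import script consumes could be assembled like this. The `buildManifest` name and the shape of its output are hypothetical.

```javascript
// Hypothetical manifest builder; the output field names are illustrative only.
function buildManifest(projectId, lines) {
  return {
    projectId,
    lines: lines.map((l) => ({
      id: l.id,
      character: l.character,
      caption: l.caption,
      // Keyed by line id, so a recorded take can replace the placeholder at the same path.
      clip: `static/clips/${projectId}/${l.id}.wav`,
    })),
  };
}

const manifest = buildManifest('demo', [
  {id: 101, character: 'Kai', caption: 'Do you read me?'},
]);
console.log(JSON.stringify(manifest, null, 2));
```

Because each clip path depends only on the line id, swapping in the final recordings doesn't require touching the metadata at all.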

We're really happy with the results, and we'd definitely encourage other developers to give it a try if they're using voiced dialogue. These recordings won't be suitable for release in most cases, but when operating on an indie budget, being able to quickly and easily prototype your dialogue systems can be a big win with no up-front cost involved.
