Voice Cloning and Video Games: How Game Developers Can Create Synthetic Voices

Respeecher
6 min read · Mar 25, 2021

While video game developers create stories and plots for virtual characters, until recently, the dubbing of game character voices wasn’t so different from that of characters in movies.

For a video game, the process involves recording actors and actresses for every character, along with a substantial investment of time and money in production. However, with voice synthesizing technology, everything is changing.

What are synthetic voices?

A synthetic voice is a human-sounding voice that has been produced by a computer. A well-synthesized voice can be indistinguishable from the real one. This means that an outside listener cannot tell the actual person’s voice from the machine’s.

How does voice cloning work?

The approach best known to lay audiences is text-to-speech (TTS) synthesis. In this process, the computer reads a text aloud, and the resulting speech is recorded.

The most common example is when you ask Google to read the text you type in services like Google Translate.

This type of speech is easily distinguishable from a natural human’s voice. In a recent post, we examined the technical aspects of a characteristic robotic voice.

Things get more interesting in speech-to-speech (STS) voice synthesis. Imagine you need to dub a video game character, but the original voice actor is no longer available.

This is easy to imagine for gaming franchises that have been around for decades. Unfortunately, video game or cartoon characters can easily outlive their human counterparts.

So how do you capture a character’s original voice when you don’t have access to the original actor?

Artificial intelligence and machine learning technologies present a solution to these problems. Let’s briefly describe the STS generation process, as developed by Respeecher:

  1. The first requirement is good-quality audio recordings of the voice being cloned. The recordings should be at least one hour long. Artificial intelligence cannot create a perfect voice model with anything less.
  2. We then feed this data into the machine learning algorithms. The system performs complex calculations to form a model of the original voice. Here it is crucial that the original voice recording includes as many different emotions, elocutions, tones, cadences, vocal timbres, etc., as possible. The more emotionally loaded the original speech is, the more accurate the voice model will be.
  3. Once the model is formed, it becomes possible to generate an unlimited amount of audio content that is indistinguishable from the original speaker. All that remains is for someone to record the speech that is needed.
  4. The final stage transforms that person’s recorded voice into the voice of the original actor. This conversion morphs every characteristic of the recorded speech into the original actor’s authentic voice.
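The one-hour requirement in step 1 is easy to check before any audio is submitted. The following is a minimal sketch, not part of Respeecher’s tooling: it assumes the recordings are uncompressed WAV files in a single folder, that the hour may be spread across several files, and the function names and the MIN_TRAINING_SECONDS constant are hypothetical. It uses only Python’s standard wave module.

```python
import wave
from pathlib import Path

# Assumption taken from step 1 above: at least one hour of source audio.
MIN_TRAINING_SECONDS = 60 * 60


def wav_duration_seconds(path):
    """Return the length of one uncompressed WAV file in seconds."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def total_duration_seconds(folder):
    """Sum the durations of every .wav file directly inside `folder`."""
    wav_paths = sorted(Path(folder).glob("*.wav"))
    return sum(wav_duration_seconds(p) for p in wav_paths)


def has_enough_audio(folder, minimum_seconds=MIN_TRAINING_SECONDS):
    """True when the folder holds at least `minimum_seconds` of audio."""
    return total_duration_seconds(folder) >= minimum_seconds
```

Calling has_enough_audio on a folder of recordings returns False when the material falls short of an hour. Duration is only one of the criteria above: the check says nothing about audio quality or emotional range, which still need a human ear.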

With this type of technology now available, video game producers are no longer restricted to dubbing actors. They can even generate unique voices that did not exist before.

Here’s how synthetic voice generation changes game production for the better

Some of Respeecher’s most innovative benefits that allow game developers to save time and money on dubbing and additional dialogue replacement (ADR) include:

  1. Allowing some of the most famous stars to participate in the project. It’s primarily about the money and time saved on bringing in a celebrity. If you want to incorporate an A-list star in your project, they will probably appreciate that they only need to provide a single, hour-long voice recording. Based on this, Respeecher can generate an unlimited amount of original speech content.
  2. Resurrecting voices from the past. Imagine that in your WWII strategy game, the dialog of the main characters and villains is in their original voices. You can also easily replace a voice actor who left your project somewhere between the first and second games.
  3. Making it easy to solve the problem of dubbing child actors. When children grow up, their voices change. This can become a problem if your project evolves over time with the same child as the hero. Luckily, you can keep the original character’s voice without depending on the original actor who dubbed them. Learn more about all the benefits of cloning a child’s voice here.
  4. Making adjustments to game content easier. If the writer or director edits a scene, you no longer need to bring the voice actor back to record the changes. Instead, the sound engineer can easily implement any modifications to the voice content.

Respeecher’s tech is able to provide GameDev audio teams with creative and production flexibility by taking away dependence on specific actors/studios/production teams and making it a non-factor. It also allows interchangeability among VO actors, moreover any audio production team member now becomes able to voice over any character’s voice that’s in the system. — Alex Demydenko, Head of Business Development, Respeecher

  5. Letting your best available VO actors play even more characters by giving them access to a variety of male or female voices. This is possible because the technology is gender-agnostic, allowing female-to-male voice conversion and vice versa.

  6. Receiving custom voice cloning environments containing on-demand voices that allow rapid iterations on VO during all production phases, starting from the “storyboard”. This saves time and money, because production cycles require fewer people and resources, and it gives management increased planning capabilities and control over the entire process.

While all of these benefits apply to custom projects, there is something you can use right out of the box.

Synthetic voice marketplace at your service

For several years, Respeecher’s clients have been asking the same question: “How can we access your library of synthesized voices?” And while not every game project has to reproduce the voice of a particular actor, gaming studios can benefit significantly from using one or two voice actors and transforming their speech into unique voices for dozens of game characters.

This is now possible thanks to Respeecher’s Voice Marketplace, a library of pre-existing target voices produced by combining characteristics from several voice actors.

The library is filled with character voices that do not belong to any particular person. This means that you do not have to worry about copyright. Any licensing arrangement is resolved between you and Respeecher. Currently, the marketplace is the most accessible source of synthetic voices in the world and allows you to reduce the cost of sound production for your game tenfold.

In addition to Respeecher’s synthesized voices, we are adding professional licensed voices to the library. Respeecher plans to connect world-famous actors with the most ambitious gaming projects, making cooperation pleasant and mutually beneficial.

In the meantime, you can find more information here. The marketplace is available for free for the first seven days, so you can see for yourself if it’s a good fit for your project.

What voice quality and authenticity can you expect with synthesized speech?

Although most of the projects Respeecher is working on are protected under an NDA, you can get a sense of our work’s quality with a couple of examples.

One of our recent projects was recreating the voice of the famous NFL coach Vincent Lombardi for the 2020 Super Bowl. Check out the original commercial.

Even sound engineering professionals find it difficult to distinguish between the legendary coach’s original voice and Respeecher’s synthesized version.

Projects for the gaming industry are often much easier to produce. Most video game characters are created from scratch. But even in projects like Cyberpunk 2077 or Beyond: Two Souls, where famous actors participate, Respeecher can dramatically reduce the cost of accurately cloning a character’s speech.

Feel free to drop us a line if you have any questions or are interested in a customized demo.

This article was initially published on the Respeecher blog.
