How Voice Cloning Allows for Multiple Language Conversion using AI

Respeecher
Oct 22, 2021

Voice cloning is a relatively young technology. Many companies that could save significant production costs by applying it to their projects are still unaware that it exists. This article looks at the key business benefits of using voice conversion in multiple languages.

Entertainment and advertising industries: the most obvious beneficiaries

In a previous series of blog posts, we looked in detail at how voice cloning helps with marketing and film dubbing. In short, the entertainment and advertising industries use voice cloning for a couple of important reasons.

A company's earnings depend directly on the foreign markets where its product is available. Whether the product is the content itself (as with movies or video games) or physical goods, access to foreign markets is only possible through localization.

The situation gets even more complicated if your content features a famous person. Dubbing an English-speaking star into Japanese or Russian with another actor's voice will never feel as authentic on screen as the original.

AI voice generation can solve this problem. Imagine Beyoncé suddenly speaking Mandarin: it is the singer's original voice, only now she speaks fluent Mandarin. That is what Respeecher's technology makes possible. An iconic star can seem able to speak any language in the world.

But what about lip-syncing, you ask? Even if the voice is identical to the original, the actor's lip movements on screen will not match the words being spoken.

For the most demanding projects, deepfake video technology solves this problem. Actors can not only speak an unfamiliar language but also look as if they really know it.

By combining speech-to-speech voice conversion and deepfake video adjustment, businesses can achieve awe-inspiring results. Here are the most common use cases of voice cloning.

1. Perform ADR without dubbing actors

AI voice cloning disrupts not only the initial dubbing process but also ADR (automated dialogue replacement, the re-recording of dialogue after filming). ADR is essential when dubbing into foreign languages because not all dubbed speech fits the original scenes perfectly.

Editing original scenes, adjusting emotions, and maintaining meaning becomes easier when you don’t have to record actors in a studio.

2. Create branded voices for AI-powered bots and an automated customer experience

If you’re running a service enterprise, chances are you have already tried implementing chatbots and AI-powered customer assistants.

Businesses use voice cloning software to create the same user experience for clients worldwide. You can create a virtual identity for your digital client assistant and be sure that it is recognizable, no matter which country you are serving.

The same technology can create a consistent audio experience for AI voice assistants such as Alexa or Google Assistant.

3. Scale production for dubbing agencies

Localization and dubbing agencies depend on the capacity of their voice actors. In many countries, it is typical for the same ten to twenty actors to voice dozens of films, video games, and advertisements every year.

This is incredibly demanding and often leads to overload and higher turnover.

Voice cloning frees agencies from having to rely on the same overloaded actors from project to project. Because the dubbed lines can be converted into the original actor's voice, anyone can now serve as the source voice for dubbing.

This makes it possible to produce almost unlimited content in any language.

What’s the technology behind voice conversion in multiple languages?

Voice conversion is fairly simple to explain. Imagine you want someone else to speak in your voice. In that case, your voice is the 'target' voice: the one used as the reference for cloning. The other person's voice is the 'source' voice.

To create a convincing voice clone, Respeecher needs around an hour of recorded speech in the target voice. We then feed these recordings into our machine-learning system, which analyzes the voice and produces clones that can be compared directly against the original.

When the ML system can no longer distinguish a clone from the original, the cloning is complete. From then on, there is no limit to how much cloned content the system can generate from any given source voice.
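The article does not say how the system decides that a clone is indistinguishable. One common way to frame such a check, offered here purely as an assumption and not as Respeecher's actual method, is to train a classifier to tell original recordings from clones and treat near-chance accuracy on a balanced test set as "indistinguishable." The function name is_indistinguishable and the 0.05 tolerance below are hypothetical choices for illustration.

```python
def is_indistinguishable(predictions, labels, tolerance=0.05):
    """Hypothetical check: does a real-vs-clone classifier perform at chance level?

    predictions: 0/1 guesses from a (hypothetical) classifier, 1 = "original".
    labels: 0/1 ground truth, 1 = original recording, 0 = cloned recording.
    tolerance: assumed allowed deviation from 50% accuracy.
    """
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and equal length")
    # Chance level is only 50% when originals and clones are equally represented.
    if 2 * sum(labels) != len(labels):
        raise ValueError("test set must contain equal numbers of originals and clones")
    correct = sum(p == y for p, y in zip(predictions, labels))
    accuracy = correct / len(labels)
    # Accuracy near 0.5 means the classifier is effectively guessing.
    return abs(accuracy - 0.5) <= tolerance


# A classifier that gets only half of a balanced set right cannot tell the voices apart.
print(is_indistinguishable([1, 0, 1, 0], [1, 1, 0, 0]))  # True
# A classifier that separates them perfectly means the clone is not yet convincing.
print(is_indistinguishable([1, 1, 0, 0], [1, 1, 0, 0]))  # False
```

Requiring a balanced test set keeps the 50% threshold meaningful; on an unbalanced set, a classifier could beat 50% simply by always guessing the majority class.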

Voice cloning is useful in plenty of cases beyond localization. These include resurrection projects, in which iconic voices of the past are brought back to life.

Examples include Vince Lombardi's unexpected appearance at the Super Bowl and the recent recreation of Manuel Morales's voice for a Puerto Rican basketball broadcast.

Many Hollywood projects use voice cloning for various purposes, including actor ADR and voice de-aging. One of the most famous examples is the synthesized voice of young Luke Skywalker in the recent Disney+ series The Mandalorian.

Conclusion

If you work in localization, on either the producer or the agency side, there's no doubt you can benefit from voice cloning.

We encourage you to get in touch with us for a brief consultation on using Respeecher to scale multilingual dubbing or related content production.

We are always happy to hear from businesses and content producers and to see how we can help them navigate emerging technologies and the market.

This article was initially published on the Respeecher blog.
