Respeecher Explained: The Speech Synthesis Software for Film & TV Creators

Jan 22, 2021


Speech synthesis is the artificial simulation of human speech by a computer. The system that does this is called a speech synthesizer, and it can be implemented in software or hardware. In the classic concatenative approach, synthesized speech is generated by joining pieces of recorded speech stored in a database. Speech synthesis is based on two kinds of technology: text-to-speech and speech-to-speech.

Text-to-speech (TTS) technology converts text into audio for voice-enabled services. First it transcribes the text input (e.g., data from word processors, standard ASCII from emails, mobile text messages, scanned texts from newspapers, etc.) into a phonetic representation (high-level synthesis). Then it generates speech waveforms from that phonetic and prosodic information (low-level synthesis).
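To make the two stages concrete, here is a minimal toy sketch in Python. It is not how Respeecher or any production TTS engine works: the lexicon, the phoneme-to-tone table, and the function names are all hypothetical, and each phoneme is rendered as a plain sine tone rather than real speech. It only illustrates the split between high-level synthesis (text to phonemes) and low-level synthesis (phonemes to waveform samples).

```python
import math

# Hypothetical toy lexicon: word -> phoneme symbols (illustrative only).
LEXICON = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}

# Hypothetical stand-in "voice": one tone frequency (Hz) per phoneme.
PHONEME_HZ = {
    "HH": 200.0, "AH": 220.0, "L": 240.0, "OW": 260.0,
    "W": 280.0, "ER": 300.0, "D": 320.0,
}

SAMPLE_RATE = 8000       # samples per second
PHONEME_SAMPLES = 400    # fixed length per phoneme; real systems predict prosody


def high_level_synthesis(text):
    """Text -> phonetic representation (a list of phoneme symbols)."""
    phonemes = []
    for word in text.lower().split():
        word = word.strip(".,!?")
        phonemes.extend(LEXICON.get(word, []))  # unknown words are skipped
    return phonemes


def low_level_synthesis(phonemes):
    """Phonetic representation -> waveform samples in [-1.0, 1.0]."""
    samples = []
    for phoneme in phonemes:
        hz = PHONEME_HZ[phoneme]
        samples.extend(
            math.sin(2 * math.pi * hz * i / SAMPLE_RATE)
            for i in range(PHONEME_SAMPLES)
        )
    return samples


phonemes = high_level_synthesis("Hello, world!")
waveform = low_level_synthesis(phonemes)
print(phonemes)
print(len(waveform))
```

A real system would replace the lookup table with a pronunciation model and the sine tones with a vocoder or neural waveform generator, but the hand-off between the two stages has the same shape.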

Speech-to-speech voice conversion is a newer-generation technology, related to the techniques behind the latest text-to-speech systems. It can generate more dynamic and emotion-laden output, so it better approximates the complexities inherent in human speech.

That, in a nutshell, is Respeecher explained. We combine proprietary deep learning (artificial intelligence) techniques with classical digital signal processing algorithms to produce top-quality synthetic voices. This lets our clients create and scale content for film and TV, video games, podcasts, audiobooks, and more.

Gain more control over your projects

Our mission at Respeecher is to give film & TV creators the opportunity to innovate when it comes to using voices in creative ways. We aim to help filmmakers and TV producers gain more control over the audio part of their projects.

How, you ask? By resurrecting voices from the past, by using the voices of famous actors who can’t be present at the recording site, and by making it possible to keep using kids’ voices even after the young actors have grown up.

We do this by cloning human speech, which lets us swap voices without the robotic quality common to the many video games and car navigation systems that are built on text-to-speech technology.

We are fully aware that voice cloning can be dangerous if used deceptively, for example to fool people into believing someone said something they didn’t. There are strong ethical concerns about how content creators use speech synthesis software. We are committed to a set of ethical principles, such as not using the voices of private persons without permission, in order to protect subjects’ privacy.

We back this principle with a requirement of written consent from voice owners. What we do allow are non-deceptive uses of the voices of historical figures. One example is the use of President Nixon’s voice for the ‘In Event of Moon Disaster’ project we created in collaboration with MIT. Additionally, to make Respeecher-generated content easy to distinguish from other content, we are developing a unique audio watermark.

The benefits of leveraging revolutionary advances in AI for film & TV creators

Respeecher technology goes beyond the limits of text-to-speech solutions and is therefore better suited to complex projects. It not only reduces the emotional flatness of synthetic voices, but also captures subtle nuances such as humming and giggling, and it handles the pronunciation of unusual or foreign words better.

  • Perfect mimetic quality. The synthesized voices we produce are nearly indistinguishable from the original. They are far removed from the flat, robotic sound of text-to-speech output, because emotional modulations make them sound human. Respeecher speech synthesis software allows content creators to use voices with the highest production value.
  • More creative control. Our speech-to-speech technology gives content creators more freedom in the creative process. Consider, for instance, the ability to edit the script during processing without having to re-record the so-called ‘target voice’, or to resurrect the beloved voices of actors who have passed away.
  • Enhanced security. Our experience with leading Hollywood movie studios, game developers, and major multinational corporations has made us fully aware of the need to protect sensitive data, and we treat client material accordingly.

5 steps towards using the Respeecher speech synthesis software for film & TV creators

  • Ask for permission to clone the voice. Before embarking on any project, we require that our clients obtain written consent from the owner of the target voice.
  • Collect target voice data. We ask that clients provide us with high-quality recordings of the target voice, either existing recordings or new ones.
  • Collect source voice data. We also need high-quality recordings of the source voice actor performing the lines from the target voice recording.
  • Respeecher works its magic. We train our advanced AI to come up with pitch-perfect voice-to-voice swapping models.
  • Let us do the voice cloning. Our clients are free to decide: either send us the audio files to convert, or use our practical web app. This is all you need to do in order to get perfectly cloned voices.
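The preparation steps above lend themselves to a simple pre-flight checklist. The sketch below is purely illustrative: `VoiceProject` and its fields are hypothetical names invented for this example, not part of any Respeecher API or web app. It only shows one way a production team might track that consent and both sets of recordings are in place before a project is handed over.

```python
from dataclasses import dataclass, field


@dataclass
class VoiceProject:
    """Hypothetical record of what a client prepares; not a Respeecher API."""
    written_consent: bool = False
    target_recordings: list = field(default_factory=list)  # file names
    source_recordings: list = field(default_factory=list)  # file names

    def missing_steps(self):
        """Return the preparation steps that are still incomplete, in order."""
        missing = []
        if not self.written_consent:
            missing.append("written consent from the target voice owner")
        if not self.target_recordings:
            missing.append("target voice recordings")
        if not self.source_recordings:
            missing.append("source voice recordings")
        return missing

    def ready_for_training(self):
        """True only when consent and both sets of recordings are present."""
        return not self.missing_steps()


project = VoiceProject(target_recordings=["target_take1.wav"])
print(project.missing_steps())

project.written_consent = True
project.source_recordings.append("source_take1.wav")
print(project.ready_for_training())
```

Consent is checked first on purpose, mirroring the order of the steps: nothing else matters until the owner of the target voice has agreed in writing.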

How can film & TV creators leverage speech synthesis software?

  • Film and TV. Cloned voices can be used to dub an actor’s voice in post-production, so your audience can keep enjoying the voices of actors who are no longer among us.
  • Animation. You can focus on the dynamics of drawings, and allow our speech-to-speech technology to handle voice logistics.
  • Game developers. Speech synthesis software lets your characters sound just like the particular figures they are based on. Games can become more realistic and thereby ignite players’ passion and commitment.
  • Podcasts and audiobooks. Voice cloning can automate audiobook narration, or adjust the narrator’s voice so that people can listen to the author themselves reading the book. These are just a few examples of how voice cloning innovates audiobook production.
  • Advertising. Voice cloning lets you tailor your ads to particular audiences by using region-specific pronunciation.
  • Dubbing and localization. Respeecher technology allows you to speed up and ease the dubbing process, and perhaps even make it more entertaining.
  • Future applications. Respeecher’s voice cloning technology could also serve call centers, healthcare providers, and many more fields to come.


We use state-of-the-art technology to revolutionize movies and other creative projects: we develop speech synthesis software for film & TV creators. However, in the wrong hands, voice conversion technology can be used maliciously. We do not allow any deceptive uses of our technology, nor do we use voices without permission when this could impact the privacy of the subject or their ability to make a living. This means we will never use the voice of a private person or an actor without permission.

If you’d like to find out more about our technology or if you have any questions, please reach out through our website. We’d love to hear from you. (Here’s a little secret, too: we’re going to launch the first ever voice marketplace soon. Subscribe to our newsletter to be among the first to test it.)

This article was initially published by Respeecher as a guest post on FilmFestivals.



