Code[ish] Podcast: The Ethical and Technical Side of Deep Fakes featuring Respeecher
We were recently invited to talk about deep fakes on Code[ish], a podcast created by the developer advocate team at Heroku (a Salesforce company) that explores code, technology, tools, tips, and the life of the developer.
In two episodes hosted by Julián Duque, our CEO Alex Serdiuk and CTO Dmytro Bielievtsov discuss ethical deep fakes and the technical aspects of creating them.
Listen to The Ethical Side of Deep Fakes and The Technical Side of Deep Fakes to broaden your knowledge about synthetic media, specifically voice synthesis.
The synthetic media industry is built on AI-generated media, spanning technologies such as text, music, video, image, and voice generation. In a broader sense, tools like CGI and Photoshop also produce synthetic media, since they help creators modify content. At this point, synthesized video is considerably more advanced than synthesized audio.
The ethical side of deep fakes
Alex explains where Respeecher fits into all of this. We aim to revolutionize the way content is produced, by bringing more flexibility in industries like entertainment, video games, advertising, and more through the use of our speech-to-speech voice conversion technology.
Voice conversion use cases
- It’s hard to schedule top actors for voiceover or dubbing work. Voice conversion allows you to scale any voice and gives you the flexibility to record new lines anytime.
- Resurrect voices from the past: Bring back the voice of an actor who has passed away. Maybe you want to add a historical voice to a project.
- Record any voice in any language: Ready to capture an overseas audience? Speech-to-speech language agnostic technology empowers you to record in any language.
- Add dialogue anytime: Decided to add a few lines after filming? Just turn on your microphone and start speaking — without calling an actor back into the studio.
- Replicate children’s voices: Kids say the darndest things — but they’re challenging to work with. Read more on this topic in our article about voice conversion for children’s voices.
The technical side of deep fakes
Dmytro explains how synthetic audio can be produced and why it’s hard to fake. In general terms, a few speech machine learning (ML) models are already available on the internet, but the best way to produce a high-quality voice clone is to use an audiobook as a sample of the original speaker and combine it with these pre-existing models.
The problem is that the outputs these models produce are poor in quality, which is one reason speech-to-speech technology is hard to fake. Human linguistic variations and patterns, along with the emotional component of speech, make voice cloning difficult.
In fact, unlike text-to-speech (TTS) technologies, which can produce dull-sounding content, speech-to-speech (STS) software generates more natural results by preserving the intonation and emotion of the original speaker.
Main concerns about the usage of unethical deep fakes
The main concern about deep fakes is the risk that they will be used unethically, to make it appear that someone said or did something that never happened. But it’s also true that any technology has potentially malicious uses in the wrong hands.
Dmytro offers more detail on what a deep fake is from a technical point of view: a technology that uses deep learning (deep neural networks) to replace part of a video or audio recording with synthesized content.
The term “deep fake” is used now for everything that involves synthesizing human appearance using a neural network.
The biggest potential danger of deep fakes is not their existence, but people’s inability to detect them. -Alex Serdiuk, Respeecher CEO
Respeecher works with leading Hollywood movie studios, game developers, and major multinational corporations and has strict ethical principles.
We do not use voices without permission when this could impact the privacy of the subject or their ability to make a living. In practice, this means we will never use the voice of a private person or an actor without permission.
This policy helps ensure that content produced by Respeecher cannot be misused. Another safeguard implemented toward this goal is a watermark applied to every piece of audio: certain “artifacts” are embedded into the signal that are imperceptible to humans but easily identifiable by a computer program.
Our mission is to make sure that synthetic speech technology is used in beneficial ways, according to ethical principles. Our goals are very clear and we intend to:
- Educate the public about the capabilities of synthetic speech technology;
- Develop automatic detection algorithms that can detect synthetic speech even if it has not been watermarked by us;
- Work with gatekeepers of content such as Facebook and YouTube to limit the harm of voice cloning.
Follow our journey on social media on Facebook, Twitter, LinkedIn, and YouTube, and reach out if you’d like to find out more about our software and how you can use it for your content creation project.
This article was initially published on the Respeecher blog.