Speech-to-Speech Synthesis: Legal Implications of Voice Cloning Technology

Jun 21, 2021


Imagine being able to reproduce the voice of a deceased person, or to reproduce a celebrity's voice almost perfectly, with all its inflections and emotions. The difference cannot be heard and can only be detected with specialized tools. Voice cloning technology makes this possible.

Speech-to-speech synthesis has applications in filmmaking, healthcare, gaming, and more. Respeecher has the technology to replicate voices for media projects, filmmakers, game developers, advertisers, podcasters, and content creators of all kinds. What is new is the quality of the reproduction: it captures the nuances and emotions of the human voice.

Speech-to-speech voice synthesis lets us reproduce even the emotions a human speaker conveys, rather than the flat, robotic, impersonal sound of earlier systems. Put simply, speech-to-speech synthesis is a technology that produces artificial human speech from recorded audio stored in a database.

We do this in compliance with the law and ethical standards, without prejudice to any party. The process is safe and secure because ethics is the priority: the first step is making sure that the client has permission from the owner of the target voice. Our policy also requires keeping sensitive data secure.

This type of voice cloning is revolutionary, and it is changing the film industry, media, gaming, healthcare, and more. With the help of AI, we have moved beyond a machine simply reading text in a dry monotone. With this voice cloning system we can replicate the voices of deceased actors, public figures such as Barack Obama, children, and more. In fact, the technology applies to any target voice one chooses to mimic.

It all started with the first general text-to-speech system for English, built in Japan in 1968. From then on, machines could produce intelligible speech. As the technology developed, we reached the point where we can replicate voices, including the emotions and inflections of the human voice. Speech-to-speech synthesis is the milestone that follows text-to-speech: it adds more humanity and dynamism, which makes it useful for the film industry, gaming, and most kinds of content creation.

Speech-to-speech voice synthesis can give people back the ability to speak naturally after an accident. It can also support automatic translation into other spoken languages. The tone and emotions sound true to life. Suppose you need a voice for an ad, but the person you need is no longer available.

With speech-to-speech voice conversion technology, you can obtain exactly the voice you need, with all the inflections you want for your ad. Of course, the right way to use this technology is ethical voice cloning, which requires a firm moral and legal commitment.


Voice cloning technology matters because it uses artificial intelligence to help companies lower costs and gain access to the voices they need. On the other hand, a person's voice is a personal right, and using it has legal implications. Privacy and personal security are very important, which is why people need to be aware of the risks. Researchers can now create a near-perfect voice clone, but the same technology is also available to people who have no intention of respecting the law or ethics.

You have probably heard of grandparent scams and phishing scams. These are crimes that can harm people who lack the information needed to avoid them. Global giants like Facebook, Microsoft, and Twitter are already taking steps to prevent the spread of deep fakes.

Respeecher uses its voice cloning technology to revolutionize the film industry, media, gaming, and other creative industries while following a set of principles: working only with trustworthy clients, obtaining written consent from voice owners, and developing watermarking technology for additional security.


Voice cloning technology allows people to regain a voice they lost in an accident, or to have a voice even if they were born unable to speak. A speech-to-speech system can also replicate the voice of a deceased person, which can be a real advantage in the creative industries.

Despite this progress, voice cloning technology is still sometimes misused in ways that harm people, including various scams and other malicious uses. Misused AI can produce deep fakes, which threaten privacy, civility, and the democratic process, for instance by replicating voices or images to present false information to an audience for the purpose of disinformation.

So far, the best safeguard against these negative effects is knowing that this technology exists and how it can be used. Detecting manipulated media is a priority, and solutions are being implemented. Facebook, for example, announced measures to remove media that has been manipulated beyond adjustments for clarity or quality. Countering deep fakes requires a community effort, and education is part of it. Keeping this technology under control is easier when people understand it better and recognize its positive impact on the creative industries and the health system.

When it comes to AI, regulation and legislation also depend on trust. Democracy and national security are fundamental values, and governments are taking significant steps to protect them. For example, California passed a law restricting the use of deceptive deep fakes in political campaign promotion and advertising. China banned the distribution of fake news created with AI and virtual reality. The potential for harm is beginning to shrink, especially as people become better informed about AI and voice cloning technology.


Voice cloning technology depends on collecting data, in this case voice recordings, which is why data protection needs to be discussed. Polish law treats the voice as a personal right, and violating it can lead to both civil and criminal liability. Polish copyright law, for example, provides that «anyone who produces devices or components of devices for the purpose of unauthorised removal or circumvention of effective technical devices applied to protect a work or the subject matter of related rights against replaying, copying or reproduction or trades in such devices or components of such devices, or advertises their sale or rental, is liable to a fine, restriction of personal liberty or imprisonment for up to 3 years». The next paragraph adds that «anyone who owns, stores or uses devices or components of devices as referred to in paragraph 1, is liable to a fine, restriction of personal liberty or imprisonment for up to a year».


The rapid evolution of voice cloning technology means that existing laws need to be amended and adjusted. The best protection against fakes is training our own judgment and, most importantly, being aware that this type of AI exists and that any voice on the planet can be reproduced. The precision of speech-to-speech technology is genuinely helpful, but it can also be dangerous if misused. AI itself will very likely be used to build the tools that fight malicious uses of this technology.

Respeecher is committed to avoiding:

  • Blackmail
  • Spam
  • Unlawful competition

All Respeecher applications share one mission: to do only ethical voice cloning, to educate clients, and to work only with those who share the same ethical principles and values. We want people in the film, gaming, audiobook, healthcare, and podcast industries to benefit from this technology without causing harm.

We are aware of the dangers, and awareness is the first step in preventing them. Respeecher does not allow any deceptive use of the technology, and we are constantly looking for ways to improve protection.

This article was originally published by Respeecher as a guest post in Trotons Tech Magazine.



