AI Can Replicate Any Human Voice: What Does That Mean for Podcasts?
Artificial intelligence can now replicate any human voice, and this technological advancement echoes across several industries. Let’s take a look at how.
Podcasting is moving towards a more informal genre of audio narrative. More emphasis is placed on the relationship between host and listener, fostered by a less crafted use of language.
That is to say, the host attempts to speak everybody’s language, making everything easier to understand and react to. For this reason, audio storytelling keeps growing in popularity. The numbers support this claim.
According to Statista, in 2018, there were already 75 million podcast listeners in the United States, and it is predicted that the number of monthly listeners will reach 164 million in the year 2024. The compound annual growth rate between 2019 and 2023 was estimated to be 17%.
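To make the growth figures concrete, here is a minimal sketch of the compound annual growth rate implied by the two listener counts quoted above. The helper name `cagr` is our own, and the result covers 2018 to 2024, so it need not match the 17% estimate, which covers a different window (2019 to 2023) and may be based on a different measure of listeners.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values, as a fraction."""
    return (end_value / start_value) ** (1 / years) - 1


# Listener figures quoted in the text (United States).
listeners_2018 = 75_000_000
listeners_2024 = 164_000_000  # predicted monthly listeners

# Growth rate implied by these two data points over six years.
implied_rate = cagr(listeners_2018, listeners_2024, 2024 - 2018)
print(f"Implied annual growth, 2018 to 2024: {implied_rate:.1%}")
```

The implied rate comes out at roughly 14% per year, which is in the same range as the quoted estimate.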
In 2020, three quarters of Americans said they were aware of podcasting, and over half (55%) had already listened to a podcast. The highest-earning podcast in the world, ‘The Joe Rogan Experience’ by Joe Rogan, made 50 million U.S. dollars in 2019 and reportedly had almost 200 million downloads per month.
The main issue the podcast industry currently faces is maintaining its editorial independence while gathering the resources necessary to support such extraordinary growth. As is often the case nowadays, breakthrough technologies such as artificial intelligence, with its voice cloning capabilities, may provide at least part of the solution.
A pertinent example is the Nixon project, which makes a compelling case that AI can replicate the human voice in a way that’s indistinguishable from the original. A group of researchers, journalists, and artists at MIT teamed up with voice cloning company Respeecher and VDR company Canny AI to create an alternate history of the first mission to land on the moon, in which astronauts Neil Armstrong and Edwin “Buzz” Aldrin fail their mission and are stranded on the moon.
They created a posthumous deepfake by altering an actual video of President Nixon, making it possible to hear him inform the world that the journey to the moon had ended in tragedy.
How Podcasts Can Leverage AI
The main challenge for computerized voices is to replicate the human voice with all its emotional nuances and to avoid that robotic sound. Speech-to-speech voice conversion technologies, underpinned by artificial intelligence techniques, provide the means for doing precisely this.
Virtual assistants like Apple’s Siri, Amazon’s Alexa, Microsoft’s Cortana, or Google Assistant still use text-to-speech (TTS). While TTS is still a very useful piece of technology, it makes it rather complicated to give a voice a different sound. Having Siri sound like an older man, for example, calls for new audio files in addition to the pre-recorded files that contain all the words that might be needed in a conversation. So let’s look at some benefits that speech-to-speech promises to bring about.
How Can You Use AI Voice Cloning for Podcasts?
Artificial intelligence powers voice cloning. It can help you perfectly replicate any voice for any podcast project. A “smart cocktail”, usually made up of two ingredients (classical digital signal processing algorithms and proprietary deep generative modeling techniques), allows content producers to use the most appropriate voice despite logistical issues (e.g., actors who can’t be in the studio when you need them, or actors who have passed away).
Here are some ways you can use voice cloning to improve podcast production and, at the same time, bring it closer to what your audience wants to listen to.
1. Co-opt celebrities into your project
The more famous the actor, author, athlete, etc., the more difficult it is to use their voice in a podcast. But if you leverage artificial intelligence to replicate the human voice, you no longer have to wait until their extremely busy schedule allows them to come to the studio. You can thereby offer your audience recordings of their favorite voices without investing lots of time and money in bringing them to the studio.
2. Bring back voices from the past
Voice cloning can work its magic to help you finish your project as planned even if, unfortunately, one of your actors has passed away. Are you making a historical podcast about the last days of President John F. Kennedy? Voice conversion technology can help you use his exact voice and not just “close approximations”.
3. Use children’s voices without all the fuss that working with kids typically involves
Kids often say amazingly funny things, not necessarily because of the content, but simply because of the way they say them (tone, intonation, accent, etc.). At the same time, they can be very challenging to work with. Voice synthesis eases the process by allowing you to have professional actors say the things that children say, precisely the way children say them.
4. Get going fast and keep it up until you finish your podcast project
AI can replicate the human voice in a flash. All you need to provide is a high-quality recording of the target voice, and in a short time, you’ll be good to go.
Podcast advertising revenue reached $220 million in 2017, and it is doubling each year. Users’ rising engagement is the main attraction for potential advertisers, together with the fact that podcast ads are actually heard, as evidenced by completion rates of around 90 percent.
Moreover, advertisers are willing to pay CPMs of as much as $30 for some podcast slots (CPM stands for cost per mille, or cost per thousand impressions of an ad). The number makes more sense if you consider that Facebook’s average CPM is about $6.
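As a rough illustration of what those CPM figures mean in practice, the sketch below prices the same campaign at both rates. The function name `ad_cost` and the campaign size of 100,000 impressions are hypothetical and chosen only for the example. The two CPM values are the ones quoted above.

```python
def ad_cost(impressions: int, cpm: float) -> float:
    """Cost of a campaign: CPM is the price per 1,000 impressions."""
    return impressions / 1000 * cpm


PODCAST_CPM = 30.0   # upper figure quoted in the text, in U.S. dollars
FACEBOOK_CPM = 6.0   # average figure quoted in the text, in U.S. dollars

# Hypothetical campaign size, used only for illustration.
impressions = 100_000

podcast_cost = ad_cost(impressions, PODCAST_CPM)
facebook_cost = ad_cost(impressions, FACEBOOK_CPM)
ratio = podcast_cost / facebook_cost

print(f"Podcast slot:  ${podcast_cost:,.2f}")
print(f"Facebook ads:  ${facebook_cost:,.2f}")
print(f"Advertisers pay {ratio:.0f}x more per impression on the podcast")
```

At these rates, the same audience size costs five times as much on a podcast, which suggests how highly advertisers value listeners who actually hear the ad.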
There seems to be a close analogy between cable TV slowly but surely taking the place of network TV and the relationship between podcasting and radio. Even if we restricted the analogy to radio’s ad budgets, it would mean an additional $20 billion for the podcast industry. And, as we were saying at the beginning, the rising number of listeners justifies expectations of a steady, continuous increase in profits.
Numbers such as these make it clear that podcasts are here to stay, and that there is a strong need to equip audio content producers with more efficient methods.
The uses listed above enhance productivity and profitability, so leveraging AI voice cloning for podcasts can help the sector develop and, in turn, deal better with the issue of editorial independence.
This article was initially published by Respeecher as a guest post on DZone.