The AI arms race continues to accelerate, with new frontiers in voice cloning emerging almost daily. The latest development comes from San Francisco-based startup ElevenLabs, which just announced that its new AI model can mimic voices speaking fluently in 30 languages, a dramatic expansion from the eight previously supported.
ElevenLabs cited Lukeman Literary, a literary agency and independent publisher, as an example, explaining that the agency produces many audiobooks each year in multiple languages.
“It used to take Lukeman’s team weeks to produce a single audiobook because it required them to find the right voiceover artist, book a recording studio, and record and manage the post-production,” ElevenLabs said in an official blog post. “Now the entire process takes a few hours.”
According to ElevenLabs, the new Multilingual v2 model delivers “emotionally rich” audio that captures the nuanced inflections of natural speech. Users type the text they want spoken in the target language, and the AI generates a seamless voiceover.
The company provides two main tools: a text-to-speech generator and a “VoiceLab” for cloning specific voices.
Users upload speech samples to create a custom voice clone, which the AI analyzes to build a synthetic version. This cloned voice can then be manipulated to say anything imaginable. ElevenLabs claims the latest update means these AI doppelgangers can now speak fluently in tongues like Swedish, Arabic, and Malay.
The expanded linguistic capabilities also coincide with ElevenLabs moving its voice cloning tech out of beta testing. The company aims to market the tool for practical applications like narrating audiobooks, as in the case of Lukeman Literary.
Addressing concerns
The technology’s potential for misuse clouds these business ambitions. Deepfake audio leaves people vulnerable to fraud and misinformation campaigns. ElevenLabs itself endured backlash earlier this year when its platform was exploited to impersonate and harass public figures.
The company says it has since implemented more stringent safeguards, but ethical concerns persist. As Decrypt recently reported, a “scammer could use AI to clone the voice of your loved one,” and all it would take to achieve believable results is a couple of minutes of audio.
Major tech firms like Meta face similar criticism for developing powerful generative AI without full transparency. Meta recently unveiled an AI speech synthesis tool called Voicebox, which it acknowledged could easily facilitate deepfakes. Unlike ElevenLabs, Meta refrained from any public release given the “risks of misuse.”
Despite the fears, rapid progress in AI voice cloning seems unstoppable. As ElevenLabs CEO and co-founder Mati Staniszewski stated, “Eventually we hope to cover even more languages and voices with help of AI and eliminate the linguistic barriers to content.”
Ensuring ethical implementation remains a steep challenge, as the line between global misinformation and innovative ways to communicate is very thin. Treading carefully is key, lest our global village of voices become a cacophonous Tower of Babel.