ChatGPT maker's new AI is so good that you can't be trusted with it (yet)
Text-to-speech 2.0
ChatGPT maker OpenAI may have wowed the world with its text-to-video model, Sora, last month. But it wasn't the only tool announced by the Sam Altman-led company, with a new text-to-speech model also revealed at the tail end of March.
The new model, called Voice Engine, was recently shared in a blog post and is capable of generating natural-sounding speech that closely clones anybody's voice from nothing more than a 15-second audio sample.
Far from the janky and distorted results most text-to-speech tools offer, Voice Engine's results are mind-bogglingly impressive, with several examples showcased within the blog post that have to be heard to be believed.
Voice Engine: What can it do?
OpenAI has been testing Voice Engine since late last year, and a small group of trusted partners has already found several potential use cases for the text-to-speech model.
The company was able to share a number of these early use examples, including:
- Reading assistance: Voice Engine can take a short 15-second clip of an enthusiastic and energized reader and apply it to practically any batch of text. Textbooks and education materials are a particularly good fit, whether to help those who struggle with reading or to rapidly generate voice-over content for learning assets.
- Translation: The Voice Engine model can also provide impressively accurate mimicry of voices, even when speaking in foreign languages. This is something that could have a massive impact on media, with dubbed or translated content no longer requiring a second track or voice-over. Using Voice Engine, the original speaker's voice (along with their natural accent) can be fluently translated into any language of choice.
- Support for non-verbal people: With its powerful, natural-sounding text-to-speech capabilities, Voice Engine is able to give a voice to those who may be non-verbal in a less robotic and othering way than synthetic voices of the past. It opens up a fantastic channel for those impacted to interact with others in a manner that feels more comfortable and gives them a unique identity.
- Voice restoration: People who suffer from degenerative speech conditions can often feel like they've had their voice stolen from them. However, using Voice Engine and as little as a 15-second sample of earlier audio of their voice, those affected can restore their voice in recordings to one more familiar to others and themselves, allowing them the chance to reclaim a part of their identity they may have felt they'd suddenly lost.
That's great, but you can't have it (and you know why)
Sadly, while the tech on show is impressive, and could have many positive applications, we're all too aware of how a tool like this could be misappropriated and abused if released to the wider public.
Meta ran into a similar issue last year when it announced its own AI text-to-speech model, Voicebox, noting that the potential for misuse and unintended harm was so high that it wouldn't be publicly sharing the final model for use.
In an age of AI fakery, being able to make an exact audio clone of anyone from a 15-second sample could have catastrophic consequences for the person in question if used with nefarious intentions. And the potential for it to be used as a political weapon against figureheads and politicians could cause major disruptions if the audio is perceived to be true.
On the topic, OpenAI stated that it "hope[s] to start a dialogue on the responsible deployment of synthetic voices, and how society can adapt to these new capabilities," and that it has "implemented a set of safety measures, including watermarking to trace the origin of any audio generated by Voice Engine, as well as proactive monitoring of how it's being used."
However, that still may not be enough. Meta's Voicebox also featured what the company called a "highly effective classifier" that was able to distinguish between authentic and synthetic speech, but Meta still deemed the software too volatile for wider release.
The same may be said of OpenAI's Voice Engine. No matter what tools you provide to authenticate a voice sample, the fact that a convincing clip exists in the first place could be enough to cause people to believe it and react without further investigation. While there is incredible potential for Voicebox and Voice Engine to do considerable good, these kinds of tools may simply be too much for many to handle. At least for now.
Rael Hornby, potentially influenced by far too many LucasArts titles at an early age, once thought he’d grow up to be a mighty pirate. However, after several interventions with close friends and family members, you’re now much more likely to see his name attached to the bylines of tech articles. While not maintaining a double life as an aspiring writer by day and indie game dev by night, you’ll find him sat in a corner somewhere muttering to himself about microtransactions or hunting down promising indie games on Twitter.