OpenAI's new tool can create realistic voices

OpenAI is testing our patience with another tease. We are moving beyond text, images, and even videos to talk about audio. OpenAI is previewing a model that creates custom voices in a snap. It’s just a preview and we won’t get to use it anytime soon.

What's going on here?

OpenAI has shared a preview of its custom voice creation tool, Voice Engine.

What does this mean?

This new model, named Voice Engine, can recreate anyone’s voice from just a 15-second audio sample. And the generated speech sounds like the real deal. Wild, right? OpenAI is testing multiple use cases of Voice Engine with select partners, including:

  • Age of Learning to provide reading assistance and personalised lessons to children.

  • HeyGen to translate videos/podcasts into multiple languages with the original speaker's accent.

  • Dimagi to improve the training of healthcare workers in their native languages.

  • Livox to create unique and customised synthetic voices for non-verbal individuals.

  • Norman Prince Neurosciences Institute to help patients recover their voices.

We’re kinda used to OpenAI dropping bangers now but here’s the truly insane detail: OpenAI developed this model in late 2022. They’ve been sitting on it for a year and a half, testing with these partners.

They have used Voice Engine to build the preset voices in their TTS API and ChatGPT’s Read Aloud feature. They've implemented some safeguards like watermarking the audio to track its origins. But voice impersonation is a serious risk, so they're being cautious about a wider release where users could create new voices on their own.

Why should I care?

As impressive as cloning someone's voice from a short clip is, it opens a whole can of worms we need to be ready for. Just think—voice authentication for banking could be easily bypassed if someone has a realistic synthetic version of your voice. AI voice models will only get more convincing, so we also need to train ourselves to not believe everything we hear.

OpenAI isn’t alone in creating realistic voices though. Play.ht and ElevenLabs are two leading startups that have voice-cloning tech available for use. Their results are already good (similar to OpenAI’s samples) but often need more data and tuning to get that match. OpenAI claims to do the same with just 15 seconds of sample audio. Do you take OpenAI at its word?
