Lyrebird Can Recreate Any Voice Using Just One Minute of Sample Audio

Megalith

I’m a cop, you idiot! If that Arnold soundboard is getting a little old, Lyrebird has introduced a new algorithm that will make prank calling fun again. All you need is 60 seconds of someone’s voice, and like magic, you can create faux voice clips of almost anybody. It is supposedly so advanced that it can even infuse speech with emotion, letting customers make voices sound angry, sympathetic, or stressed out. Thanks to Kyle for this one.

…a Canadian AI startup named Lyrebird unveiled its first product: a set of algorithms the company claims can clone anyone’s voice by listening to just a single minute of sample audio. A few years ago this would have been impossible, but the analytic prowess of machine learning has proven to be a perfect fit for the idiosyncrasies of human speech. Using artificial intelligence, companies like Google have been able to create incredibly life-like synthesized voices, while Adobe has unveiled its own prototype software called Project VoCo that can edit human speech like Photoshop tweaks digital images.
 
I'm glad it's not TOO realistic yet. Still has a bit of a Microsoft Sam style pause between words during speech.
 
Yeah it's not perfect. But I didn't need any hints to figure out who the voices were supposed to be. And yes, the synthesis had the fundamentals of their speech patterns.

Creepy. Again. Just like the 3D talking heads the other company was showing off a year or so ago that could recreate Putin's head well enough it would almost pass in a news studio interview.

So what happens when we have both a talking head in a studio and a recreated voice, and we can't trust that either of them is the real person? With only minimal human tweaking, both these techs would seem to be able to produce a prerecorded segment good enough to fool people.

Just like with quantum computing and the race to create encryption that can stand up to it, we are going to have to create new standards for watermarks or recording methods for interviews that can't be duplicated virtually. Something to provide some kind of comfort that what we both see and hear actually came from that person. Unless you are willing to live in utter fantasy-land denial, it is going to be insanity inducing to have no idea if anything you see and hear every day is real or not.

I already get very angry at the telephone solicitation I get at work every day. They use AI voices with very accurate audio to make the start of the conversation seem completely natural, then a couple of sentences in you realize it's not a human on the other end. The level of affront and insult I feel at being intentionally misled about whether I'm talking to a human or a machine is very large. I love technology, but my anger at it being used that way knows no bounds.
 
I'll have to try this out. I always get anxious about calling out of work so now I can use this to do it for me.

Emails all the way :) you can even prepare one with nice words and save in draft for the good day lol
 
My mom listens to NOAA weather radio, and when they switched to a computer voice for the current weather about 15 years ago, my mom thought it was just someone with a very thick Norwegian accent. (not an uncommon accent in Minnesota, even today)

I still give her a hard time about "listening to the Norwegian" whenever she has it playing.
 

At least it's not a Swedish accent

 
Wow.. President Obama has commented on this!!

That's some sick Die Hard 4 audio editing going on right there, damn. :p



As for the Lyrebird technology, I can think of some uses for such a thing IF (that's a big IF) they can work out the kinks and get it a bit more realistic, perhaps with longer samples of the speaker's voice, I would guess. While it is so obviously artificial, it's not going to defeat any voiceprint tech I've had experience with over the years; that stuff is fairly advanced now, so this fakery won't pass muster for such hardware.

But the idea is pretty cool overall so yeah, I could definitely make use of it myself.
 
This tech has been around for several years, mostly private though. Lucent / Bell Labs developed it for the government. What purpose? I can only imagine what the government would do with it.
 