Researchers Develop Super Accurate Lipreading Software

HardOCP News

Researchers from the University of Oxford Computer Science Department, with funding from Google DeepMind and NVIDIA, have developed a lipreading model that operates at the sentence level. According to the description, the software achieves an amazing 93.4% accuracy.
 
I am deeply skeptical that such accuracy levels would hold up in non-lab situations. I have read that even the best human lip readers achieve at most 50% accuracy, though I can't remember whether that was live or from studying video afterwards. There's just too much going on that's hidden deeper in the mouth and throat.
 
I don't know of any early technology that has held up very well in a non-lab situation. That's why you do real-world testing before releasing it into the world.
 
Here is the thing: we have a hundred years' worth of video now to show it, and we know what SHOULD be said. Then we can also show it millions of hours of YouTube crap. A human can only read so many lips; a computer can be studying and improving 1% for every 1 million hours of lip reading watched. But if it can "watch" 10,000 videos at once, what's a million hours?
 
This could be great for real-time lip movement in single-player video games, and in multiplayer. Also, for different languages, instead of spending money on animating lips, the characters could react in real time to scripts spoken by actors or gamers.
 
The eyes, the whole face, and maybe the neck and body language will be tracked. All those details and the conversation will be processed into a coherent dialog. It doesn't have to be just the lips, and it doesn't have to be in real time. Even if it's only 50%, I can reconstruct most of the rest.
 