While subtle, one of the biggest advances portrayed in movies like Her or Ex Machina was that the AI really sounded like a fellow human. In the realm of real-life tech, Google’s AI focus in recent years has similarly been to make computers sound more like us, and it’s getting much better at it.
The latest development to come out of Google’s DeepMind AI lab is called WaveNet: it studies recordings of real human speech and models its own waveforms on the way they sound. It’s not perfect yet, but we’re definitely getting closer to voices that sound like they come from a person’s mouth, rather than from a computer’s speaker.
While it still sounds a little strange, the new AI speech certainly flows better than the kinds of responses you’ll get from Siri or Cortana, which chop up recorded human speech and paste it back together in a way that gets individual pronunciations right but leaves the flow of the speech completely off. (That technique is known as concatenative text-to-speech, just so you know.)
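To make that chop-and-paste idea concrete, here’s a minimal sketch in Python. Everything in it is illustrative rather than how Siri or Cortana actually work: the unit names, the 100 ms fragments and the 10 ms crossfade are all stand-ins, but the stitch-and-smooth structure is the essence of concatenative synthesis.

```python
import numpy as np

SAMPLE_RATE = 16_000  # 16 kHz, a common rate for speech audio

# Hypothetical bank of pre-recorded speech units (e.g. diphones).
# A real system holds thousands of recorded fragments; random noise
# stands in for them here so the sketch runs on its own.
rng = np.random.default_rng(0)
UNIT_BANK = {
    name: rng.uniform(-1.0, 1.0, SAMPLE_RATE // 10)  # 100 ms each
    for name in ("h-e", "e-l", "l-o")
}

def crossfade(a, b, overlap=160):
    """Blend the tail of `a` into the head of `b` over `overlap`
    samples (160 samples = 10 ms at 16 kHz) to hide the seam."""
    fade = np.linspace(0.0, 1.0, overlap)
    mixed = a[-overlap:] * (1.0 - fade) + b[:overlap] * fade
    return np.concatenate([a[:-overlap], mixed, b[overlap:]])

def synthesize(unit_names):
    """Chop-and-paste synthesis: look up each recorded unit and
    crossfade them together. Each pronunciation can be correct in
    isolation, but every unit keeps the pitch and pace it was
    recorded with, which is why the overall flow sounds off."""
    out = UNIT_BANK[unit_names[0]]
    for name in unit_names[1:]:
        out = crossfade(out, UNIT_BANK[name])
    return out

waveform = synthesize(["h-e", "e-l", "l-o"])
```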
WaveNet flows much better because it uses something called parametric text-to-speech, which generates the audio from scratch instead. Where it differs from traditional uses of that technique, though, is that Google’s AI models its audio directly on the waveforms of real human voices.
That’s difficult, because raw audio typically contains around 16,000 samples for every second of speech, and generating each one takes a lot of processing power. To cut back on that, WaveNet uses a prediction engine to estimate which sample should come next in natural speech, using everything that has come before as a guide.
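Here’s a rough sketch of that generation loop, again in Python. The predict_next_sample function is a placeholder for WaveNet’s actual neural network (DeepMind’s papers describe stacks of dilated convolutions), so the uniform distribution it returns is purely an assumption for illustration; what matters is the shape of the loop, where every single sample is drawn from a distribution conditioned on all the samples that came before it.

```python
import numpy as np

SAMPLE_RATE = 16_000  # samples per second of audio

def predict_next_sample(history):
    """Stand-in for the trained network. Given every sample generated
    so far, return a probability distribution over the 256 possible
    values of the next sample (WaveNet quantizes audio to 8 bits).
    A uniform distribution is used here as a placeholder."""
    return np.full(256, 1.0 / 256)

def generate(seconds, rng=np.random.default_rng()):
    """Autoregressive generation: draw one sample at a time, feeding
    everything generated so far back in as context for the next
    prediction."""
    history = np.zeros(0, dtype=np.int64)
    for _ in range(int(seconds * SAMPLE_RATE)):
        probs = predict_next_sample(history)
        nxt = rng.choice(256, p=probs)
        history = np.append(history, nxt)
    return history

audio = generate(0.01)  # even 10 ms means 160 sequential predictions
```

The sequential dependence is exactly why this approach is so demanding: generating one second of audio means 16,000 network evaluations, one after another.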
The results are impressive. To give you a comparison, here’s a classic concatenative text-to-speech system:
That sounds like the sort of digital assistant voice we’ve become used to in recent years. But here’s the new WaveNet system that Google has developed:
The cadence of the speech is much more realistic, and though there’s a general fuzziness to the audio, it’s not hard to imagine that being cleaned up as development continues.
The process can even be used to simulate different kinds of voices, for example, male and female:
The only problem now is that even though its predictions reduce the amount of processing this technique requires, it still demands more than standard smartphone hardware can handle in real time. At least for now.
For more information on these techniques, Google’s blog post offers a lot more detail and samples, and DeepMind has even posted a couple of papers on it here.