
Google is learning to differentiate between your voice and your friend’s

[Video: Looking to Listen: Stand-up]

We may be able to pick out our best friend’s or our mother’s voice from a crowd, but can the same be said for our smart speakers? For the time being, the answer may be “no.” Smart assistants aren’t always right about who’s speaking, but Google is looking to change that with a pretty elegant solution.


In new research detailed in a paper titled “Looking to Listen at the Cocktail Party,” Google researchers explain how a deep learning system can isolate individual voices simply by looking at people’s faces as they speak.

“People are remarkably good at focusing their attention on a particular person in a noisy environment, mentally ‘muting’ all other voices and sounds,” Inbar Mosseri and Oran Lang, software engineers at Google Research, noted in a blog post. And while this ability is innate to human beings, “automatic speech separation — separating an audio signal into its individual speech sources — while a well-studied problem, remains a significant challenge for computers.”

Mosseri and Lang, however, have created a deep learning audio-visual model capable of isolating speech signals from a variety of other auditory inputs, like additional voices and background noise. “We believe this capability can have a wide range of applications, from speech enhancement and recognition in videos, through video conferencing, to improved hearing aids, especially in situations where there are multiple people speaking,” the duo said.

So how did they do it? The first step was training the system to identify individual voices (paired with their faces) speaking uninterrupted in an aurally clean environment. The researchers presented the system with about 2,000 hours of video, all of which featured a single person in the camera frame with no background interference. Once this was complete, they began adding virtual noise, such as other voices, to teach the A.I. to differentiate among audio tracks and work out which track belongs to which speaker.
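To make that training setup concrete, here is a minimal Python sketch of how synthetic “cocktail party” mixtures could be built from clean single-speaker recordings. The function name, the SNR-based mixing scheme, and the parameters are illustrative assumptions, not details taken from Google’s paper.

```python
import numpy as np

def make_synthetic_mixture(target_speech, interfering_speech, noise=None, snr_db=0.0):
    """Mix a clean single-speaker track with interfering speech (and optional
    background noise) to create a synthetic 'cocktail party' training example.

    Inputs are 1-D float arrays at the same sample rate. The names and mixing
    scheme are illustrative, not Google's actual pipeline."""
    # Trim everything to the same length so the tracks can be summed.
    length = min(len(target_speech), len(interfering_speech))
    target = target_speech[:length]
    interference = interfering_speech[:length]

    # Scale the interference so the mixture hits the requested signal-to-noise ratio.
    target_power = np.mean(target ** 2)
    interference_power = np.mean(interference ** 2) + 1e-12
    scale = np.sqrt(target_power / (interference_power * 10 ** (snr_db / 10)))
    mixture = target + scale * interference

    if noise is not None:
        mixture = mixture + noise[:length]

    # The untouched clean track is kept as the label the model must learn to recover.
    return mixture, target
```

In a setup like this, the mixed track becomes the model’s input while the clean track serves as the target it learns to separate back out.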

Ultimately, the researchers were able to train the system to “split the synthetic cocktail mixture into separate audio streams for each speaker in the video.” As you can see in the video, the A.I. can identify the voices of two comedians even as they speak over one another, simply by looking at their faces.

“Our method works on ordinary videos with a single audio track, and all that is required from the user is to select the face of the person in the video they want to hear, or to have such a person be selected algorithmically based on context,” Mosseri and Lang wrote.
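As a rough illustration of what “selecting a face to hear” could look like in code, the sketch below assumes a common mask-based separation setup: a trained audio-visual network, stood in for here by a placeholder predict_mask callable, scores each time-frequency bin of the mixture for the chosen speaker, and that mask is applied to the spectrogram before converting back to audio. The function names, STFT settings, and masking approach are assumptions for illustration, not the paper’s exact method.

```python
import numpy as np
from scipy.signal import stft, istft

def isolate_selected_speaker(mixture, predict_mask, face_embeddings, fs=16000):
    """Separate one speaker from a single-track mixture.

    `predict_mask` stands in for the trained audio-visual network: it takes the
    magnitude spectrogram plus the chosen speaker's face embeddings and returns
    a mask of the same shape with values in [0, 1]. This is an illustrative
    sketch of mask-based separation, not the paper's exact architecture."""
    # Move the mixture into the time-frequency domain.
    _, _, spec = stft(mixture, fs=fs, nperseg=512)

    # The network decides, per bin, how strongly the selected face's voice is present.
    mask = predict_mask(np.abs(spec), face_embeddings)

    # Apply the mask and reconstruct a waveform containing mostly that speaker.
    _, separated = istft(mask * spec, fs=fs, nperseg=512)
    return separated
```

The key design point the researchers describe is that the visual stream (the selected face) conditions the separation, which is what lets the same model pull out whichever speaker the user picks.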

We’ll just have to see how this new methodology is ultimately implemented in Google products.

Lulu Chang