
Google’s new AI generates audio soundtracks from pixels

An AI-generated wolf howling
Google DeepMind

DeepMind showed off the latest results from its generative AI video-to-audio (V2A) research on Tuesday. The system combines what it sees on-screen with the user’s written prompt to create synced audio soundscapes for a given video clip.

The V2A AI can be paired with video-generation models like Veo, DeepMind’s generative audio team wrote in a blog post, and can create soundtracks, sound effects, and even dialogue for the on-screen action. What’s more, DeepMind claims that its new system can generate “an unlimited number of soundtracks for any video input” by steering the model with positive and negative prompts that encourage or discourage the use of a particular sound, respectively.

The system works by first encoding and compressing the video input. A diffusion model then starts from random noise and iteratively refines it into audio, guided by the visual input and the user’s optional text prompt. Finally, the audio output is decoded and exported as a waveform that can be recombined with the video input.
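The encode, refine, and decode flow described above can be illustrated with a toy sketch. Nothing here is DeepMind’s code or API: every function name, the `prompt_gain` parameter, and the fixed `strength` value are hypothetical, and a real diffusion model would use a trained neural network to predict and remove noise rather than the simple fixed-step pull toward a target shown here.

```python
import random


def encode_video(frames):
    """Toy stand-in for a video encoder: compress each frame
    (a list of pixel values) to a single number, its mean."""
    return [sum(frame) / len(frame) for frame in frames]


def refine_step(audio, target, strength):
    """One refinement step: move every noisy sample a fraction of the
    way toward the conditioning target. A real diffusion model would
    use a learned denoiser here instead of this fixed rule."""
    return [a + strength * (t - a) for a, t in zip(audio, target)]


def generate_audio(frames, prompt_gain=1.0, steps=20, seed=0):
    """Start from random noise and iteratively refine it, conditioned on
    the encoded video and a (hypothetical) numeric prompt setting."""
    rng = random.Random(seed)
    features = encode_video(frames)
    # Hypothetical conditioning: the prompt simply scales the visual features.
    target = [prompt_gain * f for f in features]
    # Begin with pure Gaussian noise, one sample per frame.
    audio = [rng.gauss(0.0, 1.0) for _ in features]
    for _ in range(steps):
        audio = refine_step(audio, target, strength=0.3)
    return audio


if __name__ == "__main__":
    clip = [[0.0, 0.2], [0.5, 0.7], [1.0, 1.0]]  # three tiny "frames"
    print(generate_audio(clip))
```

Because each step closes 30 percent of the remaining gap, the noise shrinks geometrically and the output settles near the values derived from the video, which is the general idea behind refining audio out of noise over many iterations.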

The best part is that the user doesn’t have to go in and manually (read: tediously) sync the audio and video tracks, as the V2A system does it automatically. “By training on video, audio and the additional annotations, our technology learns to associate specific audio events with various visual scenes, while responding to the information provided in the annotations or transcripts,” the DeepMind team wrote.

The system is not yet perfected, however. For one, the output audio quality depends on the fidelity of the video input, and the system gets tripped up when video artifacts or other distortions are present. According to the DeepMind team, syncing generated dialogue with characters’ lip movements also remains an ongoing challenge.

V2A Claymation family

“V2A attempts to generate speech from the input transcripts and synchronize it with characters’ lip movements,” the team explained. “But the paired video-generation model may not be conditioned on transcripts. This creates a mismatch, often resulting in uncanny lip-syncing, as the video model doesn’t generate mouth movements that match the transcript.”

The system still needs to undergo “rigorous safety assessments and testing” before the team will consider releasing it to the public. Every video and soundtrack generated by this system will be affixed with DeepMind’s SynthID watermarks. V2A is far from the only audio-generating AI around: Stability AI dropped a similar product just last week, while ElevenLabs released its sound effects tool last month.


Andrew Tarantola
Andrew has spent more than a decade reporting on emerging technologies ranging from robotics and machine learning to space…