
Nvidia’s new AI model makes music from text and audio prompts


Nvidia has released a new generative audio AI model capable of creating a wide range of sounds, music, and even voices from simple text and audio prompts.

Dubbed Fugatto (short for Foundational Generative Audio Transformer Opus 1), the model can, for example, create jingles and song snippets based solely on text prompts, add or remove instruments and vocals from existing tracks, modify both the accent and emotion of a voice, and “even let people produce sounds never heard before,” per Monday’s announcement post.


“We wanted to create a model that understands and generates sound like humans do,” said Rafael Valle, a manager of applied audio research at Nvidia. “Fugatto is our first step toward a future where unsupervised multitask learning in audio synthesis and transformation emerges from data and model scale.”

The company notes that music producers could use the AI model to rapidly prototype and vet song ideas in various musical styles and arrangements, or add effects and additional layers to existing tracks. The model could also be leveraged to adapt and localize the music and voiceovers of an existing ad campaign, or adjust a video game’s music on the fly as the player moves through a level.

The model is even capable of generating previously unheard sounds like barking trumpets or meowing saxophones. In doing so, it uses a technique called ComposableART to combine the instructions it learned during training.

“I wanted to let users combine attributes in a subjective or artistic way, selecting how much emphasis they put on each one,” Nvidia AI researcher Rohan Badlani wrote in the announcement post. “In my tests, the results were often surprising and made me feel a little bit like an artist, even though I’m a computer scientist.”

The Fugatto model itself uses 2.5 billion parameters and was trained on 32 H100 GPUs. Audio AI models like this are becoming increasingly common. Stability AI unveiled a similar system in April that can generate tracks up to three minutes in length, while Google’s V2A model can generate “an unlimited number of soundtracks for any video input.”

YouTube recently released an AI music remixer that generates a 30-second sample based on the input song and the user’s text prompts. Even OpenAI is experimenting in this space, having released an AI tool in April that needs just 15 seconds of sample audio to fully clone a user’s voice and vocal patterns.

Andrew Tarantola