With chatbots and text-to-image generators taking the internet by storm, the next frontier of AI might be text-to-video generators.
Nvidia recently published a research paper called “High-Resolution Video Synthesis with Latent Diffusion Models,” based on experiments at its Toronto AI Lab, that details how it builds on Stable Diffusion to create a tool that generates moving images from text prompts.
The tech company showcased demos of the Latent Diffusion Models (LDMs), which generate video clips from text without requiring large amounts of computing power, TechRadar noted.
The tool can generate GIF-style clips that are approximately 4.7 seconds long at a resolution of 1,280 x 2,048. It can also create longer videos at a lower resolution of 512 x 1,024, according to the research paper.
Having viewed a demo of the technology, TechRadar said the tool is likely best suited to text-to-GIF generation at this point. The publication noted it could easily handle simple prompts such as “a stormtrooper vacuuming on the beach” or “teddy bear is playing the electric guitar, high definition, 4K.” Even so, the results still showed random artifacts and smudging, as is common with other widely used AI tools such as Midjourney.
The publication believes longer videos still need a little more development before they hit prime time, but expects Nvidia to work quickly to get the technology ready. Such videos might work well for stock libraries and similar purposes.
Other companies are also experimenting with AI text-to-video generators. Google demoed its Phenaki generator, which allows longer prompts that produce 20-second clips. Runway, a startup, announced its second-generation video model last month; that model is also based on Stable Diffusion. Its demo of the prompt “the late afternoon sun peeking through the window of a New York City loft” shows how you can add slight moving effects to still images.
Users also stand to benefit from the addition of AI in other programs, such as Adobe Firefly and Adobe Premiere Rush, according to TechRadar.
Some other companies, such as Narakeet and Lumen5, market themselves as having text-to-video generators. However, many of these tools work more like PowerPoint presentations, assembling text, audio, images, and perhaps some pre-produced video clips in response to prompts, as opposed to generating a unique work.