Microsoft’s new bot can draw a photo-realistic bird based on text descriptions

Microsoft’s research labs have created a new artificial intelligence, or bot, that draws images from simple descriptions. The company says the bot can render anything in pixel form from the caption-like text descriptions you provide. Text-to-image creation isn’t new, but Microsoft’s “drawing bot” uses captions as image descriptors to produce an image quality that the company claims is three times better than other state-of-the-art technologies.

“The technology, which the researchers simply call the drawing bot, can generate images of everything from ordinary pastoral scenes, such as grazing livestock, to the absurd, such as a floating double-decker bus,” Microsoft states. “Each image contains details that are absent from the text descriptions, indicating that this artificial intelligence contains an artificial imagination.” 

Microsoft’s drawing bot merges two components of artificial intelligence: natural-language processing and computer vision. The research project started with a bot that could generate text captions from photos. The researchers then extended it to answer human-generated questions about images, such as identifying the location or the object in focus.

But actually drawing an image is a huge step. While the bot can generate components based on text descriptors, it must “imagine” all the other missing pieces of the picture. Thus, if you tell the bot to draw a yellow bird with black wings, it has only four descriptors to work with and must pull the remaining parts from data it acquired from previous drawings, photos, and more. In other words, it relies on knowledge obtained through machine learning.

Microsoft’s bot relies on a generative adversarial network (GAN). Just imagine two teams of computers: one team, the generator, must render an image to fool the other team, the discriminator, into believing it’s an actual photograph. Both teams go back and forth, with the first presenting its image as real and the second saying “nuh-uh,” rejecting the claim. The goal is to render an image that finally fools the second team.

In this case, the first team renders an image derived from text-based descriptions, and the second team rejects its “authenticity” as an actual photograph until the first team renders a convincing image. Microsoft first trained its GAN on paired images and captions so that it could learn to draw a bird from the single word “bird.”
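
The back-and-forth between the two teams can be illustrated with a toy example. The sketch below is not Microsoft's model: it assumes one-dimensional "images" (plain numbers), a generator that only learns a shift `theta`, and a logistic-regression discriminator. All names and values (`train`, `theta`, the real mean of 4.0) are hypothetical and chosen only for illustration.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def discriminator(x, w, b):
    """Second team: estimated probability that sample x is real."""
    return sigmoid(w * x + b)


def discriminator_step(real, fake, w, b, lr):
    """One gradient-ascent step on log D(real) + log(1 - D(fake))."""
    gw = gb = 0.0
    for x in real:
        p = discriminator(x, w, b)
        gw += (1.0 - p) * x   # gradient of log D(x) with respect to w
        gb += (1.0 - p)
    for x in fake:
        p = discriminator(x, w, b)
        gw -= p * x           # gradient of log(1 - D(x)) with respect to w
        gb -= p
    n = len(real) + len(fake)
    return w + lr * gw / n, b + lr * gb / n


def generator_step(noise, theta, w, b, lr):
    """First team: nudge theta so fakes (z + theta) look more real to D."""
    g = 0.0
    for z in noise:
        p = discriminator(z + theta, w, b)
        g += (1.0 - p) * w    # gradient of log D(z + theta) with respect to theta
    return theta + lr * g / len(noise)


def train(steps=200, batch=32, lr=0.05, seed=0):
    """Alternate discriminator and generator updates (toy assumption: real data ~ N(4, 1))."""
    rng = random.Random(seed)
    theta, w, b = 0.0, 0.0, 0.0
    for _ in range(steps):
        real = [rng.gauss(4.0, 1.0) for _ in range(batch)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        fake = [z + theta for z in noise]
        w, b = discriminator_step(real, fake, w, b, lr)
        theta = generator_step(noise, theta, w, b, lr)
    return theta, w, b


if __name__ == "__main__":
    theta, w, b = train()
    print("learned shift:", theta)
```

Each loop iteration is one round of the game: the discriminator first gets better at telling real from fake, then the generator adjusts its output to be rated as more real. GAN training of this kind is not guaranteed to settle, so the printed shift should be read as an illustration, not a benchmark.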

From there, Microsoft continued to build the knowledge base with paired images and captions consisting of multiple traits, such as black wings and a red belly. Microsoft says it’s not using just any GAN, but one that targets tiny details so the bot can produce photo-realistic results. The company calls it an attentional GAN, or AttnGAN.

“As humans draw, we repeatedly refer to the text and pay close attention to the words that describe the region of the image we are drawing,” the company says. “[AttnGAN] does this by breaking up the input text into individual words and matching those words to specific regions of the image.” 
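
The idea of matching words to image regions can be sketched with simple dot-product attention. This is an illustrative assumption, not the formulation from Microsoft's paper: the two-dimensional word and region vectors below are made up, and `attend` is a hypothetical helper name.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(regions, words):
    """For each region vector, return attention weights over the word vectors."""
    weights = []
    for r in regions:
        # Similarity between this region and every word (dot product).
        scores = [sum(a * b for a, b in zip(r, w)) for w in words]
        weights.append(softmax(scores))
    return weights


# Hypothetical embeddings: first axis ~ "body color", second axis ~ "wing".
word_names = ["yellow", "bird", "black", "wings"]
words = [[1.0, 0.0], [0.5, 0.5], [0.2, 0.8], [0.0, 1.0]]
region_names = ["body", "wing"]
regions = [[3.0, 0.0], [0.0, 3.0]]

if __name__ == "__main__":
    for name, row in zip(region_names, attend(regions, words)):
        best = word_names[row.index(max(row))]
        print(name, "attends most to", best)
```

With these made-up vectors, the body region puts most of its weight on “yellow” and the wing region on “wings,” which mirrors the idea of looking back at the relevant word while drawing each part of the image.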

You can read Microsoft’s research paper describing its AttnGAN here. 

Kevin Parrish
Former Digital Trends Contributor