Programmer trains artificial intelligence to draw faces from text descriptions

T2F training time lapse

Programmer Animesh Karnewar wanted to know how characters described in books would appear in reality, so he turned to artificial intelligence to see if it could properly render these fictional people. Called T2F, the research project uses a generative adversarial network (GAN) to encode text and synthesize facial images.

Simply put, a GAN consists of two neural networks that compete with each other to produce better results. The first network, the generator, tries to fool the second, the discriminator, into believing a rendered image is a real photograph, while the discriminator tries to prove that the alleged photo is only a rendering. This back-and-forth refines the generator's output until the discriminator can no longer reliably tell the difference.
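To make that tug-of-war concrete, here is a minimal sketch of the two opposing objectives in a standard GAN, written with plain Python and binary cross-entropy. The function names and the choice of loss are illustrative assumptions; the article does not say which loss T2F actually uses.

```python
import math

def bce(prob, label):
    """Binary cross-entropy for one predicted probability (label is 0 or 1)."""
    eps = 1e-12  # keep log() away from zero
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))

def discriminator_loss(p_real, p_fake):
    """Hypothetical helper: the discriminator wants to score real photos
    near 1 and rendered images near 0."""
    return bce(p_real, 1) + bce(p_fake, 0)

def generator_loss(p_fake):
    """Hypothetical helper: the generator wants the discriminator to score
    its rendered image near 1, i.e. to be fooled."""
    return bce(p_fake, 1)

# p_real / p_fake are the discriminator's "this is a real photo" scores.
confident = discriminator_loss(p_real=0.9, p_fake=0.1)  # discriminator is winning
confused = discriminator_loss(p_real=0.5, p_fake=0.5)   # discriminator is guessing
print(confident < confused)                        # lower loss when it tells them apart
print(generator_loss(0.9) < generator_loss(0.1))   # lower loss when it fools the critic
```

In a real training loop the two losses are minimized in alternation, so each network's improvement makes the other's task harder.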

Karnewar started the project using a dataset called Face2Text provided by researchers at the University of Copenhagen, which contains natural language descriptions for 400 random images.

“The descriptions are cleaned to remove reluctant and irrelevant captions provided for the people in the images,” he writes. “Some of the descriptions not only describe the facial features, but also provide some implied information from the pictures.”

While the results of Karnewar’s T2F project aren’t exactly photorealistic, they’re a start. The video embedded above shows a time-lapse view of how the GAN was trained to render illustrations from text, starting with solid blocks of color and ending with rough but identifiable pixelated renderings.

“I found that the generated samples at higher resolutions (32 x 32 and 64 x 64) has more background noise compared to the samples generated at lower resolutions,” Karnewar explains. “I perceive it due to the insufficient amount of data (only 400 images).”

The technique used to train the adversarial networks is called “Progressive Growing of GANs,” which improves both image quality and training stability. As the video shows, the image generator starts at an extremely low resolution. New layers are gradually added to the model, increasing the level of detail as training progresses.

“The Progressive Growing of GANs is a phenomenal technique for training GANs faster and in a more stable manner,” he adds. “This can be coupled with various novel contributions from other papers.”
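The idea of growing resolution step by step can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the 4 x 4 starting size and the helper names are not taken from T2F, images are represented as nested lists of grayscale values, and the blending weight `alpha` stands in for the gradual fade-in of a newly added layer.

```python
def resolution_schedule(start=4, final=64):
    """Resolutions visited during training, doubling each stage (assumed 4 -> 64)."""
    sizes = []
    size = start
    while size <= final:
        sizes.append(size)
        size *= 2
    return sizes

def upsample_2x(image):
    """Nearest-neighbor upsampling: every pixel becomes a 2 x 2 block."""
    out = []
    for row in image:
        wide = [px for px in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def fade_in(alpha, old_upsampled, new):
    """Blend the upsampled low-resolution output with the new layer's output.
    alpha rises from 0 to 1 while the new layer is being introduced."""
    return [
        [(1.0 - alpha) * o + alpha * n for o, n in zip(row_old, row_new)]
        for row_old, row_new in zip(old_upsampled, new)
    ]

print(resolution_schedule())            # [4, 8, 16, 32, 64]
low_res = [[1, 2], [3, 4]]
print(upsample_2x(low_res))             # 2 x 2 image stretched to 4 x 4
```

Blending rather than switching abruptly means a new, untrained layer cannot suddenly disrupt what the lower-resolution layers have already learned.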

In one provided example, the text describes a woman in her late 20s with long brown hair swept over to one side, gentle facial features and no make-up. She’s “casual” and “relaxed.” Another description depicts a man in his 40s with an elongated face, a prominent nose, brown eyes, a receding hairline and a short mustache. Although the end results are extremely pixelated, the final renders show real progress in how A.I. can generate faces from scratch.

Karnewar says he plans to scale up the project by integrating additional datasets such as Flickr8k and COCO Captions. Eventually, T2F could be used in law enforcement to identify victims and/or criminals based on text descriptions, among other applications. He’s open to suggestions and contributions to the project.

To access the code and contribute, head to Karnewar’s repository on GitHub.

Kevin Parrish
Former Digital Trends Contributor