
DALL-E 3 could take AI image generation to the next level

A DALL-E 2 image. OpenAI

OpenAI might be preparing the next version of its DALL-E AI text-to-image generator, with details from a series of alpha tests now leaked to the public, according to The Decoder.

An anonymous leaker on Discord shared details about his experience with access to an upcoming OpenAI image model, referred to as DALL-E 3. He first appeared in May, telling the interest-based Discord channel that he was part of an OpenAI alpha test trying out a new AI image model, and he shared the images he generated at the time.

We've NEVER seen Image Generation This Good! | SNEAK PEAK

The May alpha version was able to generate images in multiple aspect ratios directly within the model. YouTuber MattVidPro AI then showcased several of the images generated in a 16:9 aspect ratio. This version also demonstrated the model's prowess at rendering high-quality text, which remains a pain point for rival models, even top generators such as Stable Diffusion and Midjourney.

Examples included text melded into a brick wall, a neon sign spelling out words, a billboard in a city, a cake decoration, and a name etched into a mountain. The images also upheld DALL-E's reputation for generating people well; one displayed a woman eating spaghetti at a party from a fisheye point of view.

The leaker returned to the Discord channel in mid-July with more details and new images. He claimed to be part of a "closed alpha" test that included approximately 400 participants. He added that he was invited to the trial via email and had also been included in the testing of the original DALL-E and DALL-E 2. This led to the conclusion that the alpha test might be for DALL-E 3, though it has not been confirmed.

The model was updated considerably between May and July. The leaker showcased this by sharing images generated from the same prompt, showing how much more capable DALL-E 3 has become over time. The prompt reads: "a painting of a pink jester giving a high five to a panda while in a cycling competition. The bikes are made of cheese and the ground is very muddy. They are driving in a foggy forest. The panda is angry."

The May alpha produces the general scene and hits most of the points in the prompt, though there's some distortion where the hands connect, and the bikes' wheels are merely yellow rather than made of cheese. The July alpha is far more detailed: across several generations, the pink jester and the panda are clearly high-fiving and the bicycle wheels are made of cheese.

In Midjourney's attempt, meanwhile, the jester is missing from the scene, the pandas ride motorcycles instead of bicycles, the ground is paved rather than muddy, and the pandas look happy instead of angry.

There are a host of DALL-E 3 July alpha image examples that show the potential of the model. However, with the alpha test being uncensored, the leaker noted that it also has the potential to generate scenes of "violence and nudity or copyrighted material such as company logos."

Examples include a gory anime girl, a Game of Thrones character, a Grand Theft Auto V cover, a zombie Jesus eating a Subway sandwich (also with mild gore), and Shrek being dug up at an archaeological dig, among others.

MattVidPro AI noted that the image model generates images as if they’re supposed to be in a specific style.

DALL-E 2 launched in April 2022 but was heavily regulated with a waitlist due to its popularity and concerns about ethics and safety. The AI image generator became accessible to the public in September 2022.

Fionna Agomuoh
Fionna Agomuoh is a Computing Writer at Digital Trends. She covers a range of topics in the computing space, including…
AI image generation just took a huge step forward
Professor Dumbledore by a pool in Wes Anderson's Harry Potter.

We've been living with AI-generated images for a while now, but this week several of the major players took big steps forward. In particular, I'm talking about significant updates to Midjourney, Google's new model, and Grok.

Each company shows the technology evolving at different paces and in different directions. It's still very much an open playing field, and each company demonstrates just how far the advances have come.
Midjourney hits the web
An AI image generated in Midjourney. Midjourney

Grok 2.0 takes the guardrails off AI image generation
Elon Musk as Wario in a sketch from Saturday Night Live.

Elon Musk's xAI company has released two updated iterations of its Grok chatbot model, Grok-2 and Grok-2 mini. They promise improved performance over their predecessor, as well as new image-generation capabilities that will enable X (formerly Twitter) users to create AI imagery directly on the social media platform.

“We are excited to release an early preview of Grok-2, a significant step forward from our previous model, Grok-1.5, featuring frontier capabilities in chat, coding, and reasoning. At the same time, we are introducing Grok-2 mini, a small but capable sibling of Grok-2. An early version of Grok-2 has been tested on the LMSYS leaderboard under the name 'sus-column-r,'” xAI wrote in a recent blog post. The new models are currently in beta and reserved for Premium and Premium+ subscribers, though the company plans to make them available through its Enterprise API later in the month.

A modular supercomputer built to birth AGI could be online by next year
crystalline network nodes

AI startup SingularityNET is set to deploy a "multi-level cognitive computing network" in the coming months. It is designed to host and train the models that will form the basis of an artificial general intelligence (AGI) capable of matching, and potentially exceeding, human cognition, the company announced on Monday.

Achieving AGI is widely viewed as the next major milestone in artificial intelligence development. While today's cutting-edge models like GPT-4o and Gemini 1.5 Pro are immensely powerful and can perform specific tasks at superhuman levels, they're incapable of applying those skills across disciplines. AGI, though still theoretical at this point, would be free of those limitations, and able to reason and learn on its own, regardless of the task.
