ChatGPT’s latest model may be a regression in performance

By Andrew Tarantola Published November 21, 2024

chatGPT on a phone on an encyclopedia — Shantanu Kumar / Pexels

According to a new report from Artificial Analysis, OpenAI’s flagship large language model for ChatGPT, GPT-4o, has significantly regressed in recent weeks, putting the state-of-the-art model’s performance on par with the far smaller, and notably less capable, GPT-4o-mini model.

This analysis comes less than 24 hours after the company announced an upgrade for the GPT-4o model. “The model’s creative writing ability has leveled up–more natural, engaging, and tailored writing to improve relevance & readability,” OpenAI wrote on X. “It’s also better at working with uploaded files, providing deeper insights & more thorough responses.” Whether those claims hold up is now in doubt.

“We have completed running our independent evals on OpenAI’s GPT-4o release yesterday and are consistently measuring materially lower eval scores than the August release of GPT-4o,” Artificial Analysis announced via an X post on Thursday, noting that the model’s Artificial Analysis Quality Index decreased from 77 to 71 (and is now equal to that of GPT-4o mini).

What’s more, GPT-4o’s score on the GPQA Diamond benchmark fell from 51% to 39%, while its score on the MATH benchmark fell from 78% to 69%.

At the same time, the researchers found that the model’s response speed had more than doubled, rising from around 80 output tokens per second to roughly 180 tokens/s. “We have generally observed significantly faster speeds on launch day for OpenAI models (likely due to OpenAI provisioning capacity ahead of adoption), but previously have not seen a 2x speed difference,” the researchers wrote.
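
For readers who want to put the reported figures side by side, here is a minimal Python sketch that works out the absolute and relative change for each number quoted above. The values are the ones Artificial Analysis reported; the metric labels and the helper names (`relative_change`, `summarize`) are illustrative assumptions, not anything taken from the report.

```python
# Figures as reported by Artificial Analysis for GPT-4o (August vs. November 2024).
# The labels and helper names below are illustrative, not from the report itself.
REPORTED = {
    "Quality Index": (77, 71),
    "GPQA Diamond (%)": (51, 39),
    "MATH (%)": (78, 69),
    "Output speed (tokens/s)": (80, 180),  # both speed values are approximate
}


def relative_change(before, after):
    """Return the change from `before` to `after` as a percentage of `before`."""
    return (after - before) / before * 100


def summarize(metrics):
    """Return (name, absolute change, relative change in %) for each metric."""
    rows = []
    for name, (before, after) in metrics.items():
        rows.append((name, after - before, round(relative_change(before, after), 1)))
    return rows


for name, delta, pct in summarize(REPORTED):
    print(f"{name}: {delta:+} ({pct:+.1f}%)")
```

Read this way, the 12-point drop on GPQA Diamond works out to a decline of roughly 23.5% relative to the August score, while the speed figures imply a ratio of about 2.25, consistent with the "2x speed difference" the researchers describe.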

“Based on this data, we conclude that it is likely that OpenAI’s Nov 20th GPT-4o model is a smaller model than the August release,” they continued. “Given that OpenAI has not cut prices for the Nov 20th version, we recommend that developers do not shift workloads away from the August version without careful testing.”

GPT-4o was first released in May 2024 to surpass the existing GPT-3.5 and GPT-4 models. GPT-4o offers state-of-the-art benchmark results in voice, multilingual, and vision tasks, according to OpenAI, making it ideal for advanced applications like real-time translation and conversational AI.
