After months of speculation and anticipation, OpenAI has released the production version of its advanced reasoning model, Project Strawberry, which has been renamed “o1.” It is joined by a “mini” version (just as GPT-4o was) that will offer faster and more responsive interactions at the cost of drawing on a smaller knowledge base.
It appears that o1 offers a mixed bag of technical advancements. It’s the first in OpenAI’s line of reasoning models designed to use humanlike deduction to answer complex questions on subjects — including science, coding, and math — faster than humans can.
For example, during testing, o1 was fed a qualifying exam for the International Mathematical Olympiad. While its predecessor, GPT-4o, correctly solved only 13% of the problems presented, o1 got 83% of them right. In an online Codeforces competition, o1 scored in the 89th percentile. What’s more, o1 can respond to queries that stumped previous models (like, “which is bigger, 9.11 or 9.9?”). However, the company makes clear that this release is only a preview of the neophyte model’s full capabilities.
The new o1 “has been trained using a completely new optimization algorithm and a new training dataset specifically tailored for it,” OpenAI’s research lead, Jerry Tworek, told The Verge. Using a combination of reinforcement learning and “chain of thought” reasoning, o1 reportedly returns more accurate inferences than its predecessor. “We have noticed that this model hallucinates less,” Tworek said, but “we can’t say we solved hallucinations.”
Both ChatGPT Plus and Team subscribers will be able to test out o1 and o1-mini beginning today. Enterprise and Edu subscribers should have access by next week.
The company says that o1-mini will eventually become available to free-tier users, though it did not specify a timeline. Developers will notice a steep increase in the API pricing for o1, compared to GPT-4o. Access to o1 will cost $15 per million input tokens (three times GPT-4o’s $5 per million) and $60 per million output tokens, four times 4o’s $15 per million fee.
The real question is whether the new model thinks the word “strawberry” contains two R’s or three.
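For developers weighing the API price gap, here is a minimal Python sketch of the arithmetic. The o1 prices and GPT-4o's $5 input price are the figures quoted above; GPT-4o's output price is taken as $15 per million tokens, the figure that makes the "four times" comparison with $60 hold. The function name `request_cost` and the sample token counts are hypothetical, chosen only for illustration.

```python
# Prices in US dollars per million tokens.
# GPT-4o's output price ($15) is an assumption: it is the value
# for which o1's $60 output price is four times higher.
PRICES = {
    "o1": {"input": 15.00, "output": 60.00},
    "gpt-4o": {"input": 5.00, "output": 15.00},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the cost in dollars of one request for the given token counts."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 10,000 input tokens and 2,000 output tokens.
    for model in PRICES:
        print(f"{model}: ${request_cost(model, 10_000, 2_000):.2f}")
```

Under these assumptions, the same hypothetical request costs about $0.27 on o1 versus $0.08 on GPT-4o, so the gap widens as responses get longer because output tokens carry the larger multiplier.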