
GPT-4o and Gemini 1.5 Pro just got beat in the AI race

A screenshot of Claude 3.5 Sonnet, with an 8-bit crab.
Anthropic

There’s a new leader, technically, in the race for AI assistant dominance, and it’s Anthropic’s Claude 3.5 Sonnet. The newly released model outperforms both Google’s Gemini 1.5 Pro and OpenAI’s GPT-4o across a range of benchmark tests, the company announced on Thursday.

This new iteration of Sonnet is the first in Anthropic’s upcoming line of 3.5 models. It significantly outperforms the larger Claude 3 Opus model, and does so at a fraction of that model’s energy cost. Compute efficiency is becoming an increasingly important aspect of AI system design, especially as the cost of powering and cooling AI data centers soars and the infrastructure pushes into the gigawatt range.

Claude 3.5 Sonnet for vision

“Claude 3.5 Sonnet operates at twice the speed of Claude 3 Opus,” the Anthropic team wrote in a blog post. “This performance boost, combined with cost-effective pricing, makes Claude 3.5 Sonnet ideal for complex tasks such as context-sensitive customer support and orchestrating multistep workflows.”

The new model has reportedly set new highs on three standardized benchmarks: graduate-level reasoning (GPQA), undergraduate-level knowledge (MMLU), and coding proficiency (HumanEval). It beat out Google’s Gemini 1.5 Pro, Meta’s Llama-400b, and OpenAI’s GPT-4o, though typically only by a couple of percentage points.

A table showing Claude 3.5 Sonnet's performance compared to other leading AI systems.
Anthropic

Claude 3.5 Sonnet is being billed as Anthropic’s “strongest vision model yet.” It can perform a number of vision-based tasks, like interpreting charts and graphs or transcribing text from imperfect image sources such as screenshots or scanned receipts, more accurately than Claude 3 Opus. In fact, it beat Claude 3 Opus by anywhere from 6 to 17 points across industry-standard vision benchmarks. The new model is also reportedly much better at handling humor and can converse in a much more lifelike manner.

Claude 3.5 Sonnet will also be the first Anthropic model to offer the Artifacts feature. Rather than generating images or code snippets directly in the flow of the conversation, Artifacts will create that content in a dedicated space to the side of the chat. This allows users to create “a dynamic workspace where they can see, edit, and build upon Claude’s creations in real time, seamlessly integrating AI-generated content into their projects and workflows,” the Anthropic team claims. Anthropic also announced that Claude will soon support team collaboration, in which a company can store its data, documents, and projects in a single, central silo, with Claude acting as an on-demand assistant.

You can try Claude 3.5 Sonnet today for free on the Claude.ai website and in the Claude iOS app (a Claude Pro or Team subscription will get you significantly higher rate limits). The model is also available to developers through the Anthropic API, Amazon Bedrock, and Google Cloud’s Vertex AI. Claude 3.5 Haiku and Claude 3.5 Opus are scheduled for release later in the year.

Andrew Tarantola
Andrew has spent more than a decade reporting on emerging technologies ranging from robotics and machine learning to space…