The ‘most powerful AI training system in the world’ just went online

Elon Musk talks to the press as he arrives to have a look at the construction site of the new Tesla Gigafactory near Berlin.
Maja Hitij / Getty Images

The race for AI supremacy is once again accelerating as xAI CEO Elon Musk announced via Twitter that his company successfully brought its Colossus AI training cluster, which Musk bills as the world’s “most powerful,” online over the weekend.

“This weekend, the @xAI team brought our Colossus 100k H100 training cluster online. From start to finish, it was done in 122 days. Colossus is the most powerful AI training system in the world. Moreover, it will double in size to 200k (50k H200s) in a few months. Excellent work by the team, Nvidia and our many partners/suppliers,” Musk wrote in a post on X.

Musk’s “most powerful” claim is based on the number of GPUs employed by the system. With 100,000 Nvidia H100s driving it, Colossus is estimated to be larger than any other AI system developed to date.
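For a rough sense of what 100,000 H100s means, here is a back-of-envelope sketch of the cluster's aggregate peak throughput. It assumes Nvidia's published dense BF16 peak of roughly 989 teraflops per H100 SXM; real-world training utilization is far lower, so this is an upper bound, not a measured figure.

```python
# Back-of-envelope aggregate peak throughput for a 100k-GPU cluster.
# Assumption: ~989 TFLOPS dense BF16 per H100 SXM (Nvidia's published
# peak, without sparsity). Actual sustained training FLOPS are much lower.
H100_BF16_TFLOPS = 989
gpus = 100_000

peak_exaflops = gpus * H100_BF16_TFLOPS * 1e12 / 1e18
print(f"Aggregate peak: ~{peak_exaflops:.0f} exaFLOPS (BF16, dense)")
```

By this yardstick, doubling the cluster to 200,000 GPUs, as Musk projects, would roughly double that theoretical ceiling, with H200s adding memory bandwidth rather than substantially more compute per chip.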

Musk began purchasing tens of thousands of GPUs in April 2023 to accelerate his company’s AI efforts, shortly after penning an open letter calling for an industrywide, six-month “pause” on AI development. In March of that year, Musk claimed that the company would leverage AI to “detect & highlight manipulation of public opinion” on Twitter, though the GPU supercomputer will likely also be used to train xAI’s large language model (LLM), Grok.

Grok was introduced by xAI in 2023 in response to the success of rivals like ChatGPT, Gemini, Llama 3.1, and Claude. The company released the updated Grok-2 as a beta in August. “We have introduced Grok-2, positioning us at the forefront of AI development,” xAI wrote in a recent blog post. “Our focus is on advancing core reasoning capabilities with our new compute cluster. We will have many more developments to share in the coming months.”

Musk claims that he can also develop Tesla into “a leader in AI & robotics.” However, a recent report from CNBC suggests that Musk has been diverting shipments of Nvidia’s highly sought-after GPUs from the electric automaker to xAI and Twitter. Doing so could delay Tesla’s efforts to install the compute resources needed to develop its autonomous vehicle technology and the Optimus humanoid robot.

“Elon prioritizing X H100 GPU cluster deployment at X versus Tesla by redirecting 12k of shipped H100 GPUs originally slated for Tesla to X instead,” an Nvidia memo from December obtained by CNBC reads. “In exchange, original X orders of 12k H100 slated for [January] and June to be redirected to Tesla.”

Andrew Tarantola