Nvidia’s new Tesla cards meet the needs of the growing capacities of AI services

Now that Nvidia has addressed the consumer market with its latest graphics cards based on the “Pascal” architecture, the next products in the company’s Pascal rollout target the deep neural network market and aim to accelerate machine learning. They arrive in the form of Nvidia’s new Tesla P4 and Tesla P40 accelerator cards, which speed up the production inferencing workloads carried out by services that use artificial intelligence.

There are essentially two types of accelerator cards for deep neural networks: training and inference. The former should speak for itself, accelerating the training of a deep neural network before it’s deployed in the field. Inference, however, is the process of providing an input to the trained deep neural network and having it extract data based on that input. Examples include translating speech in real time and localizing faces in images.
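
To make the distinction concrete, here is a minimal Python sketch of what inference amounts to: a forward pass through weights that are already fixed. The weights, biases, and labels below are hypothetical placeholders standing in for the output of an earlier training run. They do not come from Nvidia or from any real model.

```python
import math

# Hypothetical values, standing in for the result of an earlier training run.
WEIGHTS = [[0.8, -0.4], [-0.3, 0.9]]  # one row of weights per output class
BIASES = [0.1, -0.2]
LABELS = ["face", "background"]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def infer(features):
    """Forward pass only: the weights are read, never updated."""
    scores = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(WEIGHTS, BIASES)
    ]
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs


label, probs = infer([1.0, 0.0])
print(label, [round(p, 3) for p in probs])
```

Training would adjust `WEIGHTS` and `BIASES` over many passes. Inference only evaluates them, which is why the two workloads can be served by different accelerator cards.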

According to Nvidia, the new Tesla P4 and Tesla P40 accelerator cards are designed for inferencing and include specialized inference instructions based on 8-bit operations, making them 45 times faster in response time than an Intel Xeon E5-2690v4 processor. They also provide a 4x improvement over the company’s previous generation of “Maxwell” Tesla cards, the M40 and M4.
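
The article does not describe the instructions themselves, so the following Python sketch only illustrates the general idea behind 8-bit inference arithmetic: floating-point values are mapped to small integers, multiplied and accumulated as integers, then rescaled. The symmetric scaling scheme and all function names are assumptions chosen for illustration, not Nvidia's implementation.

```python
def quantize(values, scale):
    """Map floats to signed 8-bit integers clamped to [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in values]


def int8_dot(a_q, b_q):
    """Multiply 8-bit values and accumulate the products in a wider integer."""
    return sum(x * y for x, y in zip(a_q, b_q))


def quantized_dot(a, b):
    """Approximate a floating-point dot product using 8-bit operands."""
    # Symmetric per-vector scales: the largest magnitude maps to 127.
    # An all-zero vector falls back to a scale of 1.0.
    scale_a = max(abs(v) for v in a) / 127 or 1.0
    scale_b = max(abs(v) for v in b) / 127 or 1.0
    acc = int8_dot(quantize(a, scale_a), quantize(b, scale_b))
    # Rescale the integer accumulator back to floating point.
    return acc * scale_a * scale_b


a = [0.5, -1.0, 0.25, 0.75]
b = [1.0, 0.5, -0.5, 0.25]
exact = sum(x * y for x, y in zip(a, b))
print(exact, quantized_dot(a, b))
```

The quantized result is close to the exact one but not identical. Inference often tolerates that small loss of precision, while training typically needs wider number formats.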

The company said this week during its GTC Beijing 2016 conference that the Tesla P4 sports a small form-factor that’s ideal for data centers. It’s 40x more energy efficient than CPUs that are used for inferencing, and a single Tesla P4 server can replace 13 CPU-only servers built for video inferencing workloads. Meanwhile, the Tesla P40 is ideal for deep learning workloads, with a server containing eight of these accelerators able to replace more than 140 CPU-based servers.

Compared to the previous Tesla M40, the new P40 packs more CUDA cores, higher clock speeds, a faster memory clock, higher single-precision throughput at 12 TFLOPS, and more transistors at 12 billion. However, the power requirement (thermal envelope) stays the same, so Nvidia has managed to raise performance per watt without requiring more power. The same holds true for the slower Tesla P4 when compared to the older Tesla M4 card.

“With the Tesla P100 and now Tesla P4 and P40, NVIDIA offers the only end-to-end deep learning platform for the data center, unlocking the enormous power of AI for a broad range of industries,” said Ian Buck, general manager of accelerated computing at Nvidia. “They slash training time from days to hours. They enable insight to be extracted instantly. And they produce real-time responses for consumers from AI-powered services.”

Nvidia revealed the Tesla P100 during its local GTC 2016 conference five months ago. This card is ideal for accelerating neural network training, delivering a performance increase of more than 12 times compared to the previous generation Maxwell-based solution. Again, neural networks need to be trained first before they’re deployed into the field, and the new Tesla card speeds up the process, cutting AI training down from weeks to days.

In addition to the two new Tesla cards, Nvidia also launched TensorRT, a library for “optimizing deep learning models for production deployment.” The company also introduced the Nvidia DeepStream SDK for simultaneously decoding and analyzing up to 93 HD video streams. Here’s a brief list of hardware details for Nvidia’s two new Tesla cards, which are now available:

| Specification | Tesla P40 | Tesla P4 |
| --- | --- | --- |
| GPU | GP102 | GP104 |
| CUDA Cores | 3,840 | 2,560 |
| Base Clock | 1,303MHz | 810MHz |
| Boost Clock | 1,531MHz | 1,063MHz |
| GDDR5 Memory Clock | 7.2Gbps | 6Gbps |
| Memory Bus Width | 384-bit | 256-bit |
| GDDR5 Amount | 24GB | 8GB |
| Single Precision | 12 TFLOPS | 5.5 TFLOPS |
| TDP | 250 watts | 50 to 75 watts |
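
The single-precision figures listed above can be sanity-checked from the core counts and boost clocks. The sketch below assumes the common rule of thumb that each CUDA core retires one fused multiply-add, counted as two floating-point operations, per clock cycle. That assumption and the helper names are not from Nvidia's announcement.

```python
def peak_tflops(cuda_cores, boost_mhz):
    """Theoretical peak, assuming 2 FLOPs (one fused multiply-add) per core per clock."""
    return cuda_cores * 2 * boost_mhz * 1e6 / 1e12


def gflops_per_watt(tflops, tdp_watts):
    """Peak throughput divided by the card's rated power."""
    return tflops * 1000 / tdp_watts


p40 = peak_tflops(3840, 1531)
p4 = peak_tflops(2560, 1063)

print(round(p40, 2))  # 11.76, which the spec list rounds to 12 TFLOPS
print(round(p4, 2))   # 5.44, which the spec list rounds to 5.5 TFLOPS
print(round(gflops_per_watt(p40, 250), 1))  # P40 at 250 watts
print(round(gflops_per_watt(p4, 75), 1))    # P4 at its 75-watt ceiling
```

These are theoretical peaks at the boost clock. Sustained throughput in a real workload will generally be lower.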
Kevin Parrish
Former Digital Trends Contributor