NVIDIA Ampere A100 250W TDP GPU On PCIe 4.0 Made For AI, Data Science, And Supercomputing Launched With Promised 90 Percent Performance Of 400W Model

NVIDIA has officially launched the A100 PCIe, a PCIe 4.0 compatible GPU based on the next-gen Ampere architecture. Despite its lower 250W TDP, NVIDIA promises the PCIe 4.0 Ampere A100 GPU will offer up to 90 percent of the performance of the full 400W A100 HGX GPU. The third variant in NVIDIA's growing Ampere A100 GPU family, the A100 PCIe is meant for servers and clusters running Artificial Intelligence (AI), Data Science, and Supercomputing workloads.

The new PCI-Express 4.0 variant of the A100 GPU is based on the 7nm Ampere microarchitecture. Alongside it, the company announced several A100-powered systems from leading server manufacturers, including Asus, Dell, Cisco, Lenovo, and more. The 250W A100 PCIe 4.0 GPU accelerator is quite similar to the full 400W TDP variant, and NVIDIA is promising near-identical performance despite the significant drop in TDP.

NVIDIA A100 Ampere GPU In PCIe 4.0 Form-Factor With Same 400W A100 HGX GPU Configuration But At 250W:

The A100 PCIe GPU accelerator is aimed at a diverse set of industrial use cases, with systems ranging from a single A100 PCIe GPU to servers that link two cards through 12 NVLink channels, which deliver a total of 600 GB/s of interconnect bandwidth. The 250W TDP A100 PCIe GPU accelerator doesn’t change much in terms of core configuration when compared to the 400W A100 HGX GPU.

The GA100 GPU retains the specifications of the 400W A100 HGX variant, with 6912 CUDA cores arranged in 108 SM units, 432 Tensor Cores, and 40 GB of HBM2 memory that delivers the same memory bandwidth of 1.55 TB/s (rounded off to 1.6 TB/s). However, deploying the GPU package in the PCIe 4.0 form factor has its own drawback: a significantly reduced TDP. This reportedly means a 10 to 50 percent performance penalty, depending on the workload. Moreover, the 250W TDP variant of the A100 GPU is better suited to short bursts than to sustained loads.
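As a quick sanity check on the core configuration quoted above, the per-SM breakdown can be worked out with a few lines of Python. This is only an arithmetic sketch using the figures in this article; the variable names are illustrative and not taken from any NVIDIA tool or API.

```python
# Figures quoted in the article for the GA100-based A100.
CUDA_CORES = 6912
SM_COUNT = 108
TENSOR_CORES = 432

# Integer division is exact here because both totals are multiples of 108.
cuda_per_sm = CUDA_CORES // SM_COUNT
tensor_per_sm = TENSOR_CORES // SM_COUNT

print(f"CUDA cores per SM:   {cuda_per_sm}")
print(f"Tensor Cores per SM: {tensor_per_sm}")
```

Both totals divide evenly across the 108 SM units, which is consistent with the PCIe card using the same core configuration as the HGX part.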

NVIDIA A100 Ampere GPU In PCIe 4.0 Form-Factor Performance:

Owing to the significant reduction in TDP, it could be assumed that the card would run at lower clocks to stay within the smaller power budget. However, the peak performance figures that NVIDIA has released are surprising, as they match those of the 400W TDP variant. FP64 performance is still rated at 9.7 TFLOPs (19.5 TFLOPs on Tensor Cores), FP32 performance at 19.5 TFLOPs, TF32 Tensor Core performance at 156/312 TFLOPs (Sparsity), FP16 performance at 312/624 TFLOPs (Sparsity), and INT8 at 624/1248 TOPs (Sparsity).
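The paired figures above follow a simple pattern: the second number in each "(Sparsity)" pair is the rate with structured sparsity enabled. A minimal Python sketch, using only the pairs quoted in this article, makes the ratio explicit. The dictionary labels are illustrative; the 156/312 pair corresponds to what NVIDIA's A100 datasheet lists as TF32 Tensor Core throughput.

```python
# (dense, sparse) peak Tensor Core rates quoted above, in TFLOPs (TOPs for INT8).
rates = {
    "156/312 pair (TF32)": (156, 312),
    "FP16": (312, 624),
    "INT8": (624, 1248),
}

# Ratio of the sparse rate to the dense rate for each precision.
speedups = {name: sparse / dense for name, (dense, sparse) in rates.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x with sparsity")
```

In every case the sparse figure is exactly double the dense one, so the headline numbers only apply to workloads that can actually exploit sparsity.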

Simple math indicates, and NVIDIA assures, that the Ampere-based PCIe 4.0 250W A100 GPU can deliver 90 percent of the performance of the A100 HGX card (400W) in top server applications. This is plausible because the new variant needs only a short time to complete the tasks mentioned above. However, the numbers should be valid for short intervals only. In complex situations that require sustained GPU capabilities, the 250W PCIe 4.0 GPU can deliver anywhere from 90 percent down to 50 percent of the performance of the 400W A100 HGX GPU.
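The "simple math" can be spelled out as a rough performance-per-watt comparison. The sketch below assumes, purely for illustration, that each card draws exactly its rated TDP (250W and 400W); real power draw varies by workload, and the function name is hypothetical rather than part of any NVIDIA tooling.

```python
PCIE_TDP_W = 250  # A100 PCIe rated TDP, from the article
HGX_TDP_W = 400   # A100 HGX rated TDP, from the article

def relative_perf_per_watt(relative_perf):
    """Perf-per-watt of the PCIe card relative to the HGX card.

    relative_perf is the PCIe card's performance as a fraction of the
    HGX card's (0.90 for the best case quoted, 0.50 for the worst).
    Assumes each card draws exactly its rated TDP.
    """
    power_ratio = PCIE_TDP_W / HGX_TDP_W  # 0.625
    return relative_perf / power_ratio

best_case = relative_perf_per_watt(0.90)
worst_case = relative_perf_per_watt(0.50)

print(f"Best case:  {best_case:.2f}x the HGX card's perf per watt")
print(f"Worst case: {worst_case:.2f}x the HGX card's perf per watt")
```

Under that assumption, the PCIe card comes out ahead on efficiency at 90 percent relative performance (about 1.44x) and falls behind at 50 percent (about 0.8x), with the break-even point at 62.5 percent.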

The Ampere microarchitecture will surely benefit the new A100. NVIDIA promises up to a 20X performance boost over the Volta-based predecessor. The PCIe 4.0 A100 GPU features Multi-Instance GPU (MIG) tech, which means a single A100 can be partitioned into as many as seven separate GPU instances to handle different computing tasks. While MIG handles partitioning, 3rd-gen NVLink works in the opposite direction, enabling several GPUs to be joined into one giant GPU.


Alap Naik Desai

A B.Tech Plastics (UDCT) and a Windows enthusiast. Optimizing the OS, exploring software, and searching for and deploying solutions to strange and weird issues are Alap's main interests.