AMD MI100 Instinct Accelerator 7nm GPU For HPC Officially Launched, Offering 11.5 Teraflops Of Peak Double-Precision Floating-Point Performance

The MI100, AMD’s first GPU based on the CDNA architecture, is official. AMD bills the MI100 Instinct accelerator as “the world’s fastest HPC GPU,” with 11.5 teraflops of peak double-precision (FP64) floating-point performance. According to AMD, the GPU packs twice as many compute units as the previous generation while staying within the same 300-watt power limit.

AMD today announced the new MI100 Instinct accelerator. The GPU is based on the CDNA architecture, which differs slightly from the RDNA architecture that powers the latest AMD Radeon RX 6000 Series graphics cards. The MI100 succeeds the MI50 and MI60 Instinct accelerators launched two years ago. Despite the relatively short time between generations, the new GPU architecture and compute engine allow the AMD GPU to exceed expectations.

AMD MI100 Instinct Accelerator 7nm GPU For HPC Industry Specifications and Features:

The MI100 GPU is the first to incorporate AMD’s Compute DNA (CDNA) architecture. The GPU has 120 Compute Units arranged in four arrays. The CDNA architecture is a significant evolutionary leap over the GCN architecture, and it includes new matrix core engines that boost computational throughput for different numerical formats.

AMD claims the new matrix core technology gives the MI100 7x greater peak half-precision floating-point performance than the MI50. The company says the MI100 Instinct accelerator offers 46.1 teraflops of peak single-precision matrix (FP32) performance, 23.1 teraflops of peak single-precision (FP32) performance, 184.6 teraflops of peak half-precision (FP16) performance, and 92.3 teraflops of peak bfloat16 performance.
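As a rough sanity check, the peak figures quoted above can be reproduced with simple arithmetic. The sketch below is illustrative only. It assumes 120 compute units with 64 stream processors each, a 1502 MHz peak engine clock, and one fused multiply-add (two floating-point operations) per lane per clock. The stream-processor count, the clock, and the per-format rate multipliers are not stated in this article; the multipliers are simply inferred from the ratios between the quoted numbers.

```python
# Back-of-the-envelope reproduction of the quoted peak throughput figures.
# ASSUMPTIONS (not from the article): 64 stream processors per CU,
# 1502 MHz peak engine clock, 2 FLOPs per lane per clock (one FMA).
COMPUTE_UNITS = 120
LANES_PER_CU = 64          # assumed
PEAK_CLOCK_HZ = 1.502e9    # assumed
FLOPS_PER_FMA = 2

# Throughput of each format relative to plain FP32 vector math.
# These multipliers are inferred from the quoted figures, not from AMD documentation.
RATE_VS_FP32 = {
    "FP64 vector": 0.5,
    "FP32 vector": 1.0,
    "FP32 matrix": 2.0,
    "bfloat16 matrix": 4.0,
    "FP16 matrix": 8.0,
}


def peak_tflops(rate_vs_fp32: float) -> float:
    """Return peak teraflops for a format with the given rate relative to FP32."""
    flops = COMPUTE_UNITS * LANES_PER_CU * FLOPS_PER_FMA * PEAK_CLOCK_HZ
    return flops * rate_vs_fp32 / 1e12


if __name__ == "__main__":
    for name, rate in RATE_VS_FP32.items():
        print(f"{name}: {peak_tflops(rate):.1f} TFLOPS")
```

Under these assumptions the results land on the quoted 11.5, 23.1, 46.1, 92.3, and 184.6 teraflops after rounding to one decimal place. These are theoretical peaks, not measured performance.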

The MI100 also gets AMD’s Infinity Fabric technology, which is claimed to offer about 2x the peer-to-peer peak I/O bandwidth of PCIe 4.0, with up to 340 GB/s of aggregate bandwidth per card. In real-life deployments, MI100 GPUs can be configured with up to two integrated quad-GPU hives, each providing up to 552 GB/s of peer-to-peer I/O bandwidth.
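One way to see where the 340 and 552 figures could come from is to add up link bandwidths. The sketch below is a hypothetical breakdown, not a confirmed one. It assumes 92 GB/s per Infinity Fabric link, three links per card, a 64 GB/s PCIe 4.0 host connection, and a fully connected four-GPU hive. None of these per-link details appear in this article.

```python
from itertools import combinations

# ASSUMPTIONS (not from the article): per-link bandwidth, link count per card,
# and a fully connected topology inside each quad-GPU hive.
LINK_GB_PER_S = 92         # assumed peak bandwidth of one Infinity Fabric link
LINKS_PER_CARD = 3         # assumed number of Infinity Fabric links per card
PCIE4_X16_GB_PER_S = 64    # peak PCIe 4.0 figure quoted for the CPU-GPU link
GPUS_PER_HIVE = 4


def aggregate_per_card() -> int:
    """Infinity Fabric links plus the PCIe host link, in GB/s."""
    return LINKS_PER_CARD * LINK_GB_PER_S + PCIE4_X16_GB_PER_S


def hive_peer_to_peer() -> int:
    """Total bandwidth over every GPU-to-GPU link in a fully connected hive, in GB/s."""
    gpu_pairs = list(combinations(range(GPUS_PER_HIVE), 2))  # 6 pairs for 4 GPUs
    return len(gpu_pairs) * LINK_GB_PER_S


if __name__ == "__main__":
    print("aggregate per card:", aggregate_per_card(), "GB/s")
    print("quad hive peer-to-peer:", hive_peer_to_peer(), "GB/s")
```

With these assumed inputs the totals come to 340 GB/s per card and 552 GB/s per hive, which matches the quoted numbers and suggests both are measured in gigabytes per second.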

Similarly, four stacks of 8GB HBM2 memory provide a total of 32GB of HBM2 memory on each MI100 GPU. With a 1.2 GHz clock speed, the memory offers 1.23 TB/s of memory bandwidth. The MI100’s support for PCIe Gen 4.0 enables 64 GB/s of peak theoretical data bandwidth between CPU and GPU.
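The memory and PCIe figures follow from standard interface arithmetic. The sketch below is a minimal illustration that assumes a 1024-bit interface per HBM2 stack with double-data-rate signaling, and a 16-lane PCIe 4.0 link at 16 GT/s per lane counted in both directions. Those interface details are assumptions added here, not statements from the article.

```python
# ASSUMPTIONS (not from the article): 1024-bit bus per HBM2 stack, double data rate,
# and a x16 PCIe 4.0 link at 16 GT/s per lane, summed over both directions.
HBM2_STACKS = 4
BITS_PER_STACK = 1024          # assumed interface width of one HBM2 stack
MEMORY_CLOCK_HZ = 1.2e9        # 1.2 GHz, as quoted in the article
TRANSFERS_PER_CLOCK = 2        # double data rate

PCIE_LANES = 16
PCIE4_TRANSFERS_PER_S = 16e9   # 16 GT/s per lane
PCIE_ENCODING = 128 / 130      # 128b/130b line-encoding efficiency


def hbm2_bandwidth_gb_s() -> float:
    """Peak memory bandwidth in GB/s."""
    bits_per_s = HBM2_STACKS * BITS_PER_STACK * MEMORY_CLOCK_HZ * TRANSFERS_PER_CLOCK
    return bits_per_s / 8 / 1e9


def pcie_bandwidth_gb_s(include_encoding: bool = False) -> float:
    """Peak bidirectional PCIe 4.0 x16 bandwidth in GB/s."""
    per_direction = PCIE_LANES * PCIE4_TRANSFERS_PER_S / 8 / 1e9
    if include_encoding:
        per_direction *= PCIE_ENCODING
    return 2 * per_direction


if __name__ == "__main__":
    print(f"HBM2: {hbm2_bandwidth_gb_s():.1f} GB/s")
    print(f"PCIe 4.0 x16, raw: {pcie_bandwidth_gb_s():.1f} GB/s")
    print(f"PCIe 4.0 x16, after encoding: {pcie_bandwidth_gb_s(True):.1f} GB/s")
```

Under these assumptions the memory result is roughly 1.23 terabytes per second, and the raw PCIe figure is 64 gigabytes per second across both directions. After line-encoding overhead the usable PCIe rate is slightly lower.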

Is AMD MI100 Accelerator GPU Better Than NVIDIA A100 GPU?

Strictly on paper, AMD’s MI100 GPU appears better than the NVIDIA A100 GPU, which is rated at 9.7 teraflops of peak theoretical double-precision (FP64) performance. In practice, however, the NVIDIA A100 delivers higher performance in FP64 Linpack runs.

AMD’s CDNA and RDNA architectures are essentially the same, with the major difference being the end-user scenarios. There are a few fundamental differences, though, that prevent the CDNA architecture from being used for gaming or visual content rendering.

Incidentally, AMD is preparing ROCm, the company’s open-source toolset consisting of compilers, programming APIs, and libraries. This toolset will serve as the basis for exascale computing workloads. In the latest ROCm 4.0, the compiler is now open source and unified to support both OpenMP 5.0 and HIP. Simply put, the competition between AMD and NVIDIA in the HPC segment extends beyond simple capabilities and raw processing power.

Alap Naik Desai


A B.Tech Plastics (UDCT) and a Windows enthusiast. Optimizing the OS, exploring software, searching and deploying solutions to strange and weird issues is Alap's main interest.