AMD’s first GPU based on the CDNA architecture, the MI100, is official. AMD is billing the MI100 Instinct accelerator as “the world’s fastest HPC GPU,” with 11.5 teraflops of peak double-precision (FP64) floating-point performance. The card packs twice as many compute units as the previous generation while staying within the same 300-watt power envelope.
AMD today announced the new MI100 Instinct accelerator. The GPU is based on the CDNA engine, which differs from the RDNA architecture powering the latest AMD Radeon RX 6000 series of graphics cards. The MI100 succeeds the MI50 and MI60 Instinct accelerators launched two years ago. Despite the relatively short gap between generations, the new architecture and compute engine deliver a substantial generational leap in compute performance.
AMD MI100 Instinct Accelerator 7nm GPU For HPC Industry: Specifications and Features
The MI100 is the first GPU to incorporate AMD’s Compute DNA (CDNA) architecture. It has 120 compute units arranged in four arrays. The CDNA architecture is a significant evolutionary leap over the older GCN architecture, and it adds new Matrix Core engines that boost computational throughput for a range of numerical formats.
AMD Mi100 is here. And most notably the performance is not the key factor, the software ecosystem growth is whats important to look at. As AMD & Cray begin deploy Supercomputing projects throughout this year, you will see many of AMD's ROCm tools get adoption elsewhere too. pic.twitter.com/w5eeYqG680
— Cyber Cat (@0xCats) November 16, 2020
AMD claims the new Matrix Core technology gives the MI100 nearly 7x the peak half-precision floating-point performance of the MI50. The company rates the MI100 Instinct accelerator at 46.1 teraflops peak single-precision matrix (FP32), 23.1 teraflops peak single-precision (FP32), 184.6 teraflops peak half-precision (FP16), and 92.3 teraflops peak bfloat16 floating-point performance.
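The vector-throughput figures can be sanity-checked with simple arithmetic. The sketch below assumes 64 shader lanes per compute unit, one fused multiply-add per lane per cycle, and a boost clock of roughly 1.5 GHz; the lane count and clock are assumptions on my part, not figures from the announcement.

```python
# Back-of-the-envelope check of the peak vector FLOPS figures.
# Assumptions (not from the article): 64 lanes per CU, one fused
# multiply-add (2 FLOPs) per lane per cycle, ~1.502 GHz boost clock.
CUS = 120
LANES_PER_CU = 64
FLOPS_PER_LANE = 2          # a fused multiply-add counts as two operations
BOOST_CLOCK_HZ = 1.502e9

fp32_tflops = CUS * LANES_PER_CU * FLOPS_PER_LANE * BOOST_CLOCK_HZ / 1e12
fp64_tflops = fp32_tflops / 2   # FP64 vector rate is half the FP32 rate

print(f"FP32 peak: {fp32_tflops:.1f} TFLOPS")   # ~23.1
print(f"FP64 peak: {fp64_tflops:.1f} TFLOPS")   # ~11.5
```

Under these assumptions the numbers land on the quoted 23.1 and 11.5 teraflops almost exactly, which suggests the rated figures are straightforward peak-throughput products rather than measured results.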
For the peeps following me for AMD Mi100/CDNA news, here is the architecture whitepaper link. The Matrix unit some are asking about is just a new way to handle matrix.matrix multiply and fused multiply add (MFMA) efficiently in less CU cycles.https://t.co/5vZ5gREec5 pic.twitter.com/AUsHnyF1Uw
— Cyber Cat (@0xCats) November 16, 2020
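The MFMA operation described in the tweet is a tiled matrix multiply-accumulate, D = A·B + C, executed in hardware in fewer CU cycles than the equivalent scalar loop. A minimal NumPy sketch of the math it performs (the 4x4 tile size here is illustrative, not the hardware’s actual fragment shape):

```python
import numpy as np

# MFMA computes D = A @ B + C on small matrix tiles in hardware.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4), dtype=np.float32)
B = rng.standard_normal((4, 4), dtype=np.float32)
C = rng.standard_normal((4, 4), dtype=np.float32)

D = A @ B + C  # one "matrix fused multiply-add" step

# Equivalent scalar loop that the single instruction replaces:
D_ref = C.copy()
for i in range(4):
    for j in range(4):
        for k in range(4):
            D_ref[i, j] += A[i, k] * B[k, j]

assert np.allclose(D, D_ref, atol=1e-4)
```

Collapsing that triple loop into one instruction is what drives the large gap between the matrix and vector FP32 ratings.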
The MI100 also gets AMD’s Infinity Fabric technology, which is claimed to offer about 2x the peer-to-peer peak I/O bandwidth of PCIe 4.0, with up to 340 GB/s of aggregate bandwidth per card. In real-world deployments, MI100 GPUs can be configured in up to two integrated quad-GPU hives, each providing up to 552 GB/s of peer-to-peer I/O bandwidth.
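The quoted totals are consistent with three Infinity Fabric links at roughly 92 GB/s each plus one PCIe 4.0 x16 connection; the per-link figure below is inferred from the totals rather than stated outright in the announcement.

```python
# Aggregate I/O bandwidth per card: three Infinity Fabric links plus
# one PCIe 4.0 x16 connection. The per-link figure is inferred from
# the quoted totals, not stated directly.
IF_LINKS = 3
IF_LINK_GBS = 92          # GB/s per Infinity Fabric link (inferred)
PCIE4_X16_GBS = 64        # GB/s peak theoretical for PCIe 4.0 x16

infinity_fabric_gbs = IF_LINKS * IF_LINK_GBS         # 276 GB/s
aggregate_gbs = infinity_fabric_gbs + PCIE4_X16_GBS  # 340 GB/s

# A fully connected quad-GPU hive has 6 GPU-to-GPU edges, one link each:
hive_p2p_gbs = 6 * IF_LINK_GBS                       # 552 GB/s

print(infinity_fabric_gbs, aggregate_gbs, hive_p2p_gbs)
```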
Similarly, four stacks of 8GB HBM2 memory give each MI100 a total of 32GB. At a 1.2 GHz clock speed, the memory delivers 1.23 TB/s of memory bandwidth. The MI100’s support for PCIe 4.0 enables 64 GB/s of peak theoretical data bandwidth between CPU and GPU.
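The memory-bandwidth figure follows from standard HBM2 parameters; the 1024-bit bus width per stack and double-data-rate signaling below are assumptions based on the HBM2 standard, not details from the announcement.

```python
# Memory bandwidth check: four HBM2 stacks, each assumed to have a
# 1024-bit bus, clocked at 1.2 GHz with double data rate signaling
# (2 transfers per clock).
STACKS = 4
BUS_BITS_PER_STACK = 1024   # standard HBM2 stack width (assumption)
CLOCK_HZ = 1.2e9
TRANSFERS_PER_CLOCK = 2     # DDR signaling

bytes_per_sec = (STACKS * BUS_BITS_PER_STACK / 8
                 * CLOCK_HZ * TRANSFERS_PER_CLOCK)
print(f"{bytes_per_sec / 1e12:.2f} TB/s")  # ~1.23 TB/s
```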
Is AMD MI100 Accelerator GPU Better Than NVIDIA A100 GPU?
Strictly on paper, AMD’s MI100 appears to beat the NVIDIA A100, which is rated at 9.7 teraflops of peak theoretical FP64 performance. In practice, however, the NVIDIA A100 delivers higher sustained performance in FP64 Linpack runs.
"Each Instinct MI100 card has 64 GB/sec of PCI-Express 4.0 bandwidth and 276 GB/sec of Infinity Fabric bandwidth across its three pipes, for a total of 340 GB/sec of I/O bandwidth" https://t.co/6Kq5gdYaYB
— Janet Morss (@jamonascone) November 16, 2020
AMD’s CDNA and RDNA architectures share a common lineage, with the major difference being the end-user scenarios they target. CDNA strips out much of the fixed-function graphics hardware, however, which makes it unsuitable for gaming or visual content rendering.
Alongside the hardware, AMD is advancing ROCm, the company’s open-source toolset of compilers, programming APIs, and libraries, which will serve as the foundation for exascale computing workloads. The latest ROCm 4.0 release upgrades the compiler to be open source and unified, supporting both OpenMP 5.0 and HIP. Simply put, the competition between AMD and NVIDIA in the HPC segment extends beyond raw processing power to the software ecosystem.