NVIDIA’s Official Grace Superchip Benchmarks Show Superior Performance over AMD’s EPYC Lineup

NVIDIA has published a detailed analysis of its next-generation Grace CPU Superchip, which it says outperforms AMD EPYC CPUs by up to 2.5 times.

NVIDIA first introduced the Grace CPU and the related Superchip designs at GTC 2022. The Grace CPU is NVIDIA’s first processor built on a custom Arm architecture and is targeted at the server and high-performance computing markets. The CPU is available in two Superchip configurations: a Grace Superchip module with two Grace CPUs, and a Grace+Hopper Superchip with one Grace CPU coupled to a Hopper H100 GPU.

Grace, NVIDIA’s first server CPU, has 72 Arm v9.0 cores and supports SVE2, as well as several virtualization extensions including Nested Virtualization and S-EL2. The CPU is manufactured on TSMC’s 4N process, an improved 5nm node designed specifically for NVIDIA. Peak FP64 performance for the Superchip reaches up to 7.1 TFLOPS.

Grace Chip’s Internal Composition | Image: NVIDIA

Because Grace is intended to be used in pairs, its C2C (Chip-to-Chip) interface is one of the most important features of the design. The Superchips are built with the NVLink C2C connection, which eliminates the bottlenecks of a traditional cross-socket arrangement. The link consumes only 1.3 pJ/bit, making it 5 times more energy-efficient than PCIe, and it offers 900 GB/s of raw bidirectional bandwidth. NVIDIA’s Grace Superchip specifications are summarized below:

NVIDIA Grace CPU Superchip architecture features
Core architecture: Neoverse V2 cores, Armv9 with 4x128b SVE2
Core count: 144
Cache: L1 64 KB I-cache + 64 KB D-cache per core; L2 1 MB per core; L3 234 MB per superchip
Memory technology: LPDDR5X with ECC, co-packaged
Raw memory bandwidth: Up to 1 TB/s
Memory size: Up to 960 GB
FP64 peak: 7.1 TFLOPS
PCI Express: 8x PCIe Gen 5 x16 interfaces with option to bifurcate; 1 TB/s total PCIe bandwidth; additional low-speed PCIe connectivity for management
Power: 500 W TDP with memory, 12 V supply
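
The interconnect figures above can be sanity-checked with some quick arithmetic. The sketch below is a rough illustration rather than anything NVIDIA published: it assumes the 1.3 pJ/bit figure holds at the full 900 GB/s, treats 1 GB as 10^9 bytes, and takes the PCIe Gen 5 rate (32 GT/s per lane, 128b/130b encoding) from the PCIe 5.0 specification rather than from the article. The function and constant names are made up for this example.

```python
# Back-of-the-envelope checks on the interconnect figures quoted in the article.
# Assumption: the quoted energy per bit applies at the full link rate.

C2C_BANDWIDTH_GB_S = 900        # raw bidirectional bandwidth (article)
C2C_ENERGY_PJ_PER_BIT = 1.3     # energy per transferred bit (article)
PCIE_EFFICIENCY_FACTOR = 5      # C2C is "5 times" more efficient (article)

PCIE_GEN5_GT_PER_LANE = 32      # GT/s per lane (PCIe 5.0 spec, not the article)
PCIE_ENCODING = 128 / 130       # 128b/130b line encoding (PCIe 5.0 spec)
PCIE_LINKS = 8                  # number of x16 interfaces (article)
LANES_PER_LINK = 16


def link_power_watts(bandwidth_gb_per_s: float, pj_per_bit: float) -> float:
    """Power needed to move data at the given rate and energy per bit."""
    bits_per_second = bandwidth_gb_per_s * 1e9 * 8
    return bits_per_second * pj_per_bit * 1e-12


def pcie_total_bandwidth_gb_per_s() -> float:
    """Aggregate bidirectional bandwidth across all PCIe Gen 5 x16 links."""
    per_lane = PCIE_GEN5_GT_PER_LANE * PCIE_ENCODING / 8  # GB/s, one direction
    one_direction = per_lane * LANES_PER_LINK * PCIE_LINKS
    return 2 * one_direction


c2c_watts = link_power_watts(C2C_BANDWIDTH_GB_S, C2C_ENERGY_PJ_PER_BIT)
pcie_equiv_watts = c2c_watts * PCIE_EFFICIENCY_FACTOR
pcie_total = pcie_total_bandwidth_gb_per_s()

print(f"C2C link power at full rate:       {c2c_watts:.2f} W")
print(f"Same traffic at 5x the energy/bit: {pcie_equiv_watts:.2f} W")
print(f"8x PCIe Gen 5 x16, bidirectional:  {pcie_total:.0f} GB/s")
```

Under these assumptions the C2C link draws under 10 W at full rate, and the eight x16 interfaces add up to roughly the 1 TB/s of PCIe bandwidth listed in the specifications.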

NVIDIA compared the Grace CPU Superchip against dual-socket (2P) AMD EPYC 7763 “Milan” CPUs on a variety of HPC applications, including OpenFOAM, WRF, NEMO, and BWA. In OpenFOAM, the Grace CPU Superchip provides a 2.5x speed gain with up to 3.5x the efficiency. Compared with AMD’s EPYC Milan CPUs, NVIDIA’s new Grace CPU Superchip should deliver gains of 1.9x in performance and 2.57x in performance per watt. That should also put it roughly on par with the most recent server CPUs from AMD and Intel.
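
A speedup and an efficiency gain quoted together imply a relative power draw. The sketch below works that out for the two pairs of ratios above. It assumes that "efficiency" means performance per watt measured on the same workload, which the article does not spell out, and the function name is hypothetical. No measured wattage is given in the article, so only ratios are computed.

```python
# If perf_per_watt = performance / power, then
#   power_ratio = speedup / perf_per_watt_gain
# Assumption: both ratios were measured on the same workload and baseline.

def implied_power_ratio(speedup: float, perf_per_watt_gain: float) -> float:
    """Grace power draw relative to the baseline implied by the two ratios."""
    return speedup / perf_per_watt_gain


# (speedup, performance-per-watt gain) pairs as quoted in the article
claims = {
    "OpenFOAM": (2.5, 3.5),
    "EPYC Milan comparison": (1.9, 2.57),
}

ratios = {name: implied_power_ratio(*pair) for name, pair in claims.items()}

for name, ratio in ratios.items():
    print(f"{name}: Grace would draw about {ratio:.0%} of the baseline power")
```

Both pairs point to the Grace CPU Superchip drawing a little under three quarters of the power of the dual-socket EPYC system, if the assumption about how efficiency was measured holds.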

NVIDIA’s Grace Chip Comparison | Image: NVIDIA

According to NVIDIA, the Grace processor is highly specialized and is intended for tasks like training next-generation NLP models with more than a trillion parameters. NVIDIA claims that, when tightly coupled with NVIDIA GPUs, a Grace CPU-based system will run 10 times faster than today’s most advanced NVIDIA DGX-based systems. It will be fascinating to see how the Grace CPUs perform against x86 processors, although by the time they are released they will be competing with AMD’s Genoa and Intel’s Sapphire Rapids CPUs.


Muhammad Zuhair

Passionate about technology and gaming content, Zuhair focuses on analysing information and then presenting it to the audience.