NVIDIA detailed the Grace CPU and its related Superchip designs at GTC 2022. Grace is NVIDIA’s first Arm-based processor and is aimed at the server and high-performance computing markets. The CPU is available in two Superchip configurations: a Grace CPU Superchip module with two Grace CPUs, and a Grace+Hopper Superchip that couples one Grace CPU with a Hopper H100 GPU.
Each Grace CPU has 72 Armv9.0 cores and supports SVE2, along with several virtualization extensions including Nested Virtualization and S-EL2. The CPU is manufactured on TSMC’s 4N process, an enhanced 5nm node customized for NVIDIA. Peak FP64 performance reaches up to 7.1 TFLOPS for the two-CPU Grace Superchip.
Because Grace is intended to be used in pairs, its C2C (chip-to-chip) interface is one of the most important features of the design. The Superchips are built by linking chips over NVLink-C2C, which removes the bottlenecks associated with a traditional cross-socket arrangement. The C2C link consumes only 1.3 pJ/bit, making it 5 times more efficient than PCIe, and it offers 900 GB/s of raw bidirectional bandwidth. The table below summarizes NVIDIA’s Grace CPU Superchip specifications:
| NVIDIA Grace CPU Superchip architecture features | |
| --- | --- |
| Core architecture | Neoverse V2 cores: Armv9 with 4x128b SVE2 |
| Cache | L1: 64 KB I-cache + 64 KB D-cache per core; L2: 1 MB per core; L3: 234 MB per superchip |
| Memory technology | LPDDR5X with ECC, co-packaged |
| Raw memory BW | Up to 1 TB/s |
| Memory size | Up to 960 GB |
| FP64 peak | 7.1 TFLOPS |
| PCI Express | 8x PCIe Gen 5 x16 interfaces with option to bifurcate; total 1 TB/s PCIe bandwidth; additional low-speed PCIe connectivity for management |
| Power | 500 W TDP with memory, 12 V supply |
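Several of the headline figures above can be cross-checked with simple arithmetic. The following is a rough sketch, not an official NVIDIA calculation. It assumes 144 cores per Superchip (two 72-core CPUs), that each 128-bit SVE2 pipe holds two FP64 values, that a fused multiply-add counts as two FLOPs, and that PCIe Gen 5 runs at 32 GT/s per lane with 128b/130b encoding. The function names are illustrative only.

```python
# Back-of-the-envelope checks on the Grace CPU Superchip figures quoted above.
# Assumptions (not stated in the article): an FMA counts as 2 FLOPs, and
# PCIe Gen 5 runs at 32 GT/s per lane with 128b/130b encoding.

CORES = 2 * 72                    # two 72-core Grace CPUs per Superchip
SVE2_PIPES = 4                    # "4x128b SVE2" from the table
FP64_LANES_PER_PIPE = 128 // 64   # two FP64 values fit in a 128-bit vector
FLOPS_PER_FMA = 2                 # assumption: multiply + add
PEAK_FP64_FLOPS = 7.1e12          # 7.1 TFLOPS from the table


def flops_per_cycle_per_core() -> int:
    """FP64 operations one core can retire per clock under the assumptions."""
    return SVE2_PIPES * FP64_LANES_PER_PIPE * FLOPS_PER_FMA


def implied_clock_ghz() -> float:
    """Clock frequency that would be needed to reach the quoted FP64 peak."""
    return PEAK_FP64_FLOPS / (CORES * flops_per_cycle_per_core()) / 1e9


def pcie_total_gb_per_s(links: int = 8, lanes: int = 16,
                        gt_per_s: float = 32.0) -> float:
    """Aggregate bidirectional PCIe bandwidth in GB/s."""
    per_lane_one_way = gt_per_s * (128 / 130) / 8   # GB/s after encoding
    return links * lanes * per_lane_one_way * 2     # both directions


def c2c_power_watts(bandwidth_gb_per_s: float = 900.0,
                    pj_per_bit: float = 1.3) -> float:
    """Link power if the full quoted bandwidth moved at the quoted energy per bit."""
    bits_per_s = bandwidth_gb_per_s * 1e9 * 8
    return bits_per_s * pj_per_bit * 1e-12


if __name__ == "__main__":
    print(f"FP64 FLOPs/cycle/core: {flops_per_cycle_per_core()}")
    print(f"Implied clock: {implied_clock_ghz():.2f} GHz")
    print(f"PCIe aggregate: {pcie_total_gb_per_s():.0f} GB/s")
    print(f"C2C link power at full rate: {c2c_power_watts():.2f} W")
```

Under these assumptions, the 7.1 TFLOPS peak implies a clock of roughly 3.1 GHz, the eight x16 links add up to about 1 TB/s in both directions combined (consistent with the table), and the C2C link would draw under 10 W at full bandwidth.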
NVIDIA compares the Grace CPU Superchip against dual-socket (2P) AMD EPYC 7763 “Milan” CPUs on a variety of HPC applications, including OpenFOAM, WRF, NEMO, and BWA. In OpenFOAM, the Grace CPU Superchip delivers a 2.5x speedup with up to 3.5x the efficiency. Overall, NVIDIA’s new Grace CPU Superchip should be able to achieve a 1.9x performance gain and a 2.57x performance-per-watt gain over AMD’s EPYC Milan CPUs. This should also make it competitive with the most recent server CPUs from AMD and Intel.
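The speedup and efficiency figures together imply how much power Grace draws relative to the Milan system. The sketch below assumes that "efficiency" means performance per watt and that both ratios were measured on the same workload. The helper name `relative_power` is illustrative.

```python
def relative_power(speedup: float, perf_per_watt_gain: float) -> float:
    """Power draw relative to the baseline.

    Since perf_per_watt = perf / power, the power ratio is
    speedup / perf_per_watt_gain.
    """
    return speedup / perf_per_watt_gain


if __name__ == "__main__":
    # Ratios quoted in the article versus 2P EPYC 7763.
    print(f"Overall:  {relative_power(1.9, 2.57):.2f}x baseline power")
    print(f"OpenFOAM: {relative_power(2.5, 3.5):.2f}x baseline power")
```

Both cases work out to roughly 0.7x, which suggests the Grace Superchip draws about 70 to 75 percent of the power of the dual-socket Milan system in these comparisons, if the assumptions hold.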
According to NVIDIA, the Grace processor is highly specialized and is intended for tasks like training next-generation NLP models with more than a trillion parameters. When tightly coupled with NVIDIA GPUs, a Grace CPU-based system is expected to run 10 times faster than the most advanced NVIDIA DGX-based systems available today. It will be interesting to see how the Grace CPUs perform against x86 processors, since by the time they are released they will be competing with AMD’s Genoa and Intel’s Sapphire Rapids CPUs.