Meet the Biren BR100, China’s Fastest GPU That is Nearly 3x Faster Than NVIDIA’s A100

China is keen on entering the semiconductor market, and this year it has shown remarkable results. Hot Chips 34 is the talk of the day, with NVIDIA unveiling its upcoming Hopper GPUs there. Birentech from China took the opportunity to showcase its upcoming BR100 GPU, which, as per reports, is faster than NVIDIA’s Ampere-based A100.

Specifications of the BR100

This GPU is based on a 7nm process node and features 77 billion transistors (just 3 billion shy of NVIDIA’s H100). It uses TSMC’s 2.5D CoWoS packaging. As for the memory, this monster carries 64GB of HBM2e with a bandwidth of around 2.3TB/s. The chip size comes out to around 1074mm².

Specification | Hopper H100 | Biren BR100
Interface | PCIe Gen 5.0 | PCIe Gen 5.0
Memory type | HBM3 | HBM2e
Memory bandwidth | 3TB/s | 2.3TB/s
Process | TSMC 4N | 7nm with TSMC’s 2.5D CoWoS
Memory capacity | 80GB | 64GB
Interconnect | NVLink 900GB/s | Die-to-Die 896GB/s
Design | Monolithic | MCM (Multi-Chip-Module)
Power | 700W | 550W

Specifications for the Biren BR100 | Birentech by Wccftech

An Architectural Overview

As noted above, the GPU uses an MCM design consisting of 2 chiplets, each with 16 SPCs (Streaming Processing Clusters). Every SPC contains 16 EUs (Execution Units), and every 4 EUs form a Compute Unit (CU).

  • Chiplets: 2
  • SPCs: 2×16 = 32
  • EUs: 32×16 = 512
  • CUs: 512/4 = 128

Each SPC holds 16 EUs. Looking closer, each EU consists of 16 streaming processing cores, which together form one V-core, plus one T-core (Tensor core). The V-core handles FP32, FP16, INT32, and INT16 computations.
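The unit counts in this section come down to simple multiplication. Here is a minimal Python sketch, assuming the per-unit figures quoted in this article are accurate. The constant and function names are illustrative only and do not come from Birentech.

```python
# Per-unit counts as described in this article (assumed accurate).
CHIPLETS = 2
SPCS_PER_CHIPLET = 16
EUS_PER_SPC = 16
EUS_PER_CU = 4
STREAMING_CORES_PER_EU = 16  # these 16 cores together form one V-core
T_CORES_PER_EU = 1


def br100_unit_counts():
    """Return the total number of each unit type across the whole GPU."""
    spcs = CHIPLETS * SPCS_PER_CHIPLET
    eus = spcs * EUS_PER_SPC
    return {
        "spcs": spcs,
        "eus": eus,
        "cus": eus // EUS_PER_CU,
        "streaming_cores": eus * STREAMING_CORES_PER_EU,
        "t_cores": eus * T_CORES_PER_EU,
    }


if __name__ == "__main__":
    for name, count in br100_unit_counts().items():
        print(f"{name}: {count}")
```

Run as a script, this prints 32 SPCs, 512 EUs, 128 CUs, 8192 streaming processing cores and 512 T-cores. The last two totals are derived here from the per-EU figures rather than quoted directly by Birentech.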

A look inside BR100’s EUs | Birentech by Wccftech

BR100 vs A100

Compared to the last-gen Ampere-based A100, the BR100 is around 2.6x faster in select benchmarks. This goes to show how quickly China is advancing in the GPU department. However, sorry for being a killjoy, but the Hopper-based H100 is reportedly around 2-3x faster than the A100 in the same benchmarks, and its Tensor cores can stretch that lead to around 30x in some tests.

NVIDIA’s A100 vs Birentech’s BR100 | Birentech by Wccftech

General Use

The GPU is aimed at China’s AI sector and, with its enhanced AI performance, is said to be capable of mimicking human behavior. The goal is for China to be able to rely on its own technology.

Featured Image Credit : ferdibtk at Freepik

Abdullah Faisal
With a love for computers since the age of five, Abdullah has always sought to delve into the depths of information, and uses it as his guiding light. He believes success is of utmost importance as history is written by the victor.