Meet the Biren BR100, China’s Fastest GPU That is Nearly 3x Faster Than NVIDIA’s A100

China is keen on entering the semiconductor market, and this year it has shown remarkable results. Hot Chips 34 is the talk of the day, with NVIDIA detailing its upcoming Hopper GPUs there. Birentech from China used the same event to showcase its upcoming BR100 GPU, which, as per reports, is faster than NVIDIA’s Ampere-based A100.
Specifications of the BR100
This GPU is based on a 7nm process node and features 77 billion transistors (just 3 billion shy of NVIDIA’s H100). It is packaged using TSMC’s 2.5D CoWoS technology. As for the memory, this monstrosity is equipped with 64GB of HBM2e with a bandwidth of around 2.3TB/s. The chip size comes out to around 1074mm².
Feature | Hopper H100 | Biren BR100 |
Interface | PCIe Gen 5.0 | PCIe Gen 5.0 |
Memory Type | HBM3 | HBM2e |
Memory Bandwidth | 3TB/s | 2.3TB/s |
Process and Packaging | TSMC 4N | 7nm with TSMC’s 2.5D CoWoS |
Memory Capacity | 80GB | 64GB |
Interconnect | NVLink 900GB/s | Die-to-Die 896GB/s |
Design | Monolithic | MCM (Multi-Chip-Module) |
Power | 700W | 550W |

An Architectural Overview
As stated above, the GPU features an MCM design consisting of 2 chiplets, with each chiplet containing 16 SPCs (Streaming Processing Clusters). Every SPC consists of 16 EUs (Execution Units), and every 4 EUs form a Compute Unit (CU).
- Chiplets: 2
- SPCs: 2×16 = 32
- EUs: 32×16 = 512
- CUs: 512/4 = 128
Inside each SPC, we find 16 EUs. Looking closer, each EU consists of a V-core, made up of 16 streaming processing cores, and a T-core (Tensor core). The V-core handles FP32, FP16, INT32, and INT16 computations.
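The hierarchy described in this section can be tallied with a short Python sketch. The constant and function names are made up for illustration, and the totals for streaming processing cores and T-cores assume that every EU is identical, which the presentation does not state explicitly.

```python
# Counts taken from the architectural overview; names are illustrative only.
CHIPLETS = 2
SPCS_PER_CHIPLET = 16
EUS_PER_SPC = 16
EUS_PER_CU = 4
STREAM_CORES_PER_EU = 16  # the 16 streaming processing cores forming one V-core
T_CORES_PER_EU = 1        # assumption: exactly one T-core per EU


def br100_totals():
    """Multiply the per-level counts out to chip-wide totals."""
    spcs = CHIPLETS * SPCS_PER_CHIPLET
    eus = spcs * EUS_PER_SPC
    cus = eus // EUS_PER_CU
    return {
        "SPCs": spcs,
        "EUs": eus,
        "CUs": cus,
        "streaming processing cores": eus * STREAM_CORES_PER_EU,
        "T-cores": eus * T_CORES_PER_EU,
    }


for name, count in br100_totals().items():
    print(f"{name}: {count}")
```

Under these assumptions the chip works out to 8192 streaming processing cores and 512 T-cores, on top of the 32 SPCs, 512 EUs, and 128 CUs listed above.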

BR100 vs A100
Compared to the last-gen Ampere-based A100, the BR100 is around 2.6x faster in select benchmarks. This goes to show how quickly China is accelerating in the GPU department. However, sorry for being a killjoy, but the Hopper-based H100 is itself around 2-3x faster than the A100 in the same benchmarks. Its Tensor cores can boost this lead to around 30x in various tests.
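To put the two figures side by side, here is a rough Python sketch that converts both speedups into a BR100-versus-H100 ratio. It assumes that both numbers use the A100 as the baseline and come from comparable benchmarks, which is a simplification. The names are hypothetical, and the result is back-of-the-envelope arithmetic, not a measured benchmark.

```python
# Speedups over the A100 as quoted in the article (assumed to share a baseline).
BR100_VS_A100 = 2.6
H100_VS_A100_RANGE = (2.0, 3.0)


def relative_to_h100(br100_speedup, h100_range):
    """Return (worst, best) BR100/H100 ratios given both speedups over the A100."""
    h100_low, h100_high = h100_range
    worst = br100_speedup / h100_high  # H100 at the top of its range
    best = br100_speedup / h100_low    # H100 at the bottom of its range
    return worst, best


worst, best = relative_to_h100(BR100_VS_A100, H100_VS_A100_RANGE)
print(f"BR100 relative to H100: {worst:.2f}x to {best:.2f}x")
```

Under that assumption, the BR100 lands somewhere between roughly 0.87x and 1.3x of the H100 in those benchmarks, before the much larger Tensor core gains are taken into account.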

General Use
The GPU is meant for China’s AI sector and is said to mimic human behavior with its enhanced AI performance. The goal is for China to be able to rely on its own technology.