RTX 4090M Surpasses the RTX 3090 Ti (Desktop) After Tuning
NVIDIA launched its RTX 40 Ada Lovelace mobile GPUs last month alongside the RTX 4070 Ti. Today, a user on Reddit tuned the RTX 4090M far enough to surpass the desktop RTX 3090 Ti in 3DMark Time Spy. That is not an easy feat to achieve. Someone must’ve won the silicon lottery, it seems.
RTX 4090M Faster Than the RTX 3090 Ti
User u/Kelzs on Reddit (r/nvidia) posted a screenshot of the RTX 4090M in 3DMark Time Spy. The laptop in question is the Razer Blade 18 (RTX 4090), which costs $4,499.99. Yeah, we double-checked that price. This beast pairs the RTX 4090M with the i9-13950HX, one of Intel’s fastest mobile CPUs to date.
The spec sheet on Razer’s website mentions 2TB (1+1) of PCIe Gen 4.0 SSD storage. In addition, we see 32GB of fast DDR5-5600 memory and a relatively large 18″ QHD+ 240Hz, 16:10 (2560 x 1600) display. And, there you have it. The RTX 4090M scores 22,339 points in 3DMark Time Spy (Non-Extreme). This puts it just above the range the Ampere-based RTX 3090 Ti typically lands in (21-22k).
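To put the margin in numbers, here is a minimal sketch that compares the tuned score against both ends of the 21-22k range quoted above. It assumes "21-22k" means 21,000 to 22,000 points; the function and constant names are illustrative only.

```python
def percent_above(score: float, reference: float) -> float:
    """Return how far `score` sits above `reference`, in percent."""
    return (score - reference) / reference * 100.0


# Tuned RTX 4090M result from the Reddit screenshot.
TUNED_4090M_SCORE = 22339

# Assumption: "21-22k" is read as 21,000 to 22,000 points.
RTX_3090_TI_LOW = 21000
RTX_3090_TI_HIGH = 22000

if __name__ == "__main__":
    low = percent_above(TUNED_4090M_SCORE, RTX_3090_TI_LOW)
    high = percent_above(TUNED_4090M_SCORE, RTX_3090_TI_HIGH)
    print(f"Above the low end of the range:  {low:.1f}%")
    print(f"Above the high end of the range: {high:.1f}%")
```

Under that reading, the tuned laptop chip leads by roughly 1.5% to 6.4%, so the win is real but narrow.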
For reference, TechPowerUp’s website claims that the RTX 4090M is on par with an RTX 3080, so this is a major improvement over stock. The user has also detailed the process as follows:
I used Intel XTU to set the undervolt, disabled the integrated GPU in bios, and also disabled Intel VT or Intel VM in bios.
The GPU overclock was done with MSI Afterburner and I tested from 150 up and started getting some artifacts at 300 so I tuned it down to 250 for stability. Haven’t messed with the main clock as much.
- u/Kelzs
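The approach in the quote, raising the offset until artifacts appear and then backing off, can be sketched as a simple step search. This is only an illustration: the step size of 50 is an assumption (the user did not say what increments were used), `find_stable_offset` is a hypothetical helper, and the stability check is simulated. In practice that check is a person watching a benchmark run for artifacts, and nothing here talks to MSI Afterburner.

```python
from typing import Callable, Optional


def find_stable_offset(
    start: int,
    step: int,
    limit: int,
    is_stable: Callable[[int], bool],
) -> Optional[int]:
    """Step the offset upward and return the last value that passed.

    Returns None if even the starting offset fails the stability check.
    """
    last_good = None
    offset = start
    while offset <= limit:
        if not is_stable(offset):
            break  # first sign of artifacts: stop and keep the previous value
        last_good = offset
        offset += step
    return last_good


if __name__ == "__main__":
    # Simulated check mirroring the quote: artifacts show up at an offset of 300.
    def simulated_check(offset: int) -> bool:
        return offset < 300

    # Assumption: steps of 50, starting from the 150 mentioned in the quote.
    print(find_stable_offset(start=150, step=50, limit=500, is_stable=simulated_check))
```

With those assumed steps, the search settles on 250, which matches the value the user ended up with.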
Overall, these results are indeed impressive and show how much extra performance careful tuning can extract from a high-end laptop. I mean, you are paying roughly $4,500 for a laptop, so it must be decent.
The RTX 4090M houses 76 SMs, which at 128 CUDA cores per SM equates to 9728 CUDA cores. This GPU ships with 16GB of GDDR6 memory. The RTX 4090M is based on NVIDIA’s AD103 GPU and has a TGP that ranges from 80 to 150W. On paper, the FP32 compute comes out to 38.9 TFLOPS, which looks plausible given the scores you saw above.
| SKU | Codename | Chip | CUDA Cores (FP32) | Max Clock | FP32 Compute | Memory | Memory Bus | TGP |
|---|---|---|---|---|---|---|---|---|
| RTX 4090 | X21-X11 | AD103 | 9728 | 1.45 – 2.04 GHz | 38.9 TFLOPS | 16GB | 256-bit | 80-150W |
| RTX 4080 | X21-X9 | AD104 | 7424 | 1.35 – 2.28 GHz | 33.8 TFLOPS | 12GB | 192-bit | 60-150W |
| RTX 4070 | X21-X6 | AD106 | 4608 | 1.23 – 2.175 GHz | 20.0 TFLOPS | 8GB | 128-bit | 35-115W |
| RTX 4060 | X21-X4 | AD107 | 3072 | 1.47 – 2.37 GHz | 14.6 TFLOPS | 8GB | 128-bit | 35-115W |
| RTX 4050 | X21-X2 | AD107 | 2560 | 1.61 – 2.37 GHz | 12.1 TFLOPS | 6GB | 96-bit | 35-115W |
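The FP32 figures in the table follow from the usual paper formula: CUDA cores × 2 FLOPs per clock (one fused multiply-add) × clock speed. The sketch below recomputes them from the table's core counts and upper clock values; the function names are illustrative. One thing it brings out is that the 38.9 TFLOPS listed for the RTX 4090 matches a clock of about 2.0 GHz, while the listed 2.04 GHz would give roughly 39.7 TFLOPS.

```python
FLOPS_PER_CORE_PER_CLOCK = 2  # one fused multiply-add counts as 2 FLOPs
CUDA_CORES_PER_SM = 128       # Ada Lovelace SM width

# (CUDA cores, upper clock in GHz, listed TFLOPS), copied from the table above.
MOBILE_SKUS = {
    "RTX 4090": (9728, 2.04, 38.9),
    "RTX 4080": (7424, 2.28, 33.8),
    "RTX 4070": (4608, 2.175, 20.0),
    "RTX 4060": (3072, 2.37, 14.6),
    "RTX 4050": (2560, 2.37, 12.1),
}


def fp32_tflops(cuda_cores: int, clock_ghz: float) -> float:
    """Paper FP32 throughput in TFLOPS for a given core count and clock."""
    return cuda_cores * FLOPS_PER_CORE_PER_CLOCK * clock_ghz / 1000.0


def implied_clock_ghz(cuda_cores: int, tflops: float) -> float:
    """Clock (GHz) that a listed TFLOPS figure implies for a given core count."""
    return tflops * 1000.0 / (cuda_cores * FLOPS_PER_CORE_PER_CLOCK)


if __name__ == "__main__":
    print(f"76 SMs x {CUDA_CORES_PER_SM} cores = {76 * CUDA_CORES_PER_SM} CUDA cores")
    for sku, (cores, clock, listed) in MOBILE_SKUS.items():
        computed = fp32_tflops(cores, clock)
        implied = implied_clock_ghz(cores, listed)
        print(f"{sku}: computed {computed:.2f} TFLOPS, listed {listed}, "
              f"listed figure implies {implied:.2f} GHz")
```

These are theoretical peaks; the clocks a laptop actually sustains depend on its TGP and cooling, so real workloads will land below them.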