Nvidia Extends Support for ARM CPUs With Their Complete Stack of AI and HPC Software

Earlier today, Nvidia announced that it will bring its entire stack of AI and HPC software to processors based on the ARM architecture. Nvidia is well acquainted with ARM, having incorporated the architecture in its Tegra chips and other system-on-a-chip products for portable gaming, autonomous vehicles, robotics and embedded AI computing.

Why Now?

ARM has been around for a while, but its use in HPC systems was practically non-existent until a few years ago. Almost all HPC systems use chips from Intel, whose long presence in the market has produced extensive legacy software and library support.

Over the years ARM has worked tirelessly to build an ecosystem that can make their architecture a viable alternative to x86 chips. The Mont-Blanc project was a big initiative in this direction.

Mont-Blanc partners had to start from scratch building Arm HPC test systems based on 32-bit mobile phone technology and porting and tuning software and tools to create an Arm software ecosystem. In 2015, Mont-Blanc deployed the world’s first Arm-based HPC cluster, featuring over 2,000 mobile CPUs. This system helped demonstrate the viability of using Arm technology for HPC.

OAG

These initiatives are finally bearing fruit, and chips based on the ARM architecture are increasingly being used in HPC systems around the world.

Nvidia’s Business Interests In Data Centres

Nvidia already dominates a large part of the consumer GPU business, and over the years it has built a respectable hardware and software stack for workstations. On the software side, it has a lot of solutions for AI and deep learning workloads. All these workloads can be accelerated by GPUs, and this is where its Tesla data centre GPUs, including those based on the Volta architecture, come in.

This has helped the company’s finances. According to a Forbes article by Karl Freund, “In NVIDIA’s Q1 2019 quarter, the company once again exceeded expectations, reporting a 66% growth in total revenue, including 71% growth in its red-hot datacenter business (reaching $701M for the quarter). For NVIDIA, the “Datacenter” segment includes High-Performance Computing (HPC), datacenter-hosted graphics, and AI acceleration.”

These are also big talking points in Nvidia’s investor keynotes. After Nvidia announced its acquisition of Mellanox, which we covered here, CEO Jensen Huang shared some insight into the decision, stating: “The strategy is doubling down on datacenters, and we are combining and uniting two leaders in high-performance computing technologies. We are focused on accelerated computing for high performance computing, and Mellanox is focused on networking and storage for high performance computing, and we have combined the two companies under one roof. Our vision is that datacenters are the most important computers in the world today, and that in the future, as workloads continue to change – which is really triggered by artificial intelligence and data analytics – that future datacenters of all kinds will be built like high performance computers. Hyperscale datacenters were really created to provision services and lightweight computing to billions of people. But over the past several years, the emergence of artificial intelligence and machine learning and data analytics has put so much load on the datacenters, and the reason is that the data size and the compute size is so great that it doesn’t fit on one computer. So it has to be distributed on multiple computers and the high performance connectivity to allow these computers to work together is becoming more and more important. This is why Mellanox has grown so well, and why people are talking about SmartNICs and intelligent fabrics and software defined networks. All of those conversations lead to the same place, and that is a future where the datacenter is a giant compute engine that will be coherent – and it will allow for many people to still share it – but allow for few people to run very large applications on them as well. We believe that in the future of datacenters, the compute will not start and end at the server, but extend out into the network and the network itself will become part of the computing fabric.
“In the long term, I think we have the ability to create datacenter-scale computing architectures.”

ARM Poised For Success

ARM chips power most mobile devices around the world, and the architecture is power efficient by design. Since the architecture is licensed out, system builders who choose ARM can pick from multiple silicon makers.

Power consumption remains a big concern in HPC systems, and using ARM can offset this problem to a large extent. On the software side, the Mont-Blanc projects have produced a lot of scientific libraries and tools for ARM, which plays a big part in taking the entire ecosystem forward.

ARM’s use in HPC systems and data centres is still small compared to x86, but Nvidia sees the potential here. Its arch-rival AMD has also started to compete fiercely in the HPC and data centre market with its EPYC server processors and Radeon Instinct GPU accelerators, so it’s important for Nvidia to adopt ARM now and offer its software suite (CUDA-X HPC, etc.). Unlike AMD and Intel, Nvidia doesn’t make its own server CPUs, so it lacks the CPU-GPU coherency those two can offer.

Looking ahead, Nvidia could strengthen its partnership with ARM. As The Next Platform rightly states, “Nvidia and Arm could strike up a partnership to make NVLink IP blocks available to those who buy Neoverse licenses, allowing for more tight coupling with GPUs, including memory atomics and memory coherency across the CPU-GPU compute complexes.”

This move should help ARM’s case as a viable alternative to x86 in HPC. We can expect a similar move from AMD sometime in the future as it continues to aggressively push its Radeon Instinct GPUs.

