
NVIDIA Announces DGX GH200, One Giant GPU that doubles as an AI Supercomputer

May 30, 2023 • 3 min read

In a ground-breaking announcement at COMPUTEX 2023, NVIDIA CEO Jensen Huang unveiled the DGX GH200, the world’s first 100-terabyte GPU memory system. Powered by NVIDIA Grace Hopper Superchips and an advanced NVLink Switch System, this generational leap marks a pivotal moment for AI and high-performance computing (HPC) applications, especially those demanding extensive memory capacities.

At the heart of the DGX GH200 lies a constellation of 256 Grace Hopper Superchips. Each superchip carries 480 GB of LPDDR5 CPU memory and 96 GB of fast HBM3 GPU memory. That might sound like a mouthful of tech jargon, but the key point is efficiency: the LPDDR5 memory uses just an eighth of the power per GB compared with DDR5. To put this in perspective, it is akin to a supercar with the fuel efficiency of a hybrid.

Massive memory supercomputing for emerging AI.

Within each superchip, the Grace CPU and Hopper GPU are joined by NVLink-C2C, which, if you haven't been following NVIDIA's tech developments, is a marvel of its own. Imagine a super-highway with seven lanes instead of one: that's the bandwidth of NVLink-C2C compared to PCIe Gen5, and it delivers this while using just one-fifth of the power. The superchips themselves are linked to one another through the NVLink Switch System.

The DGX GH200 effectively transforms 256 Grace Hopper Superchips into a single, monolithic, data-center-sized GPU.

This ingenious configuration results in a shared memory programming model that accesses an awe-inspiring 144 terabytes of memory at high speed over NVLink. This is almost 500 times more memory capacity than a single NVIDIA DGX A100 320 GB system, positioning the DGX GH200 as the first supercomputer to break the 100-terabyte barrier for memory accessible to GPUs over NVLink.
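The 144-terabyte figure can be checked against the per-superchip numbers quoted above. The short Python sketch below is only back-of-the-envelope arithmetic: the constant and function names are made up for illustration, and it assumes that "terabyte" here means 1,024 GB, which is the convention under which the numbers line up exactly.

```python
# Figures quoted in the article.
SUPERCHIPS = 256        # Grace Hopper Superchips in one DGX GH200
CPU_MEM_GB = 480        # LPDDR5 CPU memory per superchip
GPU_MEM_GB = 96         # HBM3 GPU memory per superchip
DGX_A100_MEM_GB = 320   # GPU memory of one DGX A100 320 GB system

GB_PER_TB = 1024        # assumption: binary terabytes


def total_memory_gb(chips=SUPERCHIPS, cpu_gb=CPU_MEM_GB, gpu_gb=GPU_MEM_GB):
    """Combined CPU and GPU memory across all superchips, in GB."""
    return chips * (cpu_gb + gpu_gb)


total_gb = total_memory_gb()
total_tb = total_gb / GB_PER_TB
ratio_vs_a100 = total_gb / DGX_A100_MEM_GB

print(f"Total shared memory: {total_gb} GB = {total_tb:.0f} TB")
print(f"Ratio to DGX A100 320 GB: {ratio_vs_a100:.1f}x")
```

Under these assumptions the total comes to 147,456 GB, or 144 TB, and the ratio to a DGX A100 320 GB system is about 460, which is the basis for the "almost 500 times" comparison. Note that the total counts both the LPDDR5 and the HBM3 memory on every superchip.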

The unprecedented computational power and massive memory leap of the DGX GH200 significantly improve the performance of AI and HPC applications bottlenecked by GPU memory size. Whether it's deep learning recommendation models, graph neural networks, or large data analytics, the DGX GH200 has the muscle to take them all on.

Performance comparisons for giant memory AI workloads

And it's not just muscle—it's fast muscle. Preliminary tests have shown this beast can accelerate the processing of AI and HPC models requiring massive memory by four to seven times. But NVIDIA didn't stop at building a powerful machine. The DGX GH200 comes bundled with NVIDIA Base Command—an optimized AI workload OS, cluster manager, and libraries that speed up compute, storage, and network infrastructure. To top it off, NVIDIA AI Enterprise is part of the package, providing a suite of software and frameworks that streamline AI development and deployment.

The NVIDIA DGX GH200 AI supercomputer full stack includes NVIDIA Base Command and NVIDIA AI Enterprise

Tech giants like Google, Meta, and Microsoft are already lined up to take advantage of the DGX GH200, foreshadowing its potential to reshape the industry landscape. NVIDIA is also building an AI supercomputer of its own, NVIDIA Helios, which it says will leverage four DGX GH200 systems. That's right, four, for a total of 1,024 Grace Hopper Superchips.

Over the past week, NVIDIA's valuation has surged to $1 trillion, driven by higher-than-expected earnings and a forecast for an upcoming record-breaking quarter. This spike largely stems from an unprecedented demand for its GPUs, which are extensively used to power a multitude of AI applications. The release of the DGX GH200 is poised to fuel this demand even further.

As the curtain falls on COMPUTEX 2023, the tech world is abuzz with the possibilities opened by NVIDIA's DGX GH200. As Jensen Huang put it, "This is going to revolutionize an entire industry." A bold claim from a bold CEO, but one that's backed by the impressive specs and initial interest in the product. NVIDIA expects the DGX GH200 to be available by the end of this year.