Nvidia hopes to take graphics processing units (GPUs) in the datacentre to the next level by addressing what it sees as a bottleneck limiting data processing in traditional architectures.
In general, the central processing unit (CPU) in a datacentre server passes certain data processing calculations to a GPU, which is optimised to run such workloads.
But, according to Nvidia, memory bandwidth limits the level of optimisation. A GPU is usually configured with a relatively small amount of fast memory, whereas the CPU has a larger amount of slower memory.
Moving data between the CPU and GPU to run a data processing workload requires copying from the slower CPU memory to the GPU memory.
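To illustrate why this copy step matters, the following is a minimal Python sketch that estimates how long a transfer takes over a link of a given bandwidth. The dataset size and both bandwidth figures are hypothetical values chosen only for illustration. They are not Nvidia specifications, and the model ignores latency and protocol overhead.

```python
def transfer_seconds(size_gb: float, bandwidth_gb_per_s: float) -> float:
    """Return the time in seconds to move size_gb gigabytes over a link
    that sustains bandwidth_gb_per_s gigabytes per second."""
    if bandwidth_gb_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return size_gb / bandwidth_gb_per_s


# Hypothetical figures for illustration only (not vendor specifications).
DATASET_GB = 64.0            # data copied from CPU memory to GPU memory
SLOW_LINK_GB_PER_S = 16.0    # assumed slower CPU-to-GPU path
FAST_LINK_GB_PER_S = 160.0   # assumed faster interconnect

slow = transfer_seconds(DATASET_GB, SLOW_LINK_GB_PER_S)
fast = transfer_seconds(DATASET_GB, FAST_LINK_GB_PER_S)
print(f"slow link: {slow:.1f} s, fast link: {fast:.1f} s, ratio: {slow / fast:.1f}x")
```

Under this simple model, transfer time scales inversely with bandwidth, so a workload that repeatedly shuttles data between the two memories is bounded by the slower path however fast the GPU itself is.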
In an attempt to remove this memory bottleneck, Nvidia has unveiled its first datacentre CPU, Grace, based on an Arm microarchitecture. According to Nvidia, Grace will deliver 10 times the performance of today’s fastest servers on the most complex AI and high-performance computing workloads. It supports the next generation of Nvidia’s coherent NVLink interconnect technology, which the company claims allows data to move more quickly between system memory, CPUs and GPUs.
Nvidia described Grace as a highly specialised processor targeting the largest data-intensive HPC and AI applications, such as the training of next-generation natural-language processing models that have more than one trillion parameters.
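To give a sense of scale, this rough Python sketch computes the memory needed just to hold the weights of a one-trillion-parameter model. The numeric precisions are assumptions, since the article does not say how such models would be stored, and the figure excludes optimiser state, gradients and activations, which add considerably more during training.

```python
# Bytes needed to store one parameter at common floating-point precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weights_terabytes(num_params: int, precision: str) -> float:
    """Return the size of the model weights alone, in terabytes (10**12 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e12


ONE_TRILLION = 10**12  # parameter count cited for next-generation models

for precision in ("fp32", "fp16"):
    size_tb = weights_terabytes(ONE_TRILLION, precision)
    print(f"{precision}: {size_tb:.1f} TB of weights")
```

Even at half precision, the weights alone come to terabytes, which is why the capacity and bandwidth of the memory that the GPU can reach become the limiting factor.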
The Swiss National Supercomputing Center (CSCS) is the first organisation to announce publicly that it will use Nvidia’s Grace chip, in a supercomputer called Alps that is due to go online in 2023.
CSCS designs and operates a dedicated system for numerical weather prediction (NWP) on behalf of MeteoSwiss, the Swiss meteorological service. This system has been running on GPUs since 2016.
The Alps supercomputer will be built by Hewlett Packard Enterprise using the new HPE Cray EX supercomputer product line as well as the Nvidia HGX supercomputing platform, which includes Nvidia GPUs, its high-performance computing software development kit and the new Grace CPU. The Alps system will replace CSCS’s existing Piz Daint supercomputer.
According to Nvidia, the tight coupling between its CPUs and GPUs means Alps is expected to be able to train GPT-3, the world’s largest natural language processing model, in only two days, seven times faster than Nvidia’s 2.8-AI exaflops Selene supercomputer, currently recognised as the world’s leading supercomputer for AI by MLPerf.
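The comparison implies a training time for Selene that the claim does not state outright. A minimal sketch of that arithmetic follows, assuming "seven times faster" means Alps needs one-seventh of Selene’s time. That reading is an interpretation of Nvidia’s claim, not a figure from the company.

```python
def baseline_days(new_days: float, speedup: float) -> float:
    """Return the baseline duration implied by a new duration and a speed-up factor."""
    return new_days * speedup


ALPS_DAYS = 2   # training time Nvidia expects for GPT-3 on Alps
SPEEDUP = 7     # claimed speed-up over Selene

implied = baseline_days(ALPS_DAYS, SPEEDUP)
print(f"Implied GPT-3 training time on Selene: about {implied} days")
```

On that reading, the same training run would occupy Selene for roughly a fortnight.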
Nvidia said CSCS users will be able to apply this AI performance to a wide range of emerging scientific research that can benefit from natural language understanding. This includes, for example, analysing and understanding the massive amounts of knowledge available in scientific papers and generating new molecules for drug discovery.
“The scientists will not only be able to carry out simulations, but also pre-process or post-process their data. This makes the whole workflow more efficient for them,” said CSCS director Thomas Schulthess.