AMD has revealed the improved Instinct MI200, which is aimed primarily at the US DOE’s Exascale program. The processor delivers outstanding HPC performance along with respectable theoretical AI capabilities. AMD also hinted at an EPYC CPU update with a large 3D-stacked V-Cache for higher HPC performance, as well as a cache-coherent network that can connect four MI200s to the EPYC CPU. Let’s have a look at what we’ve got.
Technically, the company did not announce these new platforms; rather, it previewed them ahead of SuperComputing ’21, which will be held in St. Louis from November 14 to 19. And given that Oak Ridge National Laboratory is currently installing these chips in the HPE Cray Frontier Exascale machine, AMD felt compelled to comment. While some details and pricing are not yet available, the HPC and AI communities will be pleased with the news.
Because the Frontier and El Capitan Exascale systems for the US DOE had already been awarded to HPE and AMD while the chip was still on the drawing board, the MI200 design team knew who its initial clients would be. And while AMD incidentally revealed that the GPU will draw 550 watts (!), it will deliver roughly five times the 64-bit floating-point performance of the NVIDIA A100 for HPC applications, so performance per watt will still be excellent.
In terms of AI performance, AMD has narrowed the gap with NVIDIA, at least theoretically, thanks to 16-bit floating-point throughput that is 20 percent higher than the NVIDIA A100’s. We say “theoretical” because AMD isn’t ready to release AI benchmark results such as the MLPerf suite. And we doubt the Instinct software team will be able to optimize AI models, kernels, and the ROCm development stack quickly, so don’t hold your breath. Still, we believe the performance will entice developers to start building an AI ecosystem around AMD.
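As a sanity check on the “roughly five times FP64” and “20 percent higher FP16” claims above, here is a back-of-the-envelope sketch. The spec-sheet figures used are an assumption on our part (the publicly reported peak numbers for the MI250X variant of MI200 and the A100 80GB); real application performance will differ from theoretical peaks.

```python
# Rough comparison of vendor-published peak throughput figures.
# All numbers are spec-sheet peaks, not measured application performance.

mi200_fp64_tflops = 47.9   # MI250X peak FP64 vector TFLOPS (AMD spec sheet)
a100_fp64_tflops = 9.7     # A100 peak FP64 (non-tensor-core) TFLOPS

mi200_fp16_tflops = 383.0  # MI250X peak FP16 TFLOPS (AMD spec sheet)
a100_fp16_tflops = 312.0   # A100 peak FP16 tensor-core TFLOPS

fp64_ratio = mi200_fp64_tflops / a100_fp64_tflops
fp16_ratio = mi200_fp16_tflops / a100_fp16_tflops

print(f"FP64 advantage: {fp64_ratio:.1f}x")          # ~4.9x, i.e. "roughly five times"
print(f"FP16 advantage: {fp16_ratio - 1:.0%} higher")  # ~23%, in line with "20 percent greater"
```

The arithmetic lines up with the article’s claims, with the usual caveat that peak FLOPS ratios rarely survive contact with real workloads.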

There are two more features worth mentioning. One is the speed and memory coherency of the new Infinity Fabric, which connects the GPUs to Milan-X and, presumably, Milan. Cache coherency makes memory management much easier for software developers, improves application performance, and enables AI models with billions or even trillions of parameters. The performance of these direct links will, of course, outstrip that of PCIe-based GPU systems.
The other new technology is the Elevated Fanout Bridge (EFB), which replaces the standard silicon interposer. EFB offers greater scalability and lower cost by using standard “flip chip” assembly processes.