Nvidia's new architecture is here: Blackwell, just announced by Nvidia CEO Jensen Huang at GTC, and it arrives on the enormous B200 GPU. However, calling this a "GPU" is technically incorrect. It is a dual-GPU package with a total of 208 billion transistors. To put that in perspective, Nvidia's previous must-have compute chips, the Hopper H200 and H100, have 80 billion transistors, and the RTX 4090 has 76.3 billion. Blackwell has more than twice as many as either, which is why a dual-GPU design and a new chip-to-chip interconnect make a lot of sense.
Blackwell is unfortunately not for gaming. Boo. In any case, we don't know if our bank accounts could handle such a mighty thing; Blackwell is primarily intended for deployment in data centers that are chasing ever larger compute figures. Why? Mostly for artificial intelligence.
But as we await news on the next generation of GeForce graphics cards, let's keep in mind which features might be carried over from these massive Blackwell chips to whatever architecture powers the next gaming graphics cards.
Let's start with what we will probably see in GeForce GPUs in the future: Blackwell features new fifth-generation Tensor Cores. These are accelerators for the instructions used primarily in AI applications, namely inference and training, and the fifth-generation version is said to deliver up to a 30x performance boost. The new Tensor Cores include updates to precision formats and to the Transformer Engine, first introduced in Hopper, to accelerate inference and training of large language models.
GeForce cards use Tensor Cores for features such as DLSS, and fourth-generation Tensor Cores made the leap from Nvidia's enterprise-only Hopper architecture to the Ada Lovelace architecture driving the RTX 40 series. We have already seen this happen in the last generation, and we will likely see the same thing happen in the next. However, the key will be how Nvidia puts these additional features to use. A new DLSS version or Frame Generation feature would be a candidate for further development.
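To make the transistor comparison concrete, here is a minimal sketch of the arithmetic. It takes the B200 package at 208 billion transistors and the H100 at 80 billion, and it assumes the RTX 4090's AD102 GPU at 76.3 billion (Nvidia's published figure). The even split between the two Blackwell dies is an assumption for illustration, and the names in the code are hypothetical.

```python
# Transistor counts in billions. The RTX 4090 (AD102) value of 76.3 billion
# is Nvidia's published figure; the other two are quoted in the article.
TRANSISTORS_BN = {
    "B200 package": 208.0,
    "H100": 80.0,
    "RTX 4090": 76.3,
}


def ratio_to(baseline: str, chip: str = "B200 package") -> float:
    """Return how many times more transistors `chip` has than `baseline`."""
    return TRANSISTORS_BN[chip] / TRANSISTORS_BN[baseline]


# Assumption: the two Blackwell dies share the transistor budget evenly.
per_die_bn = TRANSISTORS_BN["B200 package"] / 2

for name in ("H100", "RTX 4090"):
    print(f"B200 package vs {name}: {ratio_to(name):.2f}x")
print(f"Per Blackwell die (assumed even split): {per_die_bn:.0f} billion")
```

Under that even-split assumption, each Blackwell die on its own would still carry more transistors than a whole H100 or RTX 4090, which is consistent with each die being built near the reticle limit.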
Flip through the gallery below to see, step by step, how each part of the Blackwell package fits together.
The Blackwell B200 and B100 appear to be cut from the same cloth. The B200 delivers more FP64 performance within the larger HGX B200 system (consisting of eight B200 GPUs) than the B100 does within the HGX B100 system (consisting of eight B100 GPUs), at 40 TFLOPS vs. 30 TFLOPS FP64, but the performance seems close enough to suggest that both the B200 and B100 are built from the same mammoth package of 208 billion transistors.
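As a rough sketch of that comparison, the snippet below reads the 40 and 30 TFLOPS FP64 figures as totals for the eight-GPU HGX systems (an assumption about how the numbers are quoted) and divides them down to a per-GPU estimate. The variable and function names are hypothetical.

```python
# Assumption: the FP64 figures are system-level totals for an HGX board
# holding eight GPUs, so a per-GPU estimate is simply total / 8.
GPUS_PER_HGX = 8
HGX_FP64_TFLOPS = {"HGX B200": 40.0, "HGX B100": 30.0}


def per_gpu_fp64(system: str) -> float:
    """Estimate FP64 TFLOPS per GPU from the system-level figure."""
    return HGX_FP64_TFLOPS[system] / GPUS_PER_HGX


b200 = per_gpu_fp64("HGX B200")
b100 = per_gpu_fp64("HGX B100")
relative = b100 / b200  # B100 throughput as a fraction of B200

print(f"B200: {b200} TFLOPS FP64 per GPU")
print(f"B100: {b100} TFLOPS FP64 per GPU")
print(f"B100 reaches {relative:.0%} of B200 FP64 throughput")
```

On that reading, the B100 lands at three quarters of the B200's FP64 throughput, a gap small enough to be explained by clocks or power limits rather than different silicon, though that explanation is speculation here.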
"It's okay, Hopper," Huang says as he places the massive Blackwell package next to the Hopper die.
The Blackwell chip has two GPUs that appear to be running as a single chip, and each GPU is manufactured at the maximum size possible for a single chip in any given lithography process, called the reticle limit. Huang stated in his keynote that "there is a small line between the two dies, and this is the first time two dies have touched each other in such a way that you would think they were a single chip."
Nvidia has played with splitting GPUs before: the Ampere GA100 GPU was more or less divided in two by an interconnect, although the actual silicon was not. Blackwell takes this a giant leap further with two physically separate pieces of silicon.
The likelihood of the same dual-GPU design being used in a gaming graphics card is fairly low, but not entirely impossible.
First, leaked information suggests that the largest graphics card in the next-generation Nvidia lineup, perhaps the RTX 5090, will have more CUDA cores than its predecessor, the RTX 4090. However, current rumors do not suggest anything that would directly double the RTX 4090's specs. And even if the RTX 5090 used two smaller GPUs to make a more efficient chip, there are more concrete reasons why a dual-GPU approach could be prohibitively tricky.
It is very difficult to make two GPUs work as a single GPU during a game. For this multi-GPU approach to work, the two GPUs need to act as one GPU and at the same time require very few changes to the API that communicates with the graphics card.
Huang said of the way the Blackwell GPU package works as one, "The two sides of the Blackwell chip don't know which side they are on. There's no memory locality issue, no cache issue. This is just one giant chip."
This gives me hope that we will reach a point where a multi-die gaming GPU is possible, but it is still a tough cookie to crack. Given enough bandwidth between the two dies, it is admittedly easier to achieve with a compute chip like the B200.
For now, the latest process nodes are likely to be the main way for Nvidia to pack more cores into its gaming chips. And that is one area where Blackwell suggests what we should expect.
Above: Nvidia has prototype boards for very powerful (and expensive) systems using Blackwell.
What may make the leap from Blackwell to next-generation gaming cards is the use of TSMC's 4NP process node. This is reportedly an extension of the custom 4N process node created and used exclusively for Nvidia's Ada Lovelace and Hopper chips. Despite the name, it is not actually a 4nm process node but is closer to TSMC's 5nm node. The naming is confusing, and seemingly intentionally so, as almost every major semiconductor manufacturer does the same thing. For example, Intel 7 is actually a 10nm process. The point is that we will very likely see next-generation GeForce cards using the 4NP process too.
Blackwell's decompression engine is of particular interest to gamers, as Nvidia introduced RTX IO in 2020 as a way to shift work from the CPU to the GPU and reduce load times for game assets. This is part of a broader industry push to integrate GPU decompression into games, including AMD's SmartAccess Storage, Microsoft's DirectStorage, and Khronos Group's Vulkan API. All of these rely on an open GPU compression standard called GDeflate.
Blackwell's new Decompression Engine speeds up GDeflate, among other decompression standards, and if integrated into next-generation GeForce GPUs, it could help the broader push to adopt GDeflate in games. If GPUs can decompress assets faster, they can load them into games faster, which means that more detailed game worlds can be designed with reasonable performance expectations.
There are several Blackwell features that are unlikely to be included in future gaming GPUs. There is the RAS (Reliability, Availability, and Serviceability) engine, built to proactively identify and report failures and potential failures; if you have hundreds of thousands of GPUs running at once, as Meta does, this is a very useful feature. Similarly, the focus on a TEE-I/O security model for "secure AI" will not necessarily be on the agenda for GeForce. The ability to use NVLink to combine large numbers of GPUs into a single superchip, or many superchips into a single supersystem, will also be thrown onto the sacrificial fire for gaming cards.
Finally, the RTX 50 series or similar gaming chips will not have hundreds of gigabytes of HBM3e memory. Nvidia's Grace Blackwell Superchip, which incorporates two Blackwell GPUs and a Grace CPU, looks great (and expensive) with 384GB of HBM3e memory providing 16TB/s of bandwidth, but we probably will be gaming with 8GB or more (hopefully more) of GDDR7 memory.
So those are some of the graphics goodies that could make the leap from Blackwell to gaming graphics cards. Unfortunately, we don't know for sure when: Nvidia has yet to give a firm date for the next generation of gaming graphics cards. Nor do we know when Blackwell-based products will be available. But the company has little need to advertise: Meta, Google, Microsoft, OpenAI, Oracle, xAI, Dell, and Amazon are already in line as customers seeking Blackwell.
If Nvidia follows the pattern of the previous generation's announcements, with Hopper announced in March and Ada Lovelace in September, we may see more details on the next generation of GeForce graphics cards soon after the summer.