NVIDIA Prepares H100 NVL GPUs With More Memory and SLI-Like Capability

erek

SLI, rebranded, making a comeback? That's a lot of interconnect bridges on top of the cards.

"The performance differences between the H100 PCIe version and the H100 SXM version are now matched with the new H100 NVL, as the card features a boost in the TDP with up to 400 Watts per card, which is configurable. The H100 NVL uses the same Tensor and CUDA core configuration as the SXM edition, except it is placed on a PCIe slot and connected to another card. Being sold in pairs, OEMs can outfit their systems with either two or four pairs per certified system. You can see the specification table below, with information filled out by AnandTech. As NVIDIA says, the need for this special edition SKU is the emergence of Large Language Models (LLMs) that require significant computational power to run. "Servers equipped with H100 NVL GPUs increase GPT-175B model performance up to 12X over NVIDIA DGX A100 systems while maintaining low latency in power-constrained data center environments," noted the company."



Source: https://www.techpowerup.com/306275/...gpus-with-more-memory-and-sli-like-capability
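
For what it's worth, the "pairing" here is just an NVLink bridge between two cards, and you can already probe that kind of linkage from software. A minimal sketch, assuming a box with a CUDA build of PyTorch and at least two visible GPUs; this is illustrative only, not anything NVIDIA ships with the NVL cards:

Code:
# Minimal sketch: report which visible GPUs can reach each other peer-to-peer
# (e.g. over an NVLink bridge like the H100 NVL pairing, or plain PCIe P2P).
# Assumes PyTorch built with CUDA and at least two visible devices.
import torch

def report_peer_access() -> None:
    n = torch.cuda.device_count()
    if n < 2:
        print(f"Only {n} CUDA device(s) visible; nothing to pair.")
        return
    for i in range(n):
        for j in range(n):
            if i != j:
                ok = torch.cuda.can_device_access_peer(i, j)
                print(f"GPU {i} -> GPU {j}: peer access {'yes' if ok else 'no'}")

if __name__ == "__main__":
    report_peer_access()

On a bridged pair you would expect peer access in both directions; between unbridged cards it depends on the PCIe topology.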
 
SLI (at least for gaming) is great in theory but fails in practice.

Split-frame rendering modes generally scale poorly, so most implementations fall back to alternate-frame rendering, which has awful input lag.

This was best illustrated in the Tom's Hardware review of the ATi Rage Fury MAXX 24 years ago:

(Input-lag chart from Tom's Hardware's "Preview of the Double Whopper" Rage Fury MAXX article.)
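
The arithmetic behind that chart is simple enough to sketch: AFR presents frames twice as often, but any individual frame, and the input sampled for it, still takes the full single-GPU render time. A toy model with made-up numbers, not measurements from that review:

Code:
# Toy model of why alternate-frame rendering (AFR) raises frame rate but not
# input responsiveness. Numbers are illustrative only.

def afr_fps(render_time_ms: float, num_gpus: int) -> float:
    # GPUs take turns, so a frame is presented every render_time_ms / num_gpus.
    return 1000.0 * num_gpus / render_time_ms

def afr_input_lag_ms(render_time_ms: float, num_gpus: int) -> float:
    # Each frame still spends the full render time on one GPU, so the input
    # sampled for it is no fresher than on a single card (deeper driver
    # render-ahead queues typically make it worse; ignored here).
    return render_time_ms

single_gpu_frame_ms = 33.3  # ~30 fps on one card, illustrative
for gpus in (1, 2):
    print(f"{gpus} GPU(s): ~{afr_fps(single_gpu_frame_ms, gpus):.0f} fps, "
          f"~{afr_input_lag_ms(single_gpu_frame_ms, gpus):.0f} ms input lag")

# A real single GPU hitting the same 60 fps would render in ~16.7 ms and its
# input lag would drop with it; AFR's extra fps buys no latency back.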


That, and the constant compatibility struggles with titles requiring special profiles that often break in both CrossFire and SLI. I tried both and decided never again.

I don't think we want it to come back. Besides, consumer boards rarely have sufficient PCIe lanes for it these days.

I suspect this is just for enterprise AI/Compute loads.
 
There are just way too many hurdles to getting SLI implemented by developers. The last time NVIDIA and AMD pushed multi-GPU, support got better for a short time and then fell off a cliff. It got worse over time, not better. The changes to DX12 would have allowed for great things, but they put the burden squarely on developers, which killed multi-GPU practically overnight. I don't think we'll see SLI ever be a thing again until it can be implemented 100% in hardware, where developers won't have to do anything.
 

Couldn't agree more.

I always wondered why it was so difficult to just make multi-GPU transparent to the operating system at the hardware or at least the driver level.

That is really what will be necessary for mGPU to become worthwhile again. We have rendering APIs, many of them. They are supposed to abstract the software/game from having to deal with hacks to make stuff like this work.

Somewhere in the chain of rendering API -> GPU driver -> GPU hardware it has to be possible to make this transparent to the title that is trying to render something, and unless that happens, I just don't see mGPU being relevant again.
 
I don't think this is anything new. We have a bunch of servers with A100s at work, four GPUs per node, and they are paired off with two GPUs sharing three bridges, like shown here. I assume the difference is that they are now being sold specifically as linked pairs, rather than linking just being an option if you happen to have two in a system.
 