NVIDIA Prepares H100 NVL GPUs With More Memory and SLI-Like Capability

erek

SLI, rebranded, making a comeback? That's a lot of interconnect bridges on top of the cards.

"The performance differences between the H100 PCIe version and the H100 SXM version are now matched with the new H100 NVL, as the card features a boost in the TDP with up to 400 Watts per card, which is configurable. The H100 NVL uses the same Tensor and CUDA core configuration as the SXM edition, except it is placed on a PCIe slot and connected to another card. Being sold in pairs, OEMs can outfit their systems with either two or four pairs per certified system. You can see the specification table below, with information filled out by AnandTech. As NVIDIA says, the need for this special edition SKU is the emergence of Large Language Models (LLMs) that require significant computational power to run. "Servers equipped with H100 NVL GPUs increase GPT-175B model performance up to 12X over NVIDIA DGX A100 systems while maintaining low latency in power-constrained data center environments," noted the company."



Source: https://www.techpowerup.com/306275/...gpus-with-more-memory-and-sli-like-capability
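
For what it's worth, the "pairing" here is just an NVLink bridge between two cards, and you can already probe that kind of linkage from software. A minimal sketch, assuming a box with a CUDA build of PyTorch and at least two visible GPUs; this is illustrative only, not anything NVIDIA ships with the NVL cards:

Code:
# Minimal sketch: report which visible GPUs can reach each other peer-to-peer
# (e.g. over an NVLink bridge like the H100 NVL pairing, or plain PCIe P2P).
# Assumes PyTorch built with CUDA and at least two visible devices.
import torch

def report_peer_access() -> None:
    n = torch.cuda.device_count()
    if n < 2:
        print(f"Only {n} CUDA device(s) visible; nothing to pair.")
        return
    for i in range(n):
        for j in range(n):
            if i != j:
                ok = torch.cuda.can_device_access_peer(i, j)
                print(f"GPU {i} -> GPU {j}: peer access {'yes' if ok else 'no'}")

if __name__ == "__main__":
    report_peer_access()

On a bridged pair you would expect peer access in both directions; between unbridged cards it depends on the PCIe topology.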
 
SLI (at least for gaming) is great in theory but fails in practice.

Split-frame rendering modes generally scale poorly, so most implementations fall back to alternate-frame rendering, which has awful input lag.

This was best illustrated in the Tom's Hardware review of the ATi Rage Fury MAXX 24 years ago:

(Input-lag chart from Tom's Hardware's "Preview of the Double Whopper" Rage Fury MAXX article.)
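
The arithmetic behind that chart is simple enough to sketch: AFR presents frames twice as often, but any individual frame, and the input sampled for it, still takes the full single-GPU render time. A toy model with made-up numbers, not measurements from that review:

Code:
# Toy model of why alternate-frame rendering (AFR) raises frame rate but not
# input responsiveness. Numbers are illustrative only.

def afr_fps(render_time_ms: float, num_gpus: int) -> float:
    # GPUs take turns, so a frame is presented every render_time_ms / num_gpus.
    return 1000.0 * num_gpus / render_time_ms

def afr_input_lag_ms(render_time_ms: float, num_gpus: int) -> float:
    # Each frame still spends the full render time on one GPU, so the input
    # sampled for it is no fresher than on a single card (deeper driver
    # render-ahead queues typically make it worse; ignored here).
    return render_time_ms

single_gpu_frame_ms = 33.3  # ~30 fps on one card, illustrative
for gpus in (1, 2):
    print(f"{gpus} GPU(s): ~{afr_fps(single_gpu_frame_ms, gpus):.0f} fps, "
          f"~{afr_input_lag_ms(single_gpu_frame_ms, gpus):.0f} ms input lag")

# A real single GPU hitting the same 60 fps would render in ~16.7 ms and its
# input lag would drop with it; AFR's extra fps buys no latency back.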


That, and the constant compatibility struggles with titles requiring special profiles that often break in both CrossFire and SLI. I tried both and decided never again.

I don't think we want it to come back. Besides, consumer boards rarely have sufficient PCIe lanes for it these days.

I suspect this is just for enterprise AI/Compute loads.
 
There are just way too many hurdles to getting SLI implemented by developers. The last time NVIDIA and AMD pushed multi-GPU, support got better for a short time and then fell off a cliff. It got worse over time, not better. The changes to DX12 would have allowed for great things, but they put the burden squarely on developers, which killed multi-GPU practically overnight. I don't think we'll see SLI ever be a thing again until it can be implemented 100% in hardware, where developers won't have to do anything.
 

Couldn't agree more.

I always wondered why it was so difficult to just make multi-GPU transparent to the operating system at the hardware or at least the driver level.

That is really what will be necessary for mGPU to become worthwhile again. We have rendering APIs, many of them. They are supposed to abstract the software/game from having to deal with hacks to make stuff like this work.

Somewhere in the chain of rendering API -> GPU driver -> GPU hardware it has to be possible to make this transparent to the title that is trying to render something, and unless that happens, I just don't see mGPU being relevant again.
 
I don't think this is anything new. We have a bunch of servers with A100s at work, four GPUs per node, and they are paired off with two GPUs sharing three bridges, like shown here. I assume the difference is that they are now being sold specifically as linked pairs, rather than linking just being an option if you happen to have two in a system.
 