Why not handle SLI/Crossfire in the driver or hardware and make it transparent to the application?

Concentric · Jan 4, 2021

My understanding of SLI and Crossfire is that the individual application (e.g. a game) is presented with separate GPUs and has to be coded to split its rendering tasks between them.
This means compatibility and performance are flaky and unpredictable: some devs do it well, others don't, and some don't bother at all.

I don't understand why Nvidia and AMD don't implement the load balancing of the GPUs themselves, inside the hardware and inside their driver, so that from the perspective of the application it doesn't need to know or care how many GPUs there are?
Is there a good reason why it's not done this way?

And before someone says "Who needs multi-GPU anyway, just buy a 3090", that's not the point. Asking a specific question because I'm interested in the technical reasons.

madpistol · Jan 4, 2021

This is how it was done before. Nvidia created SLI profiles which chose the best load balancing method for a given game and implemented that.

Nvidia and AMD have cut multi-GPU support largely because DirectX 12 gives devs close-to-metal access to the GPU. Because of this, developers can design a game to take advantage of multi-GPU setups without the need for a middleware-like approach of SLI or CrossfireX.

As for load balancing on GPU, this is where things get super tricky. Imagine one GPU is presented with a frame that is ready for rendering. If it's a single GPU, that's simple; as soon as the frame is done rendering, it is presented as a new frame to the monitor, and the GPU requests a new task (frame) for compute. It's linear and fairly easy to track.

With multi-GPU setups, it's a lot more difficult as you're now "hoping" that each frame will render at virtually exactly the same pace. Unfortunately, that's not always possible.

(This is a VERY ROUGH approximation of how multi-GPU works. I am not an electrical or computer engineer, so this is beyond an oversimplification of what is a very complex process.)

GPU0:
Requests data for new frame (0.1ms)
Frame Renders (7ms)
Frame presented to monitor (0.5ms)

GPU1: (Starts at 4ms)
Requests data for new frame (0.1ms)
Frame Renders (5ms)
Frame presented to monitor (0.5ms)

GPU0 Total Time to render: 7.6 ms
GPU1 (starts at 4ms) Total Time to render: 5.6ms (9.6ms effective)

HOWEVER, GPU1 is now requesting a new frame only 2ms after GPU0 started a new frame. If they both render at roughly the same speed on the next frame, you're going to feel a "stutter" due to inconsistent frame data requests.

Balancing load in a multi-GPU setup is tricky as you have to approximate when a new frame will be ready to be rendered, and sometimes that can leave a GPU idle for milliseconds at a time. This is why multi-GPU setups very rarely scale 100%; a lot of time is spent waiting for the appropriate time to request data for a new frame. Otherwise, it will cause stutter.
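To make the pacing problem above concrete, here is a minimal sketch that replays the rough numbers from this post. It assumes each GPU simply starts its next frame the moment it finishes the previous one, and that the second frame costs the same 7.6 ms on both GPUs. The function names are made up for illustration; this is not how any real driver schedules work.

```python
REQUEST_MS = 0.1   # "Requests data for new frame"
PRESENT_MS = 0.5   # "Frame presented to monitor"


def present_times(start_offsets, render_costs):
    """Return the sorted times (ms) at which frames reach the monitor.

    start_offsets[i] is when GPU i begins its first frame; render_costs[i]
    lists the render time of each frame on GPU i. Every GPU starts its next
    frame immediately after presenting the previous one (no pacing).
    """
    times = []
    for start, costs in zip(start_offsets, render_costs):
        t = start
        for render_ms in costs:
            t += REQUEST_MS + render_ms + PRESENT_MS
            times.append(t)
    return sorted(times)


def frame_intervals(times):
    """Gaps between consecutive presents; uneven gaps are felt as stutter."""
    return [round(b - a, 3) for a, b in zip(times, times[1:])]


# GPU0 starts at 0 ms, GPU1 at 4 ms; second frame takes 7 ms on both.
times = present_times([0.0, 4.0], [[7.0, 7.0], [5.0, 7.0]])
intervals = frame_intervals(times)
print(intervals)
```

With these numbers the gaps between presented frames alternate between short and long instead of being even. Under the same assumptions, even pacing would need GPU1 to sit idle for about 1.8 ms before starting its second frame, which is the wasted time that keeps scaling below 100%.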

Hopefully that is clear as mud.

COTA · Jan 4, 2021

Implementing multi-GPU in DX12 is quite challenging and the market for it is really small, and since games are already quite complex and have big budgets, almost no developers are willing to put time and resources into multi-GPU.

Also, AFAIK mgpu is still not well supported in the most popular game engines like Unreal and Unity.

There's a glimmer of hope, though. Ray tracing and particularly path tracing is said to benefit greatly from mgpu with close to linear scaling.

There's also the holy grail of mgpu, which is making multiple GPUs appear as one. That basically means an entire frame would be rendered by multiple GPUs at once.
AMD and Nvidia have been rumored to be working on this for years, and now Intel is supposed to be using this approach with Xe.

Concentric · Jan 4, 2021

madpistol said:
Nvidia and AMD have cut multi-GPU support largely because DirectX 12 gives devs close-to-metal access to the GPU. Because of this, developers can design a game to take advantage of multi-GPU setups without the need for a middleware-like approach of SLI or CrossfireX.

But clearly devs are not doing a very consistent / reliable job of "taking advantage" of that - in fact some don't at all.
And surely even if DX12 gives them very low level access to the GPU, it still needs to go through the driver.
So I still feel like the driver would be a better method...?

I feel like nVidia and AMD should be able to see that they can't rely on developers doing this and should take on the task themselves.
In the current race, where margins are quite tight for the performance crown etc, would it not be great for them to be able to show that the ultimate [H]ard setup is to use a solid, reliable, performant multi-GPU setup built into their cards and drivers (regardless of the app/game), whilst their competitor has abandoned the idea and can't compete?

I see your point about the timing of the frames but surely they could improve this further by optimising the bridge between the cards, reducing latency between the GPUs, etc?
Like you say, it wouldn't be linear scaling, but that's understandable, and something like 70% is still reasonable.

madpistol · Jan 4, 2021

COTA said:
Implementing multi-GPU in DX12 is quite challenging and the market for it is really small, and since games are already quite complex and have big budgets, almost no developers are willing to put time and resources into multi-GPU.

Also, AFAIK mgpu is still not well supported in the most popular game engines like Unreal and Unity.

There's a glimmer of hope, though. Ray tracing and particularly path tracing is said to benefit greatly from mgpu with close to linear scaling.

There's also the holy grail of mgpu, which is making multiple GPUs appear as one. That basically means an entire frame would be rendered by multiple GPUs at once.
AMD and Nvidia have been rumored to be working on this for years, and now Intel is supposed to be using this approach with Xe.

The only way I could see this working is through a line-buffer, where basically each GPU is presented with exactly the same data, GPU0 is told to render odd lines, and GPU1 is told to render even lines. However, this is really only beneficial for pixel processing, as each GPU still has to build the bones of the scene (triangles and tessellation). Only the shading would be improved, and without knowing the compute power required to render out the raw triangles, it would be difficult to know if this method would be truly viable.
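A rough sketch of that odd/even line split, with plain Python lists standing in for GPUs. It assumes a per-pixel `shade(x, y)` function (hypothetical) and ignores the geometry work that each GPU would still have to repeat in full.

```python
def rows_for_gpu(gpu_index, gpu_count, height):
    """Rows (0-based) that one GPU shades when lines are interleaved."""
    return list(range(gpu_index, height, gpu_count))


def render_interleaved(width, height, gpu_count, shade):
    """Shade a frame by handing every gpu_count-th line to each GPU."""
    frame = [None] * height
    for gpu in range(gpu_count):
        # Each "GPU" sees the whole scene but only shades its own lines.
        for y in rows_for_gpu(gpu, gpu_count, height):
            frame[y] = [shade(x, y) for x in range(width)]
    return frame


def shade(x, y):
    # Stand-in for real pixel shading: any pure function of the pixel works.
    return (x * 31 + y * 17) % 256


single = render_interleaved(4, 6, 1, shade)
dual = render_interleaved(4, 6, 2, shade)
print(rows_for_gpu(0, 2, 6), rows_for_gpu(1, 2, 6))
```

Both GPUs together produce the same frame as one GPU would, and since they work on the same frame at the same time there is no frame-to-frame pacing problem. Only the per-pixel loop is divided between them.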

COTA · Jan 4, 2021

madpistol said:
The only way I could see this working is through a line-buffer, where basically each GPU is presented with exactly the same data, GPU0 is told to render odd lines, and GPU1 is told to render even lines. However, this is really only beneficial for pixel processing, as each GPU still has to build the bones of the scene (triangles and tessellation). Only the shading would be improved, and without knowing the compute power required to render out the raw triangles, it would be difficult to know if this method would be truly viable.

You mean like SLI? (Scan Line Interleave)

madpistol · Jan 4, 2021

COTA said:
You mean like SLI? (Scan Line Interleave)

Good one.

Unfortunately, people would have to understand that scaling would be terrible. Frame-pacing would be excellent, though.
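A back-of-the-envelope sketch of why the scaling suffers, assuming a frame splits cleanly into geometry work that every GPU repeats and shading work that divides evenly across GPUs. The millisecond figures are invented purely for illustration.

```python
def frame_time_ms(geometry_ms, shading_ms, gpu_count):
    """Geometry is duplicated on every GPU; only shading is divided."""
    return geometry_ms + shading_ms / gpu_count


def speedup(geometry_ms, shading_ms, gpu_count):
    """Speedup over a single GPU for the same frame."""
    one_gpu = frame_time_ms(geometry_ms, shading_ms, 1)
    return one_gpu / frame_time_ms(geometry_ms, shading_ms, gpu_count)


# Hypothetical 7 ms frame: 3 ms geometry + 4 ms shading.
geometry_heavy = speedup(3.0, 4.0, 2)
# Hypothetical 7 ms frame: 1 ms geometry + 6 ms shading.
shading_heavy = speedup(1.0, 6.0, 2)
print(geometry_heavy, shading_heavy)
```

Under these assumptions two GPUs give roughly 1.4x on the geometry-heavy frame and 1.75x on the shading-heavy one, so the result depends heavily on how much of the frame is duplicated work.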

LukeTbk · Jan 4, 2021

Concentric said:
I don't understand why Nvidia and AMD don't implement the load balancing of the GPUs themselves, inside the hardware and inside their driver,

I think some of this is still possible (and is what was done in the past): using multiple GPUs without explicit application code, by letting the driver handle things.

https://www.khronos.org/registry/OpenGL/extensions/NVX/NVX_linked_gpu_multicast.txt

You have explicit and implicit multi GPU usage, i.e. some multi-gpu usage can occur without explicitly coding for it in OpenGL.

DX11 and 12 also had some implicit multi GPU usage I think:
https://www.anandtech.com/show/9740/directx-12-geforce-plus-radeon-mgpu-preview/2
In DirectX 12 there are technically three different modes for multi-adapter operation. The simplest of these modes is what Microsoft calls Implicit Multi-Adapter. Implicit Multi-Adapter is essentially the lowest rung of multi-adapter operation, intended to allow developers to use the same AFR-friendly techniques as they did with DirectX 11 and before. This model retains the same limited ability for game developers to control the multi-GPU rendering process, which limits the amount of power they have, but also limits their responsibilities as well. Consequently, just as with DirectX 11 multi-GPU, in implicit mode much of the work is offloaded to the drivers (and practically speaking, AMD and NVIDIA).

But Vulkan/DX12 went in the direction of giving developers more control and freedom (i.e. more work), I imagine because the implicit approach didn't work that well.

One thing that has made implicit multi-GPU harder over time is that more and more of the rendering of the current frame depends on the previous one for effects. That makes solutions like alternate frame rendering harder than in the past and often rules out the most obvious approaches. The same goes for having one GPU render the left half of the screen and the other the right half, and with vertex-level shader operations it must be quite complicated.
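A toy model of that dependency, assuming each frame has some work that can start right away and some work that has to wait until the previous frame is completely finished. It ignores the cost of copying data between GPUs, and the function name and timings are hypothetical.

```python
def afr_finish_times(gpu_count, frame_count, independent_ms, dependent_ms):
    """Finish time of each frame under alternate frame rendering.

    Frame i runs on GPU i % gpu_count. Its independent part starts as soon
    as that GPU is free; its dependent part also waits for frame i - 1.
    """
    gpu_free_at = [0.0] * gpu_count
    previous_done = 0.0
    finish_times = []
    for frame in range(frame_count):
        gpu = frame % gpu_count
        independent_done = gpu_free_at[gpu] + independent_ms
        # The dependent part needs the previous frame's finished output.
        dependent_start = max(independent_done, previous_done)
        done = dependent_start + dependent_ms
        gpu_free_at[gpu] = done
        previous_done = done
        finish_times.append(done)
    return finish_times


no_dependency = afr_finish_times(2, 4, 8.0, 0.0)[-1]    # fully parallel
full_dependency = afr_finish_times(2, 4, 0.0, 8.0)[-1]  # fully serialized
half_dependency = afr_finish_times(2, 4, 4.0, 4.0)[-1]
one_gpu = afr_finish_times(1, 4, 4.0, 4.0)[-1]
print(no_dependency, full_dependency, half_dependency, one_gpu)
```

In this model four 8 ms frames take 32 ms on one GPU. Two GPUs finish them in 16 ms with no dependency, 20 ms when half of each frame depends on the previous one, and 32 ms (no gain at all) when the whole frame does.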

You can have both GPUs render the same frame at a lower resolution (with one shifted slightly relative to the other) and then merge the two results into a higher-quality frame, but that too conflicts with many off-screen and post-processing effects and with regular anti-aliasing.
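A minimal sketch of the shifted low-resolution idea, assuming the scene can be sampled as a pure function `scene(u, v)` with coordinates in [0, 1). Each "GPU" renders half the horizontal resolution with a different sub-pixel offset, and the columns are then interleaved. All names are made up, and none of the post-processing conflicts mentioned above are modeled.

```python
def render_lowres(scene, width, height, x_offset):
    """Sample scene(u, v) once per pixel, x_offset pixels into each pixel."""
    return [
        [scene((x + x_offset) / width, (y + 0.5) / height) for x in range(width)]
        for y in range(height)
    ]


def merge_columns(first, second):
    """Interleave the columns of two half-width renders into one frame."""
    merged = []
    for row_a, row_b in zip(first, second):
        row = []
        for a, b in zip(row_a, row_b):
            row.extend([a, b])
        merged.append(row)
    return merged


def scene(u, v):
    # Stand-in for real rendering: a simple gradient.
    return u + v


WIDTH, HEIGHT = 4, 2
gpu0 = render_lowres(scene, WIDTH, HEIGHT, 0.25)  # left sample of each pixel
gpu1 = render_lowres(scene, WIDTH, HEIGHT, 0.75)  # right sample of each pixel
merged = merge_columns(gpu0, gpu1)
full = render_lowres(scene, 2 * WIDTH, HEIGHT, 0.5)
print(merged == full)
```

For a scene this simple the merged result matches a direct render at twice the width. The difficulty described above comes from effects that need neighbouring pixels or off-screen data, which a per-sample function like this does not capture.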

I think they are all working on this very hard (having simpler, smaller GPUs and using more of them could make yields and production much simpler and cheaper). Sony's next console, for example:
https://www.freepatentsonline.com/20200242723.pdf

But if their patent filing is any indication, they are really not sure yet how it would work.

Dan_D · Jan 4, 2021

Effectively, AMD and NVIDIA haven't figured out a way to do multi-GPU on a purely hardware level independent of the OS or graphics API being used. With the introduction of DX12, multi-GPU support was shifted towards the game developers which is why it practically died right after DX12 was introduced. It wasn't massively popular or common to begin with.

GotNoRice · Jan 4, 2021

COTA said:
the market for it is really small

Quite frankly, I don't believe that the market for this is so tiny anymore, especially in our new reality where in many cases high-end GPUs are nowhere to be found even if you wanted to buy one. If there is really a crushing demand for $1500 cards like the 3090, you don't think people would jump at the chance to run a pair of 3070, 2080, etc, if it gave them similar performance? I mean, even if someone would have preferred the 3090, it's also nice to have an option that's actually in-stock.

I think it's more a matter of:

1. Nvidia/AMD doesn't make as much / any money if you double-up on your old card instead of buying a new one.
2. Nvidia/AMD makes more profit selling high-end cards instead of allowing people to double-up on budget or mid-tier cards.
3. They were happy to shed the responsibility of making working SLI/Crossfire profiles.
4. The "blissful ignorance" of assuming game developers would embrace all of this extra work instead, which they obviously didn't. It allowed Nvidia/AMD to pretend that they were trying to push the technology forward while they were actually pushing it into the trash can.

lopoetve · Jan 4, 2021

It's bloody hard. Forked multiprocessing of single applications is hard enough; something absurdly time sensitive like rendering? I wouldn't want to take that on unless I absolutely had to. ~shudder~

LukeTbk · Jan 4, 2021

GotNoRice said:
high-end GPUs are nowhere to be found

And with progress being slow, they would not necessarily outperform 2x previous-generation GPUs in most cases (in raw average fps).

It feels like it is something really, really hard. Many of your points would explain why they don't support individual GPUs on different cards, but even the easier scenario of multiple GPUs working together without having to talk to each other over PCI Express seems to be really hard as well, and that is something console makers and many others would want to do (and are trying to do) without having to think about any of that balancing act. It is so hard (or at least feels like it would be) that they will only do it the way it happened for CPUs: when increasing performance without it becomes almost impossible. Looking at performance-per-watt gains and current costs, they seem close to that point, which may be why it seems to be ramping up: AMD's announcement of a chiplet GPU design, Sony trying to do it (with AMD, I imagine) for the next console, and so on. I imagine Apple will go in that direction as well on its SoCs (a single massive GPU means a big loss of the whole chip when it fails).

Spaghetti · Jan 4, 2021

LukeTbk said:
And with progress being slow, they would not necessarily outperform 2x previous-generation GPUs in most cases (in raw average fps).

It feels like it is something really, really hard. Many of your points would explain why they don't support individual GPUs on different cards, but even the easier scenario of multiple GPUs working together without having to talk to each other over PCI Express seems to be really hard as well, and that is something console makers and many others would want to do (and are trying to do) without having to think about any of that balancing act. It is so hard (or at least feels like it would be) that they will only do it the way it happened for CPUs: when increasing performance without it becomes almost impossible. Looking at performance-per-watt gains and current costs, they seem close to that point, which may be why it seems to be ramping up: AMD's announcement of a chiplet GPU design, Sony trying to do it (with AMD, I imagine) for the next console, and so on. I imagine Apple will go in that direction as well on its SoCs (a single massive GPU means a big loss of the whole chip when it fails).

AMD and Nvidia (and other companies before them) have done multiple GPUs on one card in the past, but as far as I know, those cards have always essentially used the same SLI/CrossFire interface as two individual cards would. Even on the same card, the two GPUs are effectively working individually.

LukeTbk · Jan 4, 2021

Spaghetti said:
AMD and Nvidia (and other companies before them) have done multiple GPUs on one card in the past, but as far as I know, those cards have always essentially used the same SLI/CrossFire interface as two individual cards would. Even on the same card, the two GPUs are effectively working individually.

That was at a time when multi-GPU work was much easier and, like you said, the implementation was not particularly better than (or different from) two separate cards. Even in that scenario they did not do what the OP is talking about, which suggests to me that it is probably really hard to do and not mostly about fear of losing sales. They could potentially gain so much in yield and scaling ability that, if they could do it, it would be quite tempting.
