Bring back "SLI"?

Dan_D · May 12, 2025

Mega6 said:
SLI got shitcanned for a big reason - microstutter.

That literally had nothing to do with it, or very little at best. It primarily came down to studios being unwilling to implement multi-GPU support under the new DirectX 12 model. With the bulk of the work shifting from the GPU manufacturers to game studios, support for SLI effectively died overnight. Not everyone was sensitive to microstutter, and on the ultra high end (GTX 1080 Ti, RTX 2080 Ti timeframe) it was the only way to get max settings at 4K with playable frame rates in newer games.

Krypton · May 12, 2025

I don't think any of us would be engaged in this discussion if Nvidia would quit releasing cards starved for memory (8GB), then decreasing the bus width on the next generation because they went to GDDR7 LOL.

Memory bandwidth should be at 1TB/s minimum by now, even for the 60 series. So much sand wasted... I am in a boycott state of mind, but I guess the populace is used to "shrinkflation" even with GPUs, and they are banking on that sentiment.

Brackle · May 12, 2025

Krypton said:
I don't think any of us would be engaged in this discussion if Nvidia would quit releasing cards starved for memory (8GB), then decreasing the bus width on the next generation because they went to GDDR7 LOL.

Memory bandwidth should be at 1TB/s minimum by now, even for the 60 series. So much sand wasted... I am in a boycott state of mind, but I guess the populace is used to "shrinkflation" even with GPUs, and they are banking on that sentiment.

Well, even if you used two 8GB GPUs in SLI, you are still limited to an 8GB memory buffer, as they do not stack.

8GB should be for the super low end GPUs now, like the 5050 or 9050.
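The "VRAM does not stack" point fits in a few lines. This is a simplified sketch that assumes classic mirrored multi-GPU rendering, where each GPU keeps its own full copy of the scene data; `usable_vram_gb` is a made-up helper name, not part of any driver API.

```python
def usable_vram_gb(per_gpu_vram_gb):
    """Usable memory pool under mirrored multi-GPU rendering.

    Assumption: every GPU holds a full copy of the same assets, so the
    smallest card in the array sets the limit and capacities do not add up.
    """
    return min(per_gpu_vram_gb)


two_way = usable_vram_gb([8, 8])         # two 8GB cards in SLI
four_way = usable_vram_gb([8, 8, 8, 8])  # even a four-card array
print(two_way, four_way)                 # neither exceeds a single card's 8GB
```

Under that assumption, adding cards adds compute but never adds capacity.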

CraptacularOne · May 12, 2025

Krypton said:
I don't think any of us would be engaged in this discussion if Nvidia would quit releasing cards starved for memory (8GB), then decreasing the bus width on the next generation because they went to GDDR7 LOL.

Memory bandwidth should be at 1TB/s minimum by now, even for the 60 series. So much sand wasted... I am in a boycott state of mind, but I guess the populace is used to "shrinkflation" even with GPUs, and they are banking on that sentiment.

Not sure what you're on about in this post, but VRAM does not stack in SLI or CFX configurations. Everything that's in one GPU's memory must also be mirrored in the other GPU's memory, otherwise they cannot work on the same scene at the same point in time. It doesn't matter if you had Quad SLI, you are still limited by the VRAM pool of a single GPU in the array. As for bus width and memory speed, you also seem to misunderstand how they actually work. The previous generation top end RTX 4090 had a 384bit bus, true, and with its GDDR6X the bandwidth was rated at 1008GB/s. The current gen 5080, which is not even the top end card, has a 256bit wide bus, but with its much faster GDDR7 at 30Gbps it is good for 960GB/s, so the 256bit interface virtually equals the bandwidth of the 384bit one.
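Those bandwidth figures follow from bus width times per-pin data rate. A quick sketch of the arithmetic: the 21Gbps GDDR6X rate for the RTX 4090 is an assumption not stated in the post, and `peak_bandwidth_gbs` is just an illustrative name.

```python
def peak_bandwidth_gbs(bus_width_bits, data_rate_gbps):
    """Theoretical peak bandwidth in GB/s.

    bus_width_bits / 8 gives bytes moved per transfer across the whole bus;
    data_rate_gbps is the per-pin data rate in Gbit/s.
    """
    return bus_width_bits / 8 * data_rate_gbps


# Assumed per-pin rates: 21Gbps (RTX 4090, GDDR6X), 30Gbps (RTX 5080, GDDR7).
rtx_4090 = peak_bandwidth_gbs(384, 21)
rtx_5080 = peak_bandwidth_gbs(256, 30)

print(f"RTX 4090: {rtx_4090:.0f} GB/s")
print(f"RTX 5080: {rtx_5080:.0f} GB/s")
```

With those inputs the narrower bus lands within about 5% of the wider one, which is the point being made about faster memory offsetting a narrower interface.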

pendragon1 · May 12, 2025

i see what he's saying about the bus speeds. we got up to 512bit then went backwards. but there are probably technical reasons i don't get..

Krypton · May 12, 2025

CraptacularOne said:
Not sure what you're on about in this post, but VRAM does not stack in SLI or CFX configurations. Everything that's in one GPU's memory must also be mirrored in the other GPU's memory, otherwise they cannot work on the same scene at the same point in time. It doesn't matter if you had Quad SLI, you are still limited by the VRAM pool of a single GPU in the array. As for bus width and memory speed, you also seem to misunderstand how they actually work. The previous generation top end RTX 4090 had a 384bit bus, true, and with its GDDR6X the bandwidth was rated at 1008GB/s. The current gen 5080, which is not even the top end card, has a 256bit wide bus, but with its much faster GDDR7 at 30Gbps it is good for 960GB/s, so the 256bit interface virtually equals the bandwidth of the 384bit one.

My point is we would not be wanting "SLI" or something similar if Nvidia wasn't handicapping memory (total GB and bandwidth) with new releases to begin with... 12GB / 1TB/s bandwidth minimum for the 60 series is what we want, but at this rate we are 5 years away. Just give us what we want with 1 GPU.

CraptacularOne · May 12, 2025

Krypton said:
My point is we would not be wanting "SLI" or something similar if Nvidia wasn't handicapping memory (total GB and bandwidth) with new releases to begin with... 12GB / 1TB/s bandwidth minimum for the 60 series is what we want, but at this rate we are 5 years away. Just give us what we want with 1 GPU.

Memory capacity, sure, the more the merrier up to a certain point of diminishing returns, but bandwidth isn't as important as capacity. Just look at the RX 9070 XT and its GDDR6 at 644GB/s vs an RTX 5070 Ti and its 896GB/s, for example. Despite the RTX 5070 Ti's bandwidth advantage of more than 250GB/s, the 2 GPUs are very evenly matched. We had your proposed 12GB at roughly 1TB/s with the RTX 3080 Ti about 4 years ago, and it routinely gets outclassed by an RTX 5070 with its 192bit bus and 672GB/s bandwidth now. The point I'm trying to highlight for you is that it really doesn't matter: there is no hard and fast "this is the bar we need to set" for bus width and bandwidth.

ShuttleLuv · May 12, 2025

My SLI worked great, no stutters when set up properly. Devs just stopped caring about it.

LukeTbk · May 12, 2025

Krypton said:
My point is we would not be wanting "SLI" or something similar if Nvidia wasn't handicapping memory (total GB and bandwidth) with new releases to begin with... 12GB / 1TB/s bandwidth minimum for the 60 series is what we want, but at this rate we are 5 years away. Just give us what we want with 1 GPU.

I am not sure how that works. A memory handicap seems more like a reason to want to move higher up the single-GPU stack than to SLI cards from the middle of the pack (going SLI does not really double your memory the way going for the big single GPU would). With the 3090, 4090 and 5090, Nvidia certainly offers that "all you want in 1 GPU" option.

pendragon1 said:
i see what he's saying about the bus speeds. we got up to 512bit then went backwards. but there are probably technical reasons i don't get..

The bigger the memory controller, the more die space you use for it. Using that die space for cache instead (or for more cores, or saving money with a smaller die) could lead to better results. Same for desktop CPUs, where we have not seen much wider buses either. As data compression and caching strategies get better, the balance can shift in that direction over time. It has been a while since we saw a large amount of performance left on the table because of too little memory bandwidth, so we can assume they are doing the balancing act as well as possible (hard to imagine why they would not).

CraptacularOne · May 12, 2025

pendragon1 said:
i see what he's saying about the bus speeds. we got up to 512bit then went backwards. but there are probably technical reasons i don't get..

2 reasons really:
1) GPU die size
2) memory speed

A wider data path physically requires a larger GPU die to actually wire the memory to it. That creates complexity and is obviously more costly for the PCB. Furthermore, as memory speeds have steadily risen, the need for super wide data paths to achieve higher and higher throughput has gone away.

LukeTbk · May 12, 2025

CraptacularOne said:
2 reasons really:

I think caching and compression also got better. RDNA 2's performance vs Ampere, despite the significantly lower bandwidth, could have been down to the bigger cache, a model Nvidia followed with Lovelace.

CraptacularOne · May 12, 2025

LukeTbk said:
I think caching and compression also got better. RDNA 2's performance vs Ampere, despite the significantly lower bandwidth, could have been down to the bigger cache, a model Nvidia followed with Lovelace.

For sure, having a larger on-die cache to reduce memory calls can have a significant performance impact; just look at how the "X3D" AMD CPUs have virtually taken over gaming rigs. But that's not really why we saw memory bus widths recede like he was asking.

LukeTbk · May 12, 2025

CraptacularOne said:
But that's not really why we saw memory bus widths recede like he was asking.

Not sure AMD would have reduced its memory bus if that had not happened. Would the 9070 XT design team have chosen 256 bits on GDDR6 tech that is a little over 6 years old without the advances in caching?

Decko · May 12, 2025

Krypton said:
While everyone is playing the generational GPU swap game and Nvidia / AMD are raking in great profits with minor incremental updates that always leave you wanting more and a little frustrated...

This guy did a "DIY SLI" with interesting results....

View: https://www.youtube.com/watch?v=PFebYAW6YsM&ab_channel=Lecctron

Really fun experiment, it could be practical in some cases for single player games. 11x framegen still has to feel odd, but it looks very smooth. Overall very cool.

CraptacularOne · May 12, 2025

LukeTbk said:
Not sure AMD would have reduced its memory bus if that had not happened. Would the 9070 XT design team have chosen 256 bits on GDDR6 tech that is a little over 6 years old without the advances in caching?

Yes. Their reason for choosing GDDR6 and not GDDR7 was solely cost motivated to be "disruptive" in the GPU space in order to cost down their product. This also lines up with my first reason, GPU die size, and the added cost and complexity needed for wider data paths. And they did in fact do this when they went from the HD 2900 XT and its 512bit data path to the HD 3870 and its 256bit data path. They both shared a 256KB L2 cache.

LukeTbk · May 12, 2025

CraptacularOne said:
Yes. Their reason for choosing GDDR6 and not GDDR7 was solely cost motivated to be "disruptive" in the GPU space in order to cost down their product.

But that's a choice you can make only if performance stays high enough with 256-bit GDDR6, which we can imagine would not have been the case without compression and caching advancing so much. I am not sure how anyone could know something like this (why not go with 128 bits this time around if 256 bits with 2016 cache and compression tech would have been enough for today's needs?).

You could make the chip for the same cost with 384 bits instead: make it rectangular enough, use less cache to fit the cores and so on, no?

You balance cost, power and performance. How big a memory bus is optimal at cost X and power Y depends on compression, cache and memory technology.

They always, every time, try to be disruptive and make the cheapest possible GPU they can. That's just the narrative this time around (and looking at the prices today on Newegg... almost purely narrative).

CraptacularOne · May 12, 2025

LukeTbk said:
But that's a choice you can make only if performance stays high enough with 256-bit GDDR6, which we can imagine would not have been the case without compression and caching advancing so much. I am not sure how anyone could know something like this (why not go with 128 bits this time around if 256 bits with 2016 cache and compression tech would have been enough for today's needs?).

You could make the chip for the same cost with 384 bits instead: make it rectangular enough, use less cache to fit the cores and so on, no?

You balance cost, power and performance. How big a memory bus is optimal at cost X and power Y depends on compression, cache and memory technology.

They always, every time, try to be disruptive and make the cheapest possible GPU they can. That's just the narrative this time around (and looking at the prices today on Newegg... almost purely narrative).

The caching and decompression didn't change all that much architecturally between RDNA 3 and 4 (they did improve, but not by leaps and bounds). What did change is that they are no longer using chiplet dies for the memory controllers. It's all a monolithic design again after the woes of RDNA 3 and its complicated chiplets that tied into the main compute die of the GPU. Memory latency went down in RDNA 4. Sure, you can design the GPU any way you want, and if they wanted a 384bit design they could have modified the shape of the GPU. That, however, incurs a cost, as I said, in PCB complexity and the need for more memory chips. That flies in the face of them wanting to cost down their GPU. Had they released a high end RDNA 4 part, I'm sure we would have seen a 384bit bus, but this again goes back to my point that there is no real "hard and fast threshold" for memory bus width. If they could achieve the memory throughput they needed on a 128bit bus, I can tell you they would have done it purely from a cost perspective. That's not currently possible even with GDDR7, as the theoretical max on a 128bit bus with the fastest available GDDR7 (30Gbps) is 480GB/s.
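To sketch the 128bit argument: the snippet below works out the peak bandwidth of a 128bit bus at 30Gbps, then the narrowest bus that would reach the 644GB/s quoted earlier in the thread for the RX 9070 XT at that same data rate. It assumes bus width grows in 32bit steps, and the function names are illustrative only.

```python
import math

CHANNEL_BITS = 32  # assumption: the bus is built from 32-bit channels


def peak_bandwidth_gbs(bus_width_bits, data_rate_gbps):
    """Theoretical peak bandwidth in GB/s for a bus width and per-pin rate."""
    return bus_width_bits / 8 * data_rate_gbps


def min_bus_width_bits(target_gbs, data_rate_gbps):
    """Narrowest bus, in whole 32-bit channels, that reaches target_gbs."""
    bits_needed = target_gbs * 8 / data_rate_gbps
    return math.ceil(bits_needed / CHANNEL_BITS) * CHANNEL_BITS


ceiling_128 = peak_bandwidth_gbs(128, 30)  # best case for a 128-bit bus
needed = min_bus_width_bits(644, 30)       # bus needed for 644GB/s at 30Gbps

print(f"128-bit at 30Gbps: {ceiling_128:.0f} GB/s")
print(f"Bus needed for 644 GB/s at 30Gbps: {needed}-bit")
```

Under these assumptions a 128bit bus falls well short of the target, and the next step that clears it is 192bit.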

zandor · May 12, 2025

pendragon1 said:
i see what he's saying about the bus speeds. we got up to 512bit then went backwards. but there are probably technical reasons i don't get..

2003-2008 NV used GDDR3. By the end, bus width hit 512 bits in the GTX 280 and 285 (refresh model). AFAIK that's the last time bus width reached 512 bits on a consumer NV card, including Titans, until the 5090, unless you count dual GPU models as 2x. The next generation, the GTX 400 series, switched to GDDR5; bus width dropped to 384 bits on the GTX 480, bandwidth increased, and VRAM went up by 50%.

IMHO that was a more sensible way to do things than what we've been getting lately. Shrinking the bus width one notch and adding vram when they get a new memory type, then increasing the bus width and adding vram until the next new memory type comes along makes sense to me. OTOH maybe we don't really need the extra bandwidth as much? Yeah I'm looking at that lousy generational uplift from the 40 series to the 50 series, and the 40 series still being faster than the 30 series despite the bus getting shrunk and a lot of the models having less memory bandwidth than their predecessor.

WilyKit · May 12, 2025

1.1.2.3.5... · May 12, 2025

zandor said:
2003-2008 NV used GDDR3. By the end, bus width hit 512 bits in the GTX 280 and 285 (refresh model). AFAIK that's the last time bus width reached 512 bits on a consumer NV card, including Titans, until the 5090, unless you count dual GPU models as 2x. The next generation, the GTX 400 series, switched to GDDR5; bus width dropped to 384 bits on the GTX 480, bandwidth increased, and VRAM went up by 50%.

IMHO that was a more sensible way to do things than what we've been getting lately. Shrinking the bus width one notch and adding vram when they get a new memory type, then increasing the bus width and adding vram until the next new memory type comes along makes sense to me. OTOH maybe we don't really need the extra bandwidth as much? Yeah I'm looking at that lousy generational uplift from the 40 series to the 50 series, and the 40 series still being faster than the 30 series despite the bus getting shrunk and a lot of the models having less memory bandwidth than their predecessor.

Probably a function of the workloads. Some GPGPU jobs are very memory intensive and the bandwidth is never enough.

zandor · May 12, 2025

1.1.2.3.5... said:
Probably a function of the workloads. Some GPGPU jobs are very memory intensive and the bandwidth is never enough.

I'm just talking about gaming with the 40 to 50 and 30 to 40 series uplifts, but yes you're right there are memory bound workloads.

LukeTbk · May 12, 2025

CraptacularOne said:
The caching and decompression didn't change all that much architecturally between RDNA 3 and 4

Sure, but RDNA 3 was not 512 bits either. We are talking about a long span of time, and the big step to keep the bus small could have been RDNA 2 here.

As for dynamic register allocation:
https://chipsandcheese.com/p/dynamic-register-allocation-on-amds?utm_campaign=post&utm_medium=web
Out of order memory access:
https://chipsandcheese.com/p/rdna-4s-out-of-order-memory-accesses
Still, AMD's engineers deserve credit for making them happen. RDNA 4’s arguably makes the most significant change to AMD’s GPU memory subsystem since RDNA launched in 2019.

Better compression of the BVH

Whether the other tricks to minimize the need for memory bandwidth are big ones or not, I could not say, but that seems to be a lot of the work they do, and it is possible that without the last decade of iteration on memory management and tricks, the 9070 XT could not have competed with Nvidia's 5070 Ti with that much of a memory bandwidth deficit.

CraptacularOne · May 12, 2025

LukeTbk said:
Sure, but RDNA 3 was not 512 bits either. We are talking about a long span of time, and the big step to keep the bus small could have been RDNA 2 here.

As for dynamic register allocation:
https://chipsandcheese.com/p/dynamic-register-allocation-on-amds?utm_campaign=post&utm_medium=web
Out of order memory access:
https://chipsandcheese.com/p/rdna-4s-out-of-order-memory-accesses
Still, AMD's engineers deserve credit for making them happen. RDNA 4’s arguably makes the most significant change to AMD’s GPU memory subsystem since RDNA launched in 2019.

Better compression of the BVH

Whether the other tricks to minimize the need for memory bandwidth are big ones or not, I could not say, but that seems to be a lot of the work they do, and it is possible that without the last decade of iteration on memory management and tricks, the 9070 XT could not have competed with Nvidia's 5070 Ti with that much of a memory bandwidth deficit.

You're quoting only part of what I'm saying and taking it out of context, or just not understanding it entirely, it seems. The comparison I made was to show you that they have indeed halved the bus width without any increase in cache size and still delivered a faster product with less memory bandwidth in the past. That they have set a precedent for it was all I meant to illustrate. As for the 9070 XT vs the RTX 5070 Ti, the truth of the matter is that the RTX 5070 Ti isn't bandwidth starved.

You're in luck because I just so happen to have a RTX 5070 Ti on hand for testing in a secondary system of mine.

This is a RTX 5070 Ti with the memory downclocked to 27Gbps = 75.66fps
https://www.3dmark.com/sw/2252659

this is the same card but memory overclocked to 30Gbps = 77.52fps
https://www.3dmark.com/sw/2252668

There isn't even a 2fps difference, despite 3DMark Speed Way being a super heavy ray tracing workload and the downclocked run having a 3Gbps lower memory speed and roughly 96GB/s less bandwidth. Memory speed isn't everything, nor is bandwidth. It's about having enough of each to effectively accomplish the task. That's the whole point I'm trying to make to you and everyone here. You could give the RX 9070 XT GDDR7 memory and it wouldn't dramatically improve its performance. Sure, it would be higher, but not meaningfully so and certainly not cost effectively so.
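To put the two runs side by side, here is a small sketch comparing the relative change in memory bandwidth with the relative change in frame rate. It assumes a 256bit bus on the RTX 5070 Ti (not stated in the post) and uses the two fps results linked above; `pct_gain` is an illustrative helper.

```python
def pct_gain(low, high):
    """Percentage increase going from low to high."""
    return (high - low) / low * 100


BUS_WIDTH_BITS = 256  # assumption: RTX 5070 Ti memory bus width

bw_low = BUS_WIDTH_BITS / 8 * 27   # GB/s with memory at 27Gbps
bw_high = BUS_WIDTH_BITS / 8 * 30  # GB/s with memory at 30Gbps

bandwidth_gain = pct_gain(bw_low, bw_high)
fps_gain = pct_gain(75.66, 77.52)  # the two Speed Way scores from the post

print(f"bandwidth: +{bandwidth_gain:.1f}%  fps: +{fps_gain:.1f}%")
```

On those numbers the frame rate moves by only a fraction of the bandwidth change, which is consistent with the card not being bandwidth limited in this test.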

LukeTbk · May 12, 2025

CraptacularOne said:
the RTX 5070 Ti isn't bandwidth starved.

Of course? Nor is the 9070 XT, I imagine, in part because of how good their cache, compression strategy, decompression hardware and memory management have gotten over time.

CraptacularOne said:
You're quoting only part of what I'm saying and taking it out of context or just not understanding it entirely it seems

I feel you just want to disagree, but in reality you agree 100% that one reason 9070 XT performance is possible on 256 bits is that caching and compression also got better, which made it possible to have a small bus (and a smaller die). That's a third reason why RDNA has been able to compete with lower bandwidth (versus Nvidia in their previous generations of cards).

Without a decade of innovation in that regard, I doubt cards performing like the 9070 XT with 256-bit GDDR6 would be possible.

CraptacularOne said:
You could give the RX 9070 XT GDDR7 memory and it wouldn't dramatically improve its performance.

Exactly... ?

I know next to nothing and I could be all wrong, but I would be surprised if the Infinity Cache of RDNA 2 (caching in general, and compression), which achieved performance parity with Ampere despite its large bandwidth deficit, had nothing to do with the trend of cache and other things taking priority in the GPU budget they must divide.

Haswellbeast · May 13, 2025

1. That video is clickbait, like the rest of that guy's channel, and 2. even if SLI came back, who is going to run DUAL 600W RTX 5090s? Not many people in North America (with our 120V wall power).
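A rough power budget shows why 120V is the sticking point. Everything here except the 600W per card and the 120V figure is an assumption for illustration: a 15A circuit, an 80% rule of thumb for sustained load, 300W for the rest of the system and a 90% efficient PSU.

```python
WALL_VOLTS = 120
BREAKER_AMPS = 15            # assumption: common household circuit
CONTINUOUS_FACTOR = 0.8      # assumption: 80% rule of thumb for sustained loads

GPU_WATTS = 600              # per card, from the post above
REST_OF_SYSTEM_WATTS = 300   # hypothetical: CPU, board, drives, fans
PSU_EFFICIENCY = 0.9         # hypothetical

# Sustained power the circuit can comfortably deliver.
circuit_budget_w = WALL_VOLTS * BREAKER_AMPS * CONTINUOUS_FACTOR

# DC load inside the case, then what that pulls from the wall.
dc_load_w = 2 * GPU_WATTS + REST_OF_SYSTEM_WATTS
wall_draw_w = dc_load_w / PSU_EFFICIENCY

print(f"circuit budget: {circuit_budget_w:.0f} W")
print(f"estimated wall draw: {wall_draw_w:.0f} W")
print("fits on one circuit" if wall_draw_w <= circuit_budget_w else "over budget")
```

With these made-up system numbers the dual-card build overshoots a single 15A circuit before a monitor is even plugged in.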

daglesj · May 13, 2025

Bring back proper PCIe slots and another 8 - 12 lanes first.

Krypton · May 14, 2025

Meanwhile from the rumor mill:

https://www.tomshardware.com/pc-com...portedly-in-the-works-computex-reveal-rumored

xDiVolatilX · May 14, 2025

I used SLI almost every generation it was available. It was fun. It helped hit 144Hz in games that were demanding at the time, although the scaling loss was always an issue. You almost never truly doubled your GPU power for twice the GPUs.

sc5mu93 · May 14, 2025

I would be open to crossfire again. maybe these new AI companies (nvgeedia, AMD) could address microstutter with AI.

I always thought AMD's chiplet design was meant to create a solution that scales better per compute core. kinda like a modern Crossfire.

horrorshow · May 14, 2025

I remember when my buddy bought dual 6600 GT's for SLI back in the early 2000's/college era.... That was truly impressive for the $

sc5mu93 · May 14, 2025

horrorshow said:
I remember when my buddy bought dual 6600 GT's for SLI back in the early 2000's/college era.... That was truly impressive for the $

my college era was the 90s. I remember REAL SLI with 3dfx from that period.

Armenius · May 15, 2025

Brackle said:
Until you realize that you are at the mercy of Nvidia or AMD to release new profiles for games. It isn't fun having a good SLI system to see 1 GPU utilization at 1-2% while your other card is at 99% ......Then praying there isn't any microstutter.

As someone who has been using SLI since the 3dfx days, I am VERY happy it's dead.

Yeah, when I went back to a single powerful card after 8'ish years I was shocked by how much smoother the experience was. I got desensitized to the frame pacing issues of AFR.
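The frame pacing problem is easy to miss in an fps counter. A toy sketch with hypothetical frame intervals (NOT measurements): a single GPU delivering evenly paced frames versus an AFR pair delivering frames in uneven pairs at the same average rate.

```python
from statistics import mean


def avg_fps(frame_times_ms):
    """Average frames per second from frame-to-frame intervals in ms."""
    return 1000 / mean(frame_times_ms)


# Hypothetical frame-to-frame intervals in milliseconds, not real captures.
single_gpu = [16.7] * 6     # evenly paced
afr_pair = [5.0, 28.4] * 3  # frames arrive in uneven pairs

for name, times in (("single GPU", single_gpu), ("AFR pair", afr_pair)):
    print(f"{name}: {avg_fps(times):.1f} fps average, worst gap {max(times):.1f} ms")
```

Both lists average out to the same frame rate, but the longest gap between frames is much larger in the uneven case, and that gap is what reads as stutter.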

Armenius · May 15, 2025

pendragon1 said:
i see what he's saying about the bus speeds. we got up to 512bit then went backwards. but there are probably technical reasons i don't get..

You can have 512 bit if you're okay with having 16 memory chips on a 5080 instead of 8 and the extra cost that comes with it. The memory bus is made up of multiple 32-bit channels. Density and speed have gone up, so the need for a wide bus has gone down.
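The chip count follows directly from the 32-bit channel width. A minimal sketch, assuming one memory chip per 32-bit channel (no clamshell layout); `chips_needed` is an illustrative name.

```python
CHANNEL_BITS = 32  # per the post: the bus is made up of 32-bit channels


def chips_needed(bus_width_bits):
    """Memory chips required, assuming one chip per 32-bit channel."""
    return bus_width_bits // CHANNEL_BITS


for width in (256, 384, 512):
    print(f"{width}-bit bus -> {chips_needed(width)} memory chips")
```

Doubling the bus from 256 to 512 bits doubles the chips that have to be bought, placed and routed on the board.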

LukeTbk · May 15, 2025

Armenius said:
You can have 512 bit if you're okay with having 16 memory chips on a 5080 instead of 8 and the extra cost that comes with it. The memory bus is made up of multiple 32-bit channels. Density and speed have gone up, so the need for a wide bus has gone down.

Don't you need a die big enough to fit that many memory controllers? 3 edges of the 5090 are fully used for them. It is possible that they made the 5090 as small as they could for GDDR7 speeds: as bandwidth increases, the width needed for those data lines can increase, and with it the minimal die size you end up needing for X bits (not that there was ever a 512-bit GPU with a 5080-sized die in the past, I do not think).

pendragon1 · May 15, 2025

yeah yeah, i get it. back to sli

sc5mu93 said:
my college era was the 90s. I remember REAL SLI with 3dfx from that period.

real sli on voodoo 2s was the shit back then
