PCIe 3 limits found

If you look at his chart around 6:30 in the video, it shows the fps at both x8 and x16 on a single GPU is exactly the same (37.6 fps on both), which shows that the limits have not truly been reached in the classical sense.
Any generation of PCI-E will show limitations when using SLI without a crossbar (I know the Titan V GPUs don't support it), but it was like this back when 6800 GT GPUs were all the rage on PCI-E 1.0 as well: when a crossbar wasn't used, the fps would dip quite a bit.

I don't want to say this is "cheating" to get to the PCI-E 3.0 limitation, but it is a bit of an artificial way of doing so on any GPU (other than the Titan V, since it legitimately doesn't have the capability).
When a single GPU shows an fps drop going from x16 to x8, that will show we have truly reached the limits of that generation of PCI-E, at least in gaming benchmarks, though not necessarily in scientific/compute applications.

Nice video, though; it was very informative.
 
When a single GPU shows an fps drop going from x16 to x8, that will show we have truly reached the limits of that generation of PCI-E, at least in gaming benchmarks, though not necessarily in scientific/compute applications.

When it comes to the impact on scientific/compute use, the efficiency of the applications is at least as important as the hardware stack. The early Binary Radio Pulsar applications for the Einstein@Home project suffered, IIRC, 20-30% slowdowns on what were then high-end GPUs when going from PCIe 2.0 x16 to 2.0 x8 or 1.0 x16. As later versions of the applications progressively became more efficient (primarily by going from using the GPU as a coprocessor only for the inner hot loops to doing almost everything except disk I/O on the GPU), the penalty from lower PCIe bandwidth gradually withered away.
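
A minimal CUDA sketch of that shift (not the actual Einstein@Home code; the kernel, buffer size, and iteration count are made-up placeholders): an app that crosses PCIe for every small piece of work is very sensitive to link width, while one that keeps its working set resident on the GPU barely touches the bus.

Code:
// A minimal sketch (not Einstein@Home code) of why application structure
// decides how much PCIe bandwidth matters. Buffer size, kernel, and
// iteration count are made-up placeholders.
#include <cuda_runtime.h>
#include <cstdlib>

__global__ void process(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 1.0001f;            // stand-in for the real work
}

int main() {
    const int n = 1 << 24;                    // ~16M floats, 64 MB
    const size_t bytes = n * sizeof(float);
    float *host = (float *)malloc(bytes);
    float *dev;
    cudaMalloc(&dev, bytes);

    // "Coprocessor" style: the data crosses PCIe twice per iteration.
    // 1000 iterations * 2 * 64 MB = ~128 GB over the bus, so halving the
    // link width hurts.
    for (int iter = 0; iter < 1000; ++iter) {
        cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);
        process<<<(n + 255) / 256, 256>>>(dev, n);
        cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);
    }

    // GPU-resident style: upload once, iterate on the device, download once.
    // Only ~128 MB ever crosses PCIe, so x8 vs x16 is almost irrelevant.
    cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);
    for (int iter = 0; iter < 1000; ++iter) {
        process<<<(n + 255) / 256, 256>>>(dev, n);
    }
    cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dev);
    free(host);
    return 0;
}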
 
There are already games that, even in single-GPU configurations and even at low resolutions, are very sensitive to PCI-E bandwidth, and some of them are really old games. It all depends on how they are actually programmed.

[Attached benchmark charts: bf4 1920x1080, grid2 1920x1080, grid2 1600x900, ryse 1920x1080, ryse 2560x1440, wolfenstein 1920x1080, wolfenstein 2560x1440, wow 1920x1080, wow 2560x1440]
 
Yeah, we are finally starting to get to the 5% performance impact mark for video. Still below the 10% mark (where you would notice), but we seem to be getting closer to that point. Another year or two, at this rate.

PCIe 3.0 has had a breather though, thanks to the popularity of higher resolutions with the higher-performance gamers. They're still not universal, but 1440p is solid, and 4k is making inroads.

Hopefully, we will see some PCIe 4.0 parts from Intel/AMD in the next couple of years. The NVMe and Optane devices are all incredibly starved on 3.0. We need something to plug this beast into:

http://www.tomshardware.com/news/silicon-motion-pcie-4.0-2018,34660.html
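
For reference, the theoretical one-direction numbers behind that "starved" comment can be worked out from each generation's line rate and encoding; this is just host-side arithmetic using the published spec values:

Code:
// Theoretical one-direction PCIe bandwidth per generation and link width,
// from the published line rates and encodings. Plain host-side arithmetic.
#include <cstdio>

int main() {
    struct Gen { const char *name; double gts; double encoding; };
    const Gen gens[] = {
        {"PCIe 1.x", 2.5,  8.0 / 10.0},     // 8b/10b encoding
        {"PCIe 2.0", 5.0,  8.0 / 10.0},     // 8b/10b encoding
        {"PCIe 3.0", 8.0,  128.0 / 130.0},  // 128b/130b encoding
        {"PCIe 4.0", 16.0, 128.0 / 130.0},  // 128b/130b encoding
    };
    const int widths[] = {4, 8, 16};

    for (const Gen &g : gens) {
        // GT/s * payload fraction = Gbit/s of data per lane; /8 -> GB/s.
        double per_lane_gb = g.gts * g.encoding / 8.0;
        printf("%s:", g.name);
        for (int w : widths)
            printf("  x%-2d = %5.2f GB/s", w, per_lane_gb * w);
        printf("\n");
    }
    // PCIe 3.0 x4 tops out near 3.9 GB/s, which is roughly where the fastest
    // NVMe drives already sit; 4.0 x4 doubles that headroom.
    return 0;
}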
 
If you look at his chart around 6:30 in the video, it shows the fps at both x8 and x16 on a single GPU is exactly the same (37.6 fps on both), which shows that the limits have not truly been reached in the classical sense.
Any generation of PCI-E will show limitations when using SLI without a crossbar (I know the Titan V GPUs don't support it), but it was like this back when 6800 GT GPUs were all the rage on PCI-E 1.0 as well: when a crossbar wasn't used, the fps would dip quite a bit.

I don't want to say this is "cheating" to get to the PCI-E 3.0 limitation, but it is a bit of an artificial way of doing so on any GPU (other than the Titan V, since it legitimately doesn't have the capability).
When a single GPU shows an fps drop going from x16 to x8, that will show we have truly reached the limits of that generation of PCI-E, at least in gaming benchmarks, though not necessarily in scientific/compute applications.

Nice video, though; it was very informative.
Exactly; what it shows is that the game's mGPU implementation with explicit multi-adapter isn't the best.
 
How did that video find that PCIe 3 is limiting?
  • It showed x8 being about 10% slower than x16 with both cards OC
  • X16 has 2x the bandwidth
  • If x8 can deliver ~90% of x16 performance while x16 has 2x the bandwidth, the video shows the opposite of its own conclusion: PCIe 3.0 x16 is not limiting in AotS, even with Titan Vs and no bridge (see the rough numbers below)
  • Thumbs down on the conclusion
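
To put rough numbers on that bullet (simple arithmetic; the 40 fps baseline is a placeholder and the ~10% delta is the figure from the video):

Code:
// Sanity check on "x8 is only ~10% slower than x16". If halving the link
// bandwidth costs only ~10% of the frame rate, x16 cannot be anywhere near
// saturated. The 40 fps baseline is a placeholder; the 10% delta is the
// figure from the video.
#include <cstdio>

int main() {
    const double x16_bw = 16 * 8.0 * (128.0 / 130.0) / 8.0;  // ~15.75 GB/s
    const double x8_bw  = x16_bw / 2.0;                      // ~7.88 GB/s

    const double fps_x16 = 40.0;            // placeholder frame rate
    const double fps_x8  = fps_x16 * 0.90;  // ~10% slower, as in the video

    // Worst case: assume the x8 run was completely bus-bound. Then the game
    // moved at most this much data per frame:
    const double gb_per_frame = x8_bw / fps_x8;
    // That same per-frame traffic at the x16 frame rate occupies only:
    const double x16_utilisation = (gb_per_frame * fps_x16) / x16_bw;

    printf("At most ~%.2f GB of PCIe traffic per frame\n", gb_per_frame);
    printf("Which fills at most %.0f%% of an x16 link at %.0f fps\n",
           x16_utilisation * 100.0, fps_x16);
    return 0;
}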
 
Exactly; what it shows is that the game's mGPU implementation with explicit multi-adapter isn't the best.
Do you mean the SLI crossbar?
Why do you say "it isn't the best"?

I'm not defending the crossbar or anything, but all it shows, and what has been known since NVIDIA's SLI inception, is that using SLI without the crossbar causes that data traffic to be sent over the PCI-E lanes, putting more traffic/overhead on them than was originally planned.
Only very low-end GPUs and these Titan V GPUs don't utilize the SLI crossbar, and in the case of very low-end GPUs, there isn't enough traffic to bottleneck anything due to the GPUs being the limiting factor and not the PCI-E lanes.

With high-end GPUs, like the Titan V, the bottleneck becomes apparent due to the increase in data traffic.
Not sure what you mean by "it isn't the best", when this is simply showcasing how the data is re-routed when the SLI crossbar is not present, and the effects of doing so.
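
For a rough sense of how much extra traffic that re-routing adds, here is a back-of-the-envelope sketch; assuming one full RGBA frame crosses the bus per displayed frame is a simplification of what the driver actually moves, and the 100 fps figure is just an assumption:

Code:
// Back-of-the-envelope estimate of the frame traffic that lands on PCIe when
// SLI/AFR runs without a bridge. Assumes one 32-bit RGBA frame crosses the
// bus per displayed frame, which is a simplification of what the driver
// actually transfers; 100 fps is an assumed figure.
#include <cstdio>

int main() {
    struct Mode { const char *name; int w, h; };
    const Mode modes[] = {
        {"1920x1080", 1920, 1080},
        {"2560x1440", 2560, 1440},
        {"3840x2160", 3840, 2160},
    };
    const double fps   = 100.0;                              // assumed
    const double x8_bw = 8 * 8.0 * (128.0 / 130.0) / 8.0;    // 3.0 x8, ~7.9 GB/s

    for (const Mode &m : modes) {
        double gb_per_frame = (double)m.w * m.h * 4 / 1e9;   // 4 bytes/pixel
        double traffic      = gb_per_frame * fps;            // GB/s of copies
        printf("%s @ %.0f fps: %.2f GB/s of frame copies, %.0f%% of 3.0 x8\n",
               m.name, fps, traffic, 100.0 * traffic / x8_bw);
    }
    // This rides on top of the normal per-frame uploads (geometry, textures,
    // constants), which is why high-end cards expose the bottleneck long
    // before low-end ones do.
    return 0;
}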
 
Do you mean the SLI crossbar?
Why do you say "it isn't the best"?

I'm not defending the crossbar or anything, but all it shows, and what has been known since NVIDIA's SLI inception, is that using SLI without the crossbar causes that data traffic to be sent over the PCI-E lanes, putting more traffic/overhead on them than was originally planned.
Only very low-end GPUs and these Titan V GPUs don't utilize the SLI crossbar, and in the case of very low-end GPUs, there isn't enough traffic to bottleneck anything due to the GPUs being the limiting factor and not the PCI-E lanes.

With high-end GPUs, like the Titan V, the bottleneck becomes apparent due to the increase in data traffic.
Not sure what you mean by "it isn't the best", when this is simply showcasing how the data is re-routed when the SLI crossbar is not present, and the effects of doing so.
Because it is up to the developer to optimize it, and this isn't technically the same as running AFR over PCIe with an AMD/NVIDIA driver profile.

AMD has been running CrossFire over PCIe bandwidth alone, and no one has shown proof that it gets bottlenecked in many games; otherwise people would be complaining, like those with R9 Fury CF or 290X setups.

Also, IIRC, once the SLI bridge bandwidth is almost used up, NVIDIA starts using PCIe bandwidth for the communication as well (I think someone wrote that on THW?).
 
Because it is up to the developer to optimize it, and this isn't technically the same as running AFR over PCIe with an AMD/NVIDIA driver profile.

AMD has been running CrossFire over PCIe bandwidth alone, and no one has shown proof that it gets bottlenecked in many games; otherwise people would be complaining, like those with R9 Fury CF or 290X setups.

Also, IIRC, once the SLI bridge bandwidth is almost used up, NVIDIA starts using PCIe bandwidth for the communication as well (I think someone wrote that on THW?).
That's a very good point, and I see what you are saying now.
Gotcha!
 