Questions about PCIe lanes & bandwidth for upcoming build

elecsys

Hello,
I'm looking for some information on the difference between the PCIe lanes on the PCH vs. the ones provided by the CPU, as I have concerns about potential I/O and bandwidth limitations with an upcoming build.

To give a practical example:
I'm planning to run no more than one GPU (GTX 1080 for now), but I would like to reserve all 16 PCIe 3.0 lanes provided by the CPU for a future graphics card upgrade. My thinking is that I might run into a bottleneck with next-gen GPUs (the ones after the recently announced RTX 20 series) if I reserve only 8 lanes.
Adding to that, let's say I want to run:
one x4 PCIe Optane SSD (4 lanes), three x4 M.2 PCIe SSDs (12 lanes), and two SATA drives. No RAID planned.
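
Tallying those up as a quick sketch (assuming the drives would hang off the chipset and that the SATA ports don't consume PCIe lanes, which is part of what I'm unsure about):

# Rough chipset lane tally for the planned drives (assumed lane counts)
devices = {
    "Optane SSD": 4,
    "M.2 NVMe SSD #1": 4,
    "M.2 NVMe SSD #2": 4,
    "M.2 NVMe SSD #3": 4,
    # the two SATA drives use the chipset's SATA ports, not PCIe lanes
}
lanes_needed = sum(devices.values())          # 16
print(f"PCIe lanes needed for drives: {lanes_needed} of the chipset's 24")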


I found a lot of seemingly conflicting information on the web, so I'd be happy if someone could clear up my misconceptions.

Let's take the upcoming Z390 for example: 16 PCIe 3.0 lanes provided by the CPU, 24 PCIe 3.0 lanes provided by the chipset.

Does this mean I have 40 PCIe 3.0 lanes available in total, to be used simultaneously at full bandwidth, or do all these additional lanes ultimately have to go through the 16 lanes provided by the CPU, meaning the bandwidth is effectively limited to the equivalent of 16 lanes used simultaneously?

I know that the PCH can only assign its lanes in chunks of x4, but can all 24 of these lanes (i.e. six x4 groups) be freely used for add-in cards, or are some of them already reserved for other things like RAID, USB, networking, etc.? If it's the latter, how many of those 24 lanes can I effectively use without peeling lanes away from other parts of the board?

I have read elsewhere that the connection from the PCH to the CPU is an x4 PCIe 3.0 DMI link on the X299 platform. I assume it's the same on the Z370 or the upcoming Z390 platform? How exactly does this DMI link work? Does this link take up 4 of the 16 CPU lanes? Do all 24 chipset lanes in our example share that same x4 link, like a bottleneck?
 
There's a separate DMI link (also essentially PCIe) between the CPU and chipset. The 16 PCIe lanes from the CPU and 24 from the chipset are not taken up by the DMI link.

Perhaps the Z390 block diagram (the Z370 is almost identical) will make it clearer (spoilered for size):
Intel-Z390-Block-Diagram.png

The 16 PCIe lanes will always be available from the CPU. In almost every mainstream motherboard I've seen, these are linked to the first two x16 PCIe slots (the BIOS/UEFI auto-configures between x16 or x8+x8 as needed). Of the PCIe lanes from the chipset, the full 24 may or may not be available, depending on how the manufacturer configures things and which features are enabled.
 
Essentially, the 16 lanes direct from the CPU are full bandwidth, since they connect directly. The 24 on the chipset all share a single DMI link, which is basically the same speed as a PCIe 3.0 x4 link.
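
To put rough numbers on that (a quick back-of-the-envelope sketch using the published PCIe 3.0 rate of 8 GT/s per lane with 128b/130b encoding):

# Approximate PCIe 3.0 / DMI 3.0 bandwidth per lane and per link
per_lane_GBps = 8.0 * (128 / 130) / 8   # ~0.98 GB/s of usable data per lane

cpu_x16 = 16 * per_lane_GBps            # ~15.8 GB/s, dedicated to the CPU's x16 (or x8+x8) slots
dmi_x4 = 4 * per_lane_GBps              # ~3.9 GB/s, shared by everything behind the chipset

print(f"CPU x16 link: ~{cpu_x16:.1f} GB/s (not shared)")
print(f"DMI x4 link:  ~{dmi_x4:.1f} GB/s (shared by all 24 chipset lanes)")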
 
Thanks for the block diagram!
I seem to have a more fundamental lack of understanding of how PCIe connections work...
Do all those chipset PCIe lanes work "independently" of the CPU, or do they need to address it / be addressed by it at all times?
And if the latter is the case: since the chipset is connected to the CPU by only an x4 DMI link, does that mean those "24 lanes" are effectively limited to the bandwidth equivalent of 4 lanes?

What I don't get is that, both today and way in the past, motherboards sometimes had up to four full-size x16 PCIe slots, which is basically just sharing of the same 16 lanes provided by the CPU, yet they were never advertised as providing "64 lanes". As one can see in the diagram, it's either one x16, two x8, or one x8 plus two x4.
So does the same apply to the chipset's "24 lanes"? Is it just PCIe sharing of 4 lanes?
 
Thanks for the block diagram!
Do all those chipset PCIe lanes work "independently" of the CPU, or do they need to address it / be addressed by it at all times?

Not really sure what you mean by this.

And if the latter is the case: since the chipset is connected to the CPU by only an x4 DMI link, does that mean those "24 lanes" are effectively limited to the bandwidth equivalent of 4 lanes?

Yes. The bandwidth of the DMI link is 3.93 GB/s, so it is unlikely to be much of a bottleneck.

What I don't get is that, both today and way in the past, motherboards sometimes had up to four full-size x16 PCIe slots, which is basically just sharing of the same 16 lanes provided by the CPU, yet they were never advertised as providing "64 lanes". As one can see in the diagram, it's either one x16, two x8, or one x8 plus two x4.
So does the same apply to the chipset's "24 lanes"? Is it just PCIe sharing of 4 lanes?

The number of lanes on the chipset governs how many devices can be connected to it. All of those devices communicate with the chipset, which interfaces with the CPU over 4 lanes at 3.93 GB/s.
 
Correction about the 3.93 GB/s:

That spec was for the previous version of DMI, DMI 2.0. The Z390 (and the Z370 before it) will use DMI 3.0, which doubles the bandwidth of the chipset connection. This means a throughput of 7.87 GB/s for the chipset connection, equivalent to eight PCIe 3.0 lanes.

Nope, sorry you had it right the first time.

https://en.wikipedia.org/wiki/Direct_Media_Interface

DMI 3.0, released in August 2015, allows the 8 GT/s transfer rate per lane, for a total of four lanes and 3.93 GB/s for the CPU–PCH link.

It's 8 Gbps per lane x 4 lanes = 32 Gbps. 32 Gbps / 8 bits per byte = 4 GB per second, or 3.93 GB/s after interface overhead.
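
If it helps, here's the same arithmetic for both DMI generations (assuming the "interface overhead" is the usual PCIe line encoding: 8b/10b for DMI 2.0, 128b/130b for DMI 3.0):

# x4 DMI link bandwidth from raw transfer rate and line-encoding efficiency
def dmi_x4_GBps(gt_per_s, encoding):
    # usable GB/s = raw rate * encoding efficiency * 4 lanes / 8 bits per byte
    return gt_per_s * encoding * 4 / 8

print(f"DMI 2.0 x4: ~{dmi_x4_GBps(5.0, 8 / 10):.1f} GB/s")     # ~2.0 GB/s
print(f"DMI 3.0 x4: ~{dmi_x4_GBps(8.0, 128 / 130):.2f} GB/s")  # ~3.94 GB/s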

elecsys: the PCIe bandwidth is set up like this:

The CPU has a direct connection to the outside world through 20 PCIe 3.0 lanes. Four of those are reserved for talking to the chipset, and the other sixteen are reserved for expansion cards.

The chipset has way more PCIe lanes (24) than upstream bandwidth, so it isn't designed to run all of those lanes at full bandwidth simultaneously. The designers assume you won't be maxing out all your devices at once, or that a large portion of your traffic will be copying data from one SSD to another (which can all be handled on the chipset).

Having a large number of PCIe lanes on the chipset means you can offer motherboards with tons of different expansion options (each device needs its own lanes), and because you are unlikely to constantly stress all of those devices at once, you are unlikely to need more than that DMI 3.0 link between the processor and chipset. So just think of the DMI link and the chipset as a port multiplier.
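
As a toy model of that oversubscription (the per-device peak rates below are made-up ballpark figures, not measurements):

# Chipset as a port multiplier in front of a ~3.9 GB/s DMI 3.0 uplink
DMI_GBPS = 3.9

peak_demand_GBps = {        # rough, assumed peak rates per device
    "Optane SSD": 2.5,
    "NVMe SSD 1": 3.0,
    "NVMe SSD 2": 3.0,
    "NVMe SSD 3": 3.0,
    "2x SATA drives": 1.1,
    "USB / LAN / audio": 0.5,
}

total_peak = sum(peak_demand_GBps.values())
print(f"Sum of peak demands: {total_peak:.1f} GB/s vs. the ~{DMI_GBPS} GB/s DMI uplink")
# In practice only one or two of these are busy at any given moment, and (per the
# point above) copies between two chipset-attached drives can stay on the chipset side.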

The other 16 lanes are connected directly to the CPU, and they don't have to share bandwidth with anything. You shouldn't put a gaming graphics card in an x16-sized slot that is wired to the chipset, as it would stress the chipset's entire bandwidth (and it would come up short in performance versus a CPU-connected x8 or x16 slot).
 
Nope, sorry you had it right the first time.

https://en.wikipedia.org/wiki/Direct_Media_Interface



It's 8 Gbps per lane x 4 lanes = 32 Gbps. 32 Gbps / 8 bits per byte = 4 GB per second, or 3.93 GB/s after interface overhead.
Oh, I was confused. DMI 2.0 had a bandwidth of only 5.0 Gbps per lane (4.0 Gbps after overhead). Maybe I transposed those figures, or confused "GT/s" with "GB/s".

Well, in that case, what the OP is trying to do is impossible on any mainstream CPU platform. Trying to run a GPU off the PCH would leave the GPU with less than a quarter of the bandwidth it's rated for, after accounting for all of the other devices that use the PCH. So there is no real solution to this dilemma.
 
I didn't really understand what the OP wanted to do. Plug a graphics card into the wrong slot on the motherboard in order to save the x16 slot for a graphics card he might buy in the future?
 
What I don't get is that, both today and way in the past, motherboards sometimes had up to four full-size x16 PCIe slots, which is basically just sharing of the same 16 lanes provided by the CPU, yet they were never advertised as providing "64 lanes". As one can see in the diagram, it's either one x16, two x8, or one x8 plus two x4.
So does the same apply to the chipset's "24 lanes"? Is it just PCIe sharing of 4 lanes?

They were either on a different platform with more lanes available (X58, X79, X99, etc.) or used PCIe switches, which essentially let multiple slots share lanes: two x16 slots can run off just 16 lanes from the CPU, but they still have to share those 16 lanes. Also, in many cases those x16-sized slots are not all physically wired as x16; they may be only x8, or even drop down to x4, depending on the exact board and configuration.
 
Thanks guys, that answers most of my questions. But is there any realistic scenario where I would max out this x4 DMI link?
Also, I've read in another thread that going through the DMI interface might result in increased latency. So would there be a disadvantage to having the OS drive connected via the chipset lanes / DMI link instead of the CPU lanes?
https://hardforum.com/threads/nvme-ssd-faster-in-pcie-slot-adapter-than-onboard-m-2.1942578/


TheFlayedMan:
No, I do want to use all 16 CPU lanes for the GPU right now. I have a 1080, so while reserving just 8 lanes would likely be enough bandwidth for now, I was worried that a GPU two years from now might already be bottlenecked by an x8 connection.
But if I use up all 16 CPU lanes, would the chipset lanes provide enough bandwidth to feed all my other planned devices? That's what I wasn't sure about, along with the exact relationship between the CPU and chipset PCIe lanes.
 
Unless you plan on using multiple GPUs at once, it's not going to be an issue. You'll never have a scenario where you're maxing out throughput on storage and maxing out the GPU, unless you're doing some hardcore crunching that requires a lot of reading and writing to storage.

Just buy a Threadripper.
 
Thanks guys, that answers most of my questions. But is there any realistic scenario where I would max out this x4 DMI link?
Also, I've read in another thread that going through the DMI interface might result in increased latency. So would there be a disadvantage to having the OS drive connected via the chipset lanes / DMI link instead of the CPU lanes?
https://hardforum.com/threads/nvme-ssd-faster-in-pcie-slot-adapter-than-onboard-m-2.1942578/


You're way overthinking all this.

All the devices hanging off the chipset and DMI are typically low-bandwidth and/or bursty/intermittent. For normal desktop usage it's highly unlikely you'll ever saturate that link. Any latency that might be introduced would be minuscule and imperceptible.


TheFlayedMan:
No, I do want to use all 16 CPU lanes for the GPU right now. I have a 1080, so while reserving just 8 lanes would likely be enough bandwidth for now, I was worried that a GPU two years from now might already be bottlenecked by an x8 connection.
But if I use up all 16 CPU lanes, would the chipset lanes provide enough bandwidth to feed all my other planned devices? That's what I wasn't sure about, along with the exact relationship between the CPU and chipset PCIe lanes.

All the PCIe lanes coming off the CPU, whether for the GPU or for the DMI link to the chipset, have their full bandwidth available at all times. So regardless of how you have the GPU(s) set up (a single GPU at x16 or dual at x8+x8), the DMI link still has its full bandwidth to the chipset. There's no reason to give a single GPU only 8 lanes.
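
For reference, the raw numbers behind the x8 worry (same per-lane figure as above; whether a given card actually needs x16 is a separate question):

# PCIe 3.0 slot bandwidth at different link widths
per_lane_GBps = 8.0 * (128 / 130) / 8   # ~0.98 GB/s usable per PCIe 3.0 lane

for width in (16, 8, 4):
    print(f"PCIe 3.0 x{width}: ~{width * per_lane_GBps:.1f} GB/s")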
 
You're way overthinking all this.

Fair enough, I probably am!


There's no reason to give a single GPU only 8 lanes.

It wasn't just my ignorance about the DMI link. Initially I thought about going with 8 lanes for the GPU so I could use the remaining lanes for things like PCIe SSD cards or other expansion cards. I was just worried that once I upgrade my GPU a couple of years from now, I might actually need the full x16 for the GPU alone.
So if I instead connect all the other planned devices via the chipset/DMI, would they have enough bandwidth? That's another thing I wasn't sure about. But I get it now, thanks to you guys!



What is the use case?

Nothing special, just normal home use. Mostly gaming, some CAD, and use as a home media server. Also in-home game streaming to an NVIDIA Shield, which is why I was worried about anything that might add latency.
 
We're talking about 200 extra microseconds of latency, and a drive connected over DMI still posts benchmark scores faster than any SATA 6 Gb/s drive. You won't see any NOTICEABLE effect from this in the real world unless you're running a server with a giant database.
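
For scale, a rough sketch of the upper bounds involved (spec-level ceilings, not measurements; real drives vary):

# NVMe drive behind the DMI 3.0 link vs. a SATA 6 Gb/s drive (theoretical caps)
dmi3_cap_GBps = 3.9                  # ceiling for an NVMe drive hanging off the chipset
sata3_cap_GBps = 6.0 * (8 / 10) / 8  # 6 Gb/s with 8b/10b encoding -> ~0.6 GB/s

print(f"NVMe behind DMI 3.0: up to ~{dmi3_cap_GBps:.1f} GB/s")
print(f"SATA 6 Gb/s:         up to ~{sata3_cap_GBps:.1f} GB/s")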

These NVMe drives are really targeted at workstation users, but high-end gamers have bought into the hype hook, line, and sinker. And of course they run pointless benchmarks on these drives (because bragging about your benchmark results is part of the fun of buying hardware you don't really need).
 