MoBo/Chipset/CPU PCIe lanes - clarification is needed

Coolio · Jan 12, 2021

Hi guys, could you please help me understand lanes? Here are the points I'm currently "struggling" with:

1. A "full-bandwidth" slot is one whose number of contacts (e.g. x8) equals the maximum number of lanes (i.e. 8) the slot can connect to. Is it right that the number of lanes available to "full-bandwidth" slots is defined by the MoBo and the CPU (whichever has fewer lanes), while the number of lanes available to "secondary", non-full-bandwidth slots is defined by the MoBo and the Chipset (whichever has fewer lanes)? In other words, if the CPU has 16 lanes, the MoBo has 18 and the Chipset has 20, there will finally be 16 lanes (theoretically) available to full-bandwidth slots and 18 lanes to non-full-bandwidth slots - is that correct?
2. Why doesn't the logic in point #1 (if it's correct, of course!) work for the paired AMD X570 Chipset and AMD Ryzen 3000/Zen 2 CPU? Ryzen 3000 has 24 PCIe lanes (4 to connect with the MoBo and the remaining 20 for full-bandwidth slots, e.g. an x16 GPU and an x4 NVMe M.2 SSD), while the X570 Chipset has 16 lanes, all available to non-full-bandwidth slots. So at the end of the day all 40 (24 + 16) lanes co-exist and work, which doesn't line up with the logic described in point #1 (the Chipset should have decreased the CPU's lane count to 16, but it didn't!).
3. How would you finish this phrase: "the sum of lanes needed for all current and future devices to run at their max performance shall... [be not less than..., be equal to..., etc.]"
4. What happens if a device gets fewer lanes than the maximum number it supports? Does an SSD start to read/write data more slowly, or does a GPU start to generate fewer fps?
5. Is it correct that different MoBos built on the same chipset may have a different number of available lanes, because each MoBo manufacturer re-routes/hardwires lanes as they wish? Should fewer lanes result only in fewer slots, or also in slower port speeds?

Thank you!

lopoetve · Jan 12, 2021

1. Sort of. The CPU has X number of lanes available. For Zen 2/3, for instance, that's 24 lanes for YOUR devices (internal links to things like the SoC's own USB controller are not yours to choose) - 4 always go to the chipset, 4 always go to the first NVMe slot, and the remaining 16 are up to your motherboard manufacturer to decide what to do with. Normally that's x16/off or x8/x8, but it depends on the manufacturer. Slot size and electrical wiring also determine what a slot is capable of (hence why you'll see "physical slot" vs "electrical slot" or similar comments). While physical x4/x8 slots technically DO exist, the wider slots are almost always x16 physical slots wired as x8 (not sure I know of one wired at x4, but I'm sure it exists somewhere). Not sure I've ever seen a physical x8/x4 slot in practice either - it's really either x1 or x16 physical.
2. Because the chipset->CPU link is only an x4 link on X570. So while you might have 16 lanes going to the chipset, only x4 worth of bandwidth can go to the CPU at the same time - unless something isn't going through the CPU (not even sure if DMA/etc. could do that...), you're limited by that link. Now, most things don't USE all the bandwidth at the same time, so... bottleneck? Maybe. Depends on what you're doing, how they wired it, etc.
3. Depends on the list of devices.
4. Depends on the device. For things like PCIE add-in M.2 cards - if it has a RAID controller on it or something similar, they may run slower. If it's pass through, some slots on the card may be unusable, or the card might not work at all (some expect the motherboard to do bifurcation for them). Also depends on if you have a PLX chip on the motherboard (rare now), bifurcation capabilities, etc. This is a somewhat loaded question. "It depends" is the real answer.
5. Different slots may have different lane capabilities, but the chipset/chip determine what is available in total - the manufacturer gets to decide how they wire those in. And this also (in some cases) depends on the CPU chosen - see x299 for instance, or x99, where different CPUs have different PCIE capabilities, and the motherboards handle them differently.

In GENERAL, on consumer kit - the first slot is x16. The second slot will function at x8 by stealing 8 lanes from the first slot. Anything hooked up to the chipset will be bottlenecked by a maximum x4 link back to the CPU.
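To make the consumer layout above concrete, here is a minimal Python sketch of the Zen 2/3 lane split from point 1 and the x16/off vs x8/x8 behaviour of the first two slots. It assumes a board that uses exactly that two-slot split; the names (`FIXED_LINKS`, `slot_widths`) are made up for illustration, and real boards vary by manufacturer.

```python
# Hypothetical model of the CPU lane split described above for a Zen 2/3
# desktop chip: x4 to the chipset, x4 to the first M.2 slot, and x16 that
# the board vendor routes to the graphics slots as x16/off or x8/x8.
CPU_LANES = 24
FIXED_LINKS = {"chipset_uplink": 4, "m2_slot_1": 4}
SLOT_LANES = CPU_LANES - sum(FIXED_LINKS.values())  # 16 lanes for the slots


def slot_widths(second_slot_populated: bool) -> tuple:
    """Return (first slot width, second slot width) on an x16/off or x8/x8 board."""
    if second_slot_populated:
        # The second slot takes half of the first slot's lanes.
        half = SLOT_LANES // 2
        return (half, half)
    # Nothing in the second slot: the first slot gets all 16 lanes.
    return (SLOT_LANES, 0)


print(slot_widths(False))  # (16, 0)
print(slot_widths(True))   # (8, 8)
```

Anything not in this table (Wi-Fi, extra M.2 slots, SATA) hangs off the chipset and shares its x4 uplink.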

As for graphics - x8 doesn't really change anything. x4 costs you a few FPS - I want to say 5-8% at most, but I haven't looked at that section of the benchmarks in a long time.

Note also: this is somewhat simplified and doesn't get into actual block diagrams. It also doesn't cover HEDT except as noted, or future platforms.

Coolio · Jan 13, 2021

Hi lopoetve, thank you for such a detailed reply! That's slightly more complicated than I thought it would be.

lopoetve said:
The CPU has X number of lanes available. ... 4 are always to the chipset, 4 are always to the first NVMe slot, the remaining 16 are up to your motherboard manufacturer to decide what to do with them.

So I assume it's an AMD Zen 2/3 logic that 4 lanes are dedicated to connect with the chipset - other CPUs may use different # of lanes for this task, right?

lopoetve said:
Normally that's x16/off or x8/x8, but it depends on the manufacturer. Slot size and electrical wiring also determine what a slot is capable of (hence why you'll see "physical slot" vs "electrical slot" or similar comments).

So your sample numbers could in reality describe "1 x16 physical slot" or "2 x8 physical slots" if those slots were full-bandwidth, but alternatively they could just as well be "2 x16 physical slots" that aren't full-bandwidth: as we have only 16 lanes, each x16 slot will use only 8 lanes (and thus be physically an x16 slot, but electrically an x8 slot). Did I get the idea right? Sorry for raising a possibly noob topic, but I'm going to build my PC for the first time, so it makes sense to go into those details.

lopoetve said:
2. Because the chipset->CPU link is only an x4 link in x570 - which means while you might have 16 lanes going to the chipset, only x4 worth of bandwidth can go to the CPU at the same time - so unless you're not going through the CPU for something (not even sure if DMA/etc could do that...), you're limited by that. Now, most things don't USE all the bandwidth at the same time, so... bottleneck? Maybe. Depends on what you're doing, how they wired it, etc.

Not sure I managed to see the bottleneck here... Please correct me if I'm wrong, but what I understood from the previous parts is that (taking Zen 2/3 as an example) 4 lanes are dedicated purely to the chipset and can't be used by other devices. So if the CPU has 24 lanes, 20 may be used by peripherals, and if these devices together use 20 or fewer lanes, each device works at 100% of its spec. From what you say, I understand that regardless of the number of lanes left after the CPU-Chipset connection, the capacity of those lanes goes through the 4 CPU-Chipset lanes anyway, which, yeah, is definitely a bottleneck that limits the capacity of the available lanes, and it also turns the "lane philosophy" as I understood it upside down. So in the end it looks like it doesn't matter much how many lanes the Chipset/MoBo/CPU have; what really matters is how many lanes connect the CPU with the Chipset, as they are the sole "transport", huh?

lopoetve · Jan 13, 2021

Coolio said:
Hi lopoetve, thank you for such a detailed reply! That's slightly more complicated than I thought it would be.

It used to be easier, especially when PLX chips (imagine a PCIE switch like a network switch) were more common. They're expensive now, and so less common.

Coolio said:
So I assume it's an AMD Zen 2/3 logic that 4 lanes are dedicated to connect with the chipset - other CPUs may use different # of lanes for this task, right?

Yes, although within a family it's generally set. And most motherboards are wired for a fixed number - eg: you'll never see an x570 with more than 4 lanes to the chipset, as there will never be (in theory) a Zen2/3/X chip that has more than x4 to the chipset, so why run the traces? The oddity here is with Z490, which is PCIE3 for Comet Lake, but ~theoretically~ possible for PCIE4 with Rocket Lake (although with the same number of lanes, just... faster). The oddballs here are the HEDT machines like x99/x299, where the chip SoC really determines how many lanes are available, and may not actually use everything that is electrically there (joy upon joys). https://us.msi.com/Motherboard/X99A-GODLIKE-GAMING/Specification makes a good example (click detail and go down to the slot breakout between different chips).

Coolio said:
So your sample numbers could in reality describe "1 x16 physical slot" or "2 x8 physical slots" if those slots were full-bandwidth, but alternatively they could just as well be "2 x16 physical slots" that aren't full-bandwidth: as we have only 16 lanes, each x16 slot will use only 8 lanes (and thus be physically an x16 slot, but electrically an x8 slot). Did I get the idea right? Sorry for raising a possibly noob topic, but I'm going to build my PC for the first time, so it makes sense to go into those details.

Physical describes how long the slot is. https://www.tomshardware.com/reviews/pcie-definition,5754.html#:~:text=PCIe slots come in different,at one bit per cycle. gives a nice set of pictures.

From there, you can wire in different numbers of electrical traces as needed to make it function. Technically, you could have a physical x16 slot electrically wired for only x1, although I have absolutely no idea why you'd do such a bizarre thing. x16 and x8 are electrically compatible - if the slot is running in x8 mode, it will still work with a full x16 card.
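A rough way to think about physical vs electrical width: the card has to fit the slot, and the link then comes up at the narrower of the two widths (and the older of the two PCIe generations). The sketch below is a simplification, assuming closed-ended slots and that both ends support every width up to their maximum; `Slot`, `Card` and `negotiated_link` are hypothetical names, not any real API.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Slot:
    physical: int    # how long the slot is, e.g. 16 for an x16-size slot
    electrical: int  # how many lanes are actually wired to it
    gen: int         # highest PCIe generation the slot supports


@dataclass
class Card:
    width: int  # lanes on the card's edge connector
    gen: int    # highest PCIe generation the card supports


def negotiated_link(card: Card, slot: Slot) -> Optional[Tuple[int, int]]:
    """Return (lanes, gen) the link would come up at, or None if the card won't fit.

    Simplified: assumes closed-ended slots and that both ends support every
    width up to their maximum, so the result is just the smaller of each pair.
    """
    if card.width > slot.physical:
        return None  # a longer card doesn't fit a shorter closed-ended slot
    return min(card.width, slot.electrical), min(card.gen, slot.gen)


# A PCIe 4.0 x16 graphics card in an x16-size slot wired for x8:
print(negotiated_link(Card(width=16, gen=4), Slot(physical=16, electrical=8, gen=4)))  # (8, 4)
# The same card in an x16-size slot wired for only x1 (the "bizarre" case):
print(negotiated_link(Card(width=16, gen=4), Slot(physical=16, electrical=1, gen=4)))  # (1, 4)
```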

In general, you will have x16 slots (video cards, some USB controllers, big RAID controllers, etc.) and x1 slots. Don't worry about how they're electrically wired - for anything anyone sane is doing, it won't matter. If it's a big card, it goes into x16. If it's a small thing like a WIFI card, it goes into x1. The motherboard will handle it - and in most cases, you don't care if it goes through the chipset or the CPU (SATA and NVMe ports are the one place this matters, as in general, using all the NVMe slots will disable some SATA ports).

Coolio said:
Not sure I managed to see the bottleneck here... Pls correct me if I'm wrong, but what I understood from previous parts is that (if we're talking about Zen 2/3 as an example) 4 lanes are dedicated purely for chipset and can't be used by other devices. So if CPU has 24 lanes, 20 may be used by peripherals, and if these devices use all together 20 or less lanes, each device works at 100% of its tech spec capabilities. From what you say I understand that regardless of the # of lanes left after CPU-Chipset connection, capacity of these lanes is anyway utilized through the 4 CPU-Chipset lanes, which yeah - is definitely the bottleneck which limits the capacity of those available lanes, and also this turns upside down the "lane philosophy" the way I understood it. Finally, looks like it doesn't matter much how many lanes Chipset/MoBo/CPU have, what really matters - how many lanes connect CPU with the Chipset, as they are the sole "transport", huh?

The Chipset is basically a secondary CPU - it has 12 lanes going to things, with 4 going back to the CPU. If you are trying to transfer all 12 lanes worth of bandwidth, you'll be bottlenecked by the x4 link back to the CPU for processing. Much like a single 1G uplink out of a switch, if everything on that switch is trying to upload/download at 1G also, they can't exceed that 1G link to whatever is above the switch.

Everything has to pass through the CPU (in general, also simplified massively). So if you have 2-3 NVMe drives on the chipset, which is possible in some systems, you can only do x4 at a time back to the CPU, even though the chipset has x12 dedicated to storage.
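The uplink arithmetic can be sketched in a few lines. This assumes the X570-style x4 PCIe 4.0 uplink mentioned above and a hypothetical worst case of three x4 PCIe 4.0 NVMe drives behind the chipset, all running flat out at once; real drives rarely do that, which is the "maybe" in the bottleneck answer earlier in the thread. The function names are illustrative only.

```python
# Usable bandwidth per PCIe lane in GB/s: transfer rate (GT/s) times the
# 128b/130b encoding efficiency used by PCIe 3.0 and 4.0, over 8 bits/byte.
TRANSFER_RATE_GT_S = {3: 8.0, 4: 16.0}


def link_gb_s(gen: int, lanes: int) -> float:
    return TRANSFER_RATE_GT_S[gen] * 128 / 130 / 8 * lanes


def oversubscription(uplink_gb_s: float, downstream_gb_s: list) -> float:
    """Worst-case demand behind the chipset divided by what its uplink can carry."""
    return sum(downstream_gb_s) / uplink_gb_s


uplink = link_gb_s(gen=4, lanes=4)  # x4 PCIe 4.0 link back to the CPU
# Hypothetical load: three x4 PCIe 4.0 NVMe drives hanging off the chipset.
drives = [link_gb_s(gen=4, lanes=4)] * 3
print(f"uplink ~{uplink:.2f} GB/s, drives could ask for ~{sum(drives):.2f} GB/s")
print(f"oversubscribed {oversubscription(uplink, drives):.1f}x")
```

A ratio above 1 only hurts when the devices actually run at the same time; a single drive behind the chipset fits in the uplink on its own.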

It all comes down to what things are attached to. If you have a dedicated link to the CPU (eg: your first PCIE slot), then you have no bottleneck - nothing is ever using that lane but the one device. If you're going to the chipset, you're SHARING a x4 link to the CPU with everything else on the chipset. That's why some of us study the block diagrams - for some of our use cases (mine tend to be out there), it matters. For most consumers, it doesn't. I didn't pay the slightest bit of attention on my Z490 gaming system - don't care at ALL what shares the chipset, as a game isn't going to be touching ALL the NVMe drives at the same time. But my Threadripper workstation and 3950/x570 VM system - I definitely checked, to make sure that the RAID cards weren't going to be doing something silly.

Coolio · Jan 13, 2021

Dude, you're amazing! Below is the best & the simplest explanation of how this all works.

lopoetve said:
It all comes down to what things are attached to. If you have a dedicated link to the CPU (eg: your first PCIE slot), then you have no bottleneck - nothing is ever using that lane but the one device.

Let's take a real-life example that will 99% likely be my case: 1 x16 GPU, 1 x1 WiFi card and 1 x4 NVMe M.2 SSD. In an ideal scenario I need a MoBo with dedicated (full-bandwidth) x16 and x4 slots, as I want these 2 devices to run as fast as possible. Any X570-based MoBo will work fine: those 2 slots have a direct link to the CPU. The WiFi card will work via the Chipset using 1 lane, so 3 lanes will be "free". Questions:

I will need USB 10Gb (USB 3.1 Gen 2) ports too: to exchange data with external drives and my mobile phone (if I manage to find a PC case with a built-in Type-C connector). To what extent do USB ports use the 3 remaining lanes: 1 port = 1 lane?
By the way, I've heard that the only consumer devices (so far) supporting USB 3.1 Gen 2 10Gb speed are NVMe SSD drives - is that correct? Does that finally mean I won't be able to copy files to/from my mobile phone's SD card (or internal memory) at a 10Gb speed even using a fast-speed Type-C cable?
My second M.2 slot (if the MoBo has 2 of them, of course), unlike the 1st one, won't have a direct link to the CPU, but I want to add a 2nd SSD and I want it to work as fast as possible too. Will there be any difference in SSD performance if I plug it into the 2nd M.2 slot vs. a SATA Express slot? Will the 3 remaining lanes (I'm not considering USB here for simplicity) be utilized differently in those 2 scenarios? [YES, I remember you said the 2nd M.2 may disable certain SATA port(s)] If there's no difference, I guess it makes no sense to pay extra for the NVMe version and I can go for a cheaper 2.5" form factor with the same performance, right?

And 2 short noob questions if you don't mind:

1. What does the "4 CPU lane" sharing look like from a task processing perspective? If I have 1 SSD and 1 GPU sharing lanes (not in the config I've described above, but in some hardcore version), does that mean both devices perform tasks at regular performance but work in turns (like "pseudo-multitasking" on old single-core CPUs), or do they drop in performance but work simultaneously?
2. What does "1-way mode, 2-way mode, etc." in the Slots section of the X99A MoBo spec mean? Google says this somehow relates to SLI (definitely not my case), but I'm not sure...

Thank you a lot for your help!

lopoetve · Jan 13, 2021

Coolio said:
Dude, you're amazing! Below is the best & the simplest explanation of how this all works.

Let's take a real-life example which for 99% will be my case: 1 x16 GPU, 1 x1 WiFi and 1 x4 NVME M.2 SSD. In ideal scenario I need a MoBo with dedicated (full-bandwidth) x16 and x4 slots, as I want these 2 devices to run fastest possible. Any X570-based MoBo will work fine: those 2 slots have direct link to the CPU. The WiFi card will work via the Chipset using 1 lane, so 3 lanes will be "free". Questions:

I will need USB 10Gb (USB 3.1 Gen 2) lanes too: to exchange data with external drives and the mobile phone (if I manage to find the PC case with a built-in Type-C connector). To which extent do USB ports use the 3 lanes that are left: 1 port = 1 lane?

No, those are handled by the SoC (there are more PCIe lanes, you just don't see them - they're internal to things like the USB controller, sound controller, etc. that all motherboards are required to have). Since you can't plug anything into them (no choices), we don't really talk about them.

Coolio said:
By the way, I've heard that the only consumer devices (so far) supporting USB 3.1 Gen 2 10Gb speed are NVMe SSD drives - is that correct? Does that finally mean I won't be able to copy files to/from my mobile phone's SD card (or internal memory) at a 10Gb speed even using a fast-speed Type-C cable?

Correct; most things can't saturate a 10Gb link - that's ~900MB/s in practice, which exceeds, for instance, the SATA spec. SD cards come rated at different speeds, but won't be THAT fast. Still faster than the older USB ports, though.

Coolio said:
My second M.2 slot (if the MoBo has 2 of them of course) as opposed to the 1st one - won't have direct link to the CPU, but I want to add the 2nd SSD and I want it to work fastest possible too. Shall there be any difference in SSD performance if I plug it to the 2nd M.2 slot vs. SATA Express slot? Shall the 3 left lanes (I'm not considering USB here for simplicity) be utilized somehow differently in those 2 scenarios? [YES, I remember you've said 2nd M.2 may disable certain SATA(s)] If there's no difference, I guess it makes no sense paying extra for the NVME version and I can go for a cheaper 2.5 form-factor having the same result in performance, right?

SATA ports are always slower than NVMe - even when NVMe is sharing bandwidth over the chipset link (there's no way to saturate the chipset link with, for instance, a WiFi adapter - that's why most built-in ones feed off the chipset). Remember, x4 PCIe 4.0 is almost 8GB/s. NAND can't even keep up with that yet - you're not likely to bottleneck, it's just ~possible~ (especially if you run a LOT of NVMe drives off the chipset) - see, for instance, the X570 MEG Creation board. SATA 6Gb/s is ~550MB/s in practice. A fast NVMe drive will be close to 4GB/s for sequential transfers.
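Where the numbers in this post come from: a link's theoretical payload rate is its transfer rate times the line-encoding efficiency (8b/10b for SATA, 128b/132b for USB 3.1 Gen 2, 128b/130b for PCIe 3.0/4.0), divided by 8 bits per byte. A minimal sketch that ignores protocol overhead, which is why real-world figures like ~550MB/s for SATA or ~900MB/s for 10Gb USB come in below these ceilings. `LINKS` and `payload_mb_s` are names made up for this example.

```python
# Theoretical payload rate of a serial link: transfer rate per lane times the
# line-encoding efficiency, over 8 bits per byte. Protocol overhead (packet
# headers, commands, flow control) is ignored, so real drives land lower.
LINKS = {
    # name: (Gbit/s or GT/s per lane, payload bits, encoded bits, lanes)
    "SATA 6Gb/s": (6.0, 8, 10, 1),
    "USB 3.1 Gen 2": (10.0, 128, 132, 1),
    "PCIe 3.0 x4": (8.0, 128, 130, 4),
    "PCIe 4.0 x4": (16.0, 128, 130, 4),
}


def payload_mb_s(rate: float, payload: int, encoded: int, lanes: int) -> float:
    # rate * 1000 converts Gbit/s to Mbit/s; dividing by 8 gives MB/s.
    return rate * 1000 * payload / encoded / 8 * lanes


for name, spec in LINKS.items():
    print(f"{name:<14} ~{payload_mb_s(*spec):,.0f} MB/s")
```

That works out to 600MB/s for SATA, ~1.2GB/s for USB 3.1 Gen 2, ~3.9GB/s for PCIe 3.0 x4 and ~7.9GB/s ("almost 8GB/s") for PCIe 4.0 x4.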

Coolio said:
And 2 short noob questions if you don't mind:

how does the "4 CPU lane" sharing look like from task processing perspective? If I have 1 SSD and 1 GPU sharing lanes (not within the config I've described above, but in some hardcore version) - does that mean both devices perform tasks at a regular performance, but work in turns (like "pseudo-multitasking" in old single-core CPUs) or do they drop down performance, but work simultaneously?

What does the "1-way mode, 2-way mode, etc." in the Slots section of the X99A MoBo mean? Google says this somehow relates to SLI (definitely not my case), but I'm not sure...

Thank you a lot for your help!

1. It's the second one: they drop down in performance but work simultaneously. Either way it won't make much of a difference - switching in this state would be on the order of nanoseconds. You won't notice, and the end result would be the same either way.

2. It refers to how many cards you have plugged in. Since normally the concern is graphics cards and SLI, that's how it's described - but if you plug a card into the second slot, the board will set the first slot to x8 and the second to x8, rather than x16/off. Pretty normal. Otherwise, it'd be locked to x8/x8 (cheap boards tend to do this). Again, with PCIe 4.0 you're not going to notice.
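On question 1 (devices sharing a link): an idealized way to picture "drop down, but work simultaneously" is a max-min fair split of the shared bandwidth. This is only a model, not how the chipset actually arbitrates (real PCIe uses credit-based flow control and packet-level arbitration), and `fair_share` plus the device demands below are hypothetical.

```python
def fair_share(capacity: float, demands: dict) -> dict:
    """Idealized max-min fair split of a shared link among devices.

    Devices that want less than an equal share get everything they ask for;
    whatever is left over is split evenly among the devices that want more.
    """
    alloc = {}
    remaining = dict(demands)
    left = capacity
    while remaining:
        share = left / len(remaining)
        satisfied = {dev: want for dev, want in remaining.items() if want <= share}
        if not satisfied:
            # Everyone still waiting wants more than an equal share: split evenly.
            for dev in remaining:
                alloc[dev] = share
            break
        for dev, want in satisfied.items():
            alloc[dev] = want
            left -= want
            del remaining[dev]
    return alloc


# Hypothetical load on an x4 PCIe 4.0 chipset uplink (~7.88 GB/s): two busy
# NVMe drives that each want 7 GB/s plus a Wi-Fi card that wants 0.2 GB/s.
result = fair_share(7.88, {"nvme_a": 7.0, "nvme_b": 7.0, "wifi": 0.2})
print({dev: round(gb_s, 2) for dev, gb_s in result.items()})
# {'wifi': 0.2, 'nvme_a': 3.84, 'nvme_b': 3.84}
```

Both drives keep running at the same time, each at roughly half speed, while the Wi-Fi card barely notices.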

Short version - buy NVMe if you want speed, SATA if you want cheap, and don't worry that much about where you plug them in, although with current costs I'm a fan of just snagging NVMe drives unless you're doing something like big 4TB SATA SSDs. I run piles of VMs doing NVMe over Fabrics and other weird things, so I have to care - but for anything ~normal~, you just don't. X570 has enough bandwidth for any normal home/workstation use. We weirdos go Threadripper, because I wanted up to 8 NVMe drives (and my other HEDT machines have multiple RAID controllers).

Glad I could help!

Coolio · Jan 14, 2021

lopoetve Thank you SO much! Best explanations one could dream of.

plugwash · Jan 18, 2021

lopoetve said:
From there, you can wire in different number of electrical traces as needed to make it function. Technically, you could have a physical x16 slot only electrically wired for x1, although I have absolutely no idea why you'd do such a bizarre thing.

Cryptocurrency mining.

Generally needs very little bandwidth to/from the GPUs, but the cards still have physical x16 connectors on them and want the power associated with an x16 slot. There seems to be a whole cottage industry of adapters for cryptocurrency mining, often based on abusing USB3 connectors to carry PCIe x1.

lopoetve · Jan 19, 2021

plugwash said:
Cryptocurrency mining.

Generally needs very little bandwidth to/from the GPUs, but the cards still have physical x16 connectors on them and want the power associated with an x16 slot. There seems to be a whole cottage industry of adapters for cryptocurrency mining, often based on abusing USB3 connectors to carry PCIe x1.

Yeah, I was leaving that particular use case out and trying to pretend it doesn't exist.

They make motherboards for that too - and riser cables, and other weird stuff. Stupid mining. That's a WEIRD niche use case, though, compared to what the OP was asking about.

man114 · Jan 31, 2021

plugwash said:
Cryptocurrency mining.

Generally needs very little bandwidth to/from the GPUs, but the cards still have physical x16 connectors on them and want the power associated with an x16 slot. There seems to be a whole cottage industry of adapters for cryptocurrency mining, often based on abusing USB3 connectors to carry PCIe x1.

Gigabyte's X399 Designare and Aorus Gaming 7 boards have an x16 slot wired as x1. The spacing is kinda weird, but it can be used in some configs. In mine I have graphics, Fusion-io, Nu Audio, Fusion-io, Fusion-io from the top slot down, with the Nu Audio card only needing x1 and fitting between the other cards.
