PCIe Bifurcation

inaxeon

n00b
Joined
May 28, 2021
Messages
25
One last shot for the road. Installed in the case.

Well this has been an interesting journey. I've now got an M.2 SSD and 10GbE - a previously impossible combination with this case / mobo. I actually can't remember the last time I designed a PCB that didn't have to be revised. These were made blue because I expected they'd be a prototype, and the final was going to be black. So much for that.

A big thanks to C_Payne for the tips and many renders of his designs which inspired mine ;-)

installed.jpg
 

C_Payne

Limp Gawd
Joined
Jan 6, 2017
Messages
170
Impressive work with the cable. I would have lost all patience soldering it. It looks pretty decent quality too; one can see some experience there.
 

Okatis

Weaksauce
Joined
Jan 16, 2014
Messages
126
Are bifurcation options limited to ITX boards, or are there micro ATX (mATX) boards known to support it too? I've read here that while some chipset series support splitting, it isn't necessarily exposed in the UEFI settings, so I'm not sure whether there's a way of checking without someone's first-hand experience.

I was planning a potential build around AMD and trying to find a way of splitting two x16 slots (available on some motherboards) into x8/x8 each, for a total of four usable x8 slots. However, it depends on how feasible this is.

Also, on adapters: apart from adding ribbon risers, are there any workarounds for the hard right-angle offset of the bifurcation adapters? I understand they're mostly used for SFF cases with vertical GPU brackets, but for cases without such an arrangement it could pose challenges for where the hardware can be positioned/mounted. It also seemed from reading smallformfactor.net that very few ribbon/flexible risers are said to be high quality (3M appears to be the most reliable?), although this aspect might be overblown; I'm not familiar enough with it to say.
 

C_Payne

Limp Gawd
Joined
Jan 6, 2017
Messages
170
Note that on Ryzen, two x16 slots means they are already split into x8 each.
You can't do x8x8 then, of course, only x4x4.

Bifurcation cards and risers in combination are fine for gen3. Gen4 poses significant challenges regarding signal integrity, so careful consideration is necessary when planning a gen4 build.
 

Okatis

Weaksauce
Joined
Jan 16, 2014
Messages
126
Note that on Ryzen, two x16 slots means they are already split into x8 each.
You can't do x8x8 then, of course, only x4x4.
Hmm, I'm a bit confused. When looking earlier I saw this post showing the UEFI settings for a B550M Pro4 (a board with one PCIe 4.0 x16 slot plus another PCIe 3.0 x16 slot), which listed x16, x8x8, x8x4x4 and x4x4x4x4 options, while this post from 2018 mentioned that the X370/X470 series has x8x8 bifurcation.

Maybe I'm just misunderstanding how it works.

Bifurcation cards and risers in combination are fine for gen3. Gen4 poses significant challenges regarding signal integrity, so careful consideration is necessary when planning a gen4 build.
Would splitting PCIe 4.0 be an issue if I weren't using it for a GPU? I only really need it for a network card and two drive controllers (all of which use x8; two of the cards are PCIe 2.0 and one is PCIe 3.0). If any GPU were used it'd only be some older card for occasional video out.
 

C_Payne

Limp Gawd
Joined
Jan 6, 2017
Messages
170
When using gen3/2 devices you are good.

Ryzen has 24 lanes. (4 chipset, 4 M.2, 16 for slot1/2)
When you use only slot 1 you can use x16, x8x8, x8x4x4 or x4x4x4x4 in slot 1.
When you use slot 1 and 2 you can use x8 or x4x4 in both of them.
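
To make the lane arithmetic above concrete, here is a minimal Python sketch (the labels and mode strings are just illustrative; actual UEFI menus vary by board) that tallies how the 16 CPU slot lanes get carved up in each case:

Code:
# Ryzen (AM4) CPU lane budget as described above:
# 24 CPU lanes = 4 to the chipset + 4 to the M.2 slot + 16 shared by slots 1/2.
CPU_LANES = {"chipset": 4, "m2": 4, "slots_1_and_2": 16}

# Bifurcation modes for the 16 slot lanes, per the post above.
MODES = {
    "slot 1 only":   ["x16", "x8x8", "x8x4x4", "x4x4x4x4"],  # all 16 lanes on slot 1
    "slots 1 and 2": ["x8", "x4x4"],                          # 8 lanes per slot
}

def lanes_used(mode: str) -> int:
    """Sum the lane widths in a mode string like 'x8x4x4'."""
    return sum(int(part) for part in mode.split("x") if part)

for config, modes in MODES.items():
    for mode in modes:
        print(f"{config:15s} {mode:9s} -> {lanes_used(mode)} lanes")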
 

Okatis

Weaksauce
Joined
Jan 16, 2014
Messages
126
When using gen3/2 devices you are good.

Ryzen has 24 lanes. (4 chipset, 4 M.2, 16 for slot1/2)
When you use slot 1 and 2 you can use x8 or x4x4 in both of them.
I see now. That's a pity; I'll have to rethink this. I found a post suggesting that with a Threadripper CPU (with its 64 lanes) and a compatible motherboard it should be possible to get four slots at that bandwidth, although those platforms are much pricier.
 
Last edited:

inaxeon

n00b
Joined
May 28, 2021
Messages
25
The DHL van has been and gone, and it's new riser day again today. This is my first physical build of the other riser I showed on the previous page. The Mezzanine PCB is shown in position but isn't electrically attached yet.

I'm hoping this one will be a slam dunk. My previous riser's schematic and footprints were derived from this one, and it's also 4-layer, so the single-ended (SE) impedance will be closer to spec. Should just work, right? ;-)
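
For anyone wondering why the extra layers matter: on a 4-layer board the signal traces sit over a much closer reference plane, so a normal-width trace can land near a ~50 ohm single-ended target, which a 1.6 mm 2-layer board simply can't do. A rough Python sketch using the classic IPC microstrip approximation (the dielectric heights and trace widths below are hypothetical example values, not the actual stackup of this riser):

Code:
import math

def microstrip_z0(h_mm: float, w_mm: float, t_mm: float = 0.035, er: float = 4.3) -> float:
    """Classic IPC approximation for single-ended microstrip impedance (ohms).

    h = dielectric height to the reference plane, w = trace width,
    t = copper thickness, er = relative permittivity (FR-4 is roughly 4.3).
    """
    return (87.0 / math.sqrt(er + 1.41)) * math.log(5.98 * h_mm / (0.8 * w_mm + t_mm))

# Hypothetical example stackups, purely for illustration:
print(f"2-layer, 1.5 mm core, 0.3 mm trace:    {microstrip_z0(1.5, 0.3):.0f} ohms")  # way above 50
print(f"4-layer, 0.2 mm prepreg, 0.3 mm trace: {microstrip_z0(0.2, 0.3):.0f} ohms")  # close to 50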


t9120a.jpg

t9120b.jpg

t9120c.jpg
 

inaxeon

n00b
Joined
May 28, 2021
Messages
25
Off to a good start. The x8 slot and M.2 are both working well. Booting from my sacrificial Sabrent, with a SAS 2308 in the x8 slot.

incase.jpgwindows.png

Now we get to the tricky bit: running an x4 from the left-hand 3.5" bay. Going to be a bit of a job to build all of that. Watch this space...
 
Last edited:

inaxeon

n00b
Joined
May 28, 2021
Messages
25
And there we go. Finally...

bc1.jpgbc2.pg.jpgsc3.jpg

Works brilliantly! Two PCIe cards in a case designed for one. Quite a bit of metalwork in all of that, but it all worked out well in the end. The mezzanine PCB has a 0.5 mm offset error (not too surprising, as I hadn't decided on the exact position of the card).

As I said previously, I have spares of these, so I could make them available if anyone else wanted them, although the metalwork here is a one-off.
 
Last edited:

inaxeon

n00b
Joined
May 28, 2021
Messages
25
New board coming soon. It's similar to C_Payne's larger and better-designed slimline adapters, but for non-GPU use (apologies if anyone is still reading and getting bored of these posts ;-)

It's a compact OCuLink x4 to PCIe x8 (mechanical) adapter. I don't have all the 3D models, so some imagination is required. I don't actually have any specific use for it right now, but designing PCIe adapters is quite addictive, and my ASRock Rack boards are bristling with OCuLink ports crying out to have something connected to them.

oculinkadapter.png
 

inaxeon

n00b
Joined
May 28, 2021
Messages
25
It's built! Soldering down the OCuLink connector was a bit more trouble than I was expecting, but I figured it out eventually. It smashed the benchmark: full Gen 3 performance achieved (I don't have anything newer to test with, but I doubt it'd work anyway).

Using an OCuLink-to-PCIe adapter is a pretty interesting alternative to bifurcation, which makes me a bit off topic here, I admit. I'm quite surprised how few products do this, given how many boards come with OCuLink/U.2 connectors fitted.
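
For anyone replicating this on Linux rather than Windows, a minimal sketch that reads the negotiated link speed and width straight out of sysfs is a quick way to confirm an adapter really trained at full width instead of dropping to x1 or Gen 1. The paths below are the standard kernel attributes, nothing specific to this adapter:

Code:
#!/usr/bin/env python3
# Print negotiated vs. maximum PCIe link speed/width for every device (Linux).
# A link that trained down (e.g. 2.5 GT/s or x1) usually points at a
# signal-integrity or seating problem with the riser/cable.
from pathlib import Path

def read_attr(dev: Path, name: str) -> str:
    try:
        return (dev / name).read_text().strip()
    except OSError:
        return "n/a"  # attribute not exposed for this function

for dev in sorted(Path("/sys/bus/pci/devices").iterdir()):
    cur = f"{read_attr(dev, 'current_link_speed')} x{read_attr(dev, 'current_link_width')}"
    mx = f"{read_attr(dev, 'max_link_speed')} x{read_attr(dev, 'max_link_width')}"
    print(f"{dev.name}: {cur} (max {mx})")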

assembled.jpg


fullsetup.jpgocuperf.pngwindowsocu.png
 
Last edited:

inaxeon

n00b
Joined
May 28, 2021
Messages
25
One more test: I doubled the cable length to 75 cm and replaced the active carrier with a passive one, just in case I was hanging on by a thread.

Still got full gen 3 speeds ;-)

passivetest.jpg
 
Joined
Feb 3, 2021
Messages
2
Hi,

Does anyone with an NCASE splitter have images showing how it is placed?

I can't visualise how it would look, or how I would have to fit it if I bought one for a two-slot GPU and a capture card.

thanks
 

Icecold

Limp Gawd
Joined
Jul 21, 2013
Messages
213
My apologies if I'm asking something that was covered previously. I've skimmed most of the thread, but it is 28 pages, so I may have missed it. If I were looking to run 8 GPUs at PCIe x8, what are my options? Preferably leaning towards the less expensive side. It seems like an X399 motherboard with something like a Threadripper 1900X? Maybe a low-end EPYC setup? A lot of the information I've found online is pretty old, but it seems like a lot of boards only allow x4/x4/x4/x4 bifurcation instead of splitting an x16 slot into two x8 slots.

I would be using it to run BOINC projects and Folding@home, and a PCIe x1 or x4 slot can bottleneck some of those projects. I appreciate any input anybody has! I would probably be using this card from C_Payne https://c-payne.com/products/pcie-bifurcation-card-x8x8-3w
 

johnny0

n00b
Joined
Jan 8, 2017
Messages
35
You'll be limited to six 8-lane gen 3 slots on X399 using bifurcation. Beyond that, you're in EPYC territory.
 

inaxeon

n00b
Joined
May 28, 2021
Messages
25
My apologies if I'm asking something that was covered previously. I've skimmed most of the thread, but it is 28 pages, so I may have missed it. If I were looking to run 8 GPUs at PCIe x8, what are my options? Preferably leaning towards the less expensive side. It seems like an X399 motherboard with something like a Threadripper 1900X? Maybe a low-end EPYC setup? A lot of the information I've found online is pretty old, but it seems like a lot of boards only allow x4/x4/x4/x4 bifurcation instead of splitting an x16 slot into two x8 slots.

I would be using it to run BOINC projects and Folding@home, and a PCIe x1 or x4 slot can bottleneck some of those projects. I appreciate any input anybody has! I would probably be using this card from C_Payne https://c-payne.com/products/pcie-bifurcation-card-x8x8-3w
I would suggest something like this: https://www.asrockrack.com/general/productdetail.asp?Model=ROMED4ID-2T#Specifications - actually, there isn't anything else quite like it, frankly.

Then use C_Payne's slimline cards for the GPUs https://c-payne.com/products/slimsas-pcie-device-adapter-2-8i-to-x16 (assuming the clock comes off the connector serving lanes 1-8)

Basically a scaled up version of what I just demonstrated. It would be built like a mining rig, GPUs all cabled off the motherboard.
 
Last edited:

johnny0

n00b
Joined
Jan 8, 2017
Messages
35
Preferably leaning towards the less expensive side.
I don't think you're going to see this side no matter which path you take, but just how far away you'll get depends on a few things.

I think the main questions you need to think about and answer are:
  1. What's my budget?
  2. When will I be bottlenecked by pcie gen3?
  3. Will I go beyond eight GPUs?
  4. Will I use the CPU for more than just orchestrating the GPUs? With what kind of workloads?

Looking at the ROMED4ID-2T that inaxeon linked above, I see a few potential issues already:
  • It doesn't support 1st gen epyc. There's a massive gulf in price between 1st gen and newer chips.
  • Two of the six slimline connectors are low-profile. Like the linked article, I couldn't find a proper cable for these connectors.
  • There is no room for expansion w.r.t. 8-lane slots; eight is your limit on this board.
    If two of the slimline ports can't be used reliably for pcie due to cabling, then the board fails your basic requirements.
 

johnny0

n00b
Joined
Jan 8, 2017
Messages
35
For a minimal open-air system based on the ASRock ROMED8-2T with eight fat GPUs, I get about $191 per x8 slot for gen3 and about $273/slot for gen4.

The board slot configuration would look like this:
Code:
Slot  Device
----------------
1     GPU1
2     Empty (Blocked by GPU1)
3     Empty (Blocked by GPU1)
4     Bifurcation Host Adapter 1 -> GPU2, GPU3
5     Bifurcation Host Adapter 2 -> GPU4, GPU5
6     Bifurcation Host Adapter 3 -> GPU6, GPU7
7     GPU8

You can slowly expand up to 14 x8 slots on this board. Fully bifurcated, I get about $149/slot for gen3, $237/slot for gen4.

Gen3 is bifurcated using MaxCloudOn Kits.
Gen4 is bifurcated using C_Payne's Slimline host/device adapters and recommended 3M cable.


Powering this many GPUs/risers at once is an exercise left to the reader (and completely unaccounted for in the above figures).
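
If you want to keep the per-slot arithmetic honest while swapping parts in and out, a tiny helper like this works. All the prices in the sketch are hypothetical placeholders (chosen only so the total lands near the ~$191/slot ballpark quoted above), not quotes for any real parts list:

Code:
# Tiny helper for comparing $/slot between builds. All prices below are
# hypothetical placeholders, not quotes -- substitute your own parts list.
def cost_per_slot(parts: dict, usable_x8_slots: int) -> float:
    """Total build cost divided by the number of usable x8 slots."""
    return sum(parts.values()) / usable_x8_slots

gen3_build = {
    "motherboard": 650.0,       # placeholder
    "cpu": 450.0,               # placeholder
    "memory": 200.0,            # placeholder
    "bifurcation_kits": 230.0,  # placeholder (3 host adapters + risers)
}

print(f"${cost_per_slot(gen3_build, usable_x8_slots=8):.0f} per x8 slot")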
 

Icecold

Limp Gawd
Joined
Jul 21, 2013
Messages
213
I appreciate the information from both of you. I may just stick to 4-5 GPUs per machine, which would allow me to stick with less expensive hardware, or I may look into something dual-socket. I think a dual-socket Haswell Xeon would have plenty of PCIe lanes, but I would need to do some research to find a board that supports bifurcation. To answer the questions, though:

  1. What's my budget? -Ultimately, I guess unlimited, but I'd prefer to spend as little as possible without bottlenecking the GPUs. I would favor an older platform that uses more power over a more efficient platform that's way more expensive. I'm looking to build at least a couple of these, so EPYC Rome hardware might exceed what I want to spend. I was hoping a Naples EPYC or 1st gen Threadripper could be a good option, or something like a Haswell Xeon.
  2. When will I be bottlenecked by pcie gen3? -I don't think I will, ever. Even x4 PCIe gen 3 is probably sufficient for what I'm running, but I wanted to stick to x8 just to make sure I don't cause any bottlenecks.
  3. Will I go beyond eight GPUs? -I will, but in additional machines. Eight seemed like a solid choice per machine, but maybe six or something would make more sense.
  4. Will I use the CPU for more than just orchestrating the GPUs? With what kind of workloads? -It will not, but the BOINC GPU projects I run do require a fairly decent processor core per GPU to run without bottlenecking. Anything pre-Haswell is likely to bottleneck the GPU. I have other machines without GPUs for running anything that is CPU-intensive.
 

johnny0

n00b
Joined
Jan 8, 2017
Messages
35
Wow, that's a non-trivial amount of CPU load. You really should test for bottlenecks with like hardware before going any further.

For Zen, adding additional GPUs via bifurcation means you may end up with more than one worker per CCX. The specific slots you bifurcate may matter in this regard. You'll also want to investigate cross-CCX latency. You may find out your workers can eat cross-CCX penalties just fine, or that you need to budget for a DIMM per channel per host.

For Haswell, the mitigations for Meltdown/Spectre crush I/O performance. It's probably worth finding out whether this penalty is flat or scales with the GPU count.
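
On that last point, if you want to see quickly which mitigations a given Linux host is actually running with before benchmarking, the kernel reports the status per vulnerability under sysfs. A minimal sketch:

Code:
# Print the kernel's mitigation status for each known CPU vulnerability (Linux).
# A useful baseline before measuring whether mitigations slow down the GPU feeders.
from pathlib import Path

for entry in sorted(Path("/sys/devices/system/cpu/vulnerabilities").iterdir()):
    print(f"{entry.name:24s} {entry.read_text().strip()}")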
 

Icecold

Limp Gawd
Joined
Jul 21, 2013
Messages
213
Wow, that's a non-trivial amount of CPU load. You really should test for bottlenecks with like hardware before going any further.
Yeah, that's probably what I need to do. I have a bunch of hardware here already, and I'm sure at least some of it supports bifurcation, so I should just order some of those C_Payne adapters and try them out. I'll post back to this thread if I get a functioning setup and report how it worked out. It's fairly challenging to test for bottlenecks with BOINC, though, since there are so many different projects with different requirements. Some projects barely use any CPU at all to drive the GPU, while others will use a whole CPU core; some need PCIe lanes, while others run fine on an x1 riser; and so on.
 