Setting up NVLink with 2x 3090 FEs

newbie3

n00b
Joined
Mar 25, 2021
Messages
10
I've been searching for a bit but not having much luck. With the old SLI setup there was a tab in NVIDIA Control Panel where you could configure SLI. But I can't find anything like it in the new version.

The two 3090s are installed and the NVLink bridge is connecting them. Is there any software or driver setup necessary, or is it just on all the time?

I'm just making sure everything is configured right. But I'd also like to do some benchmarks and switch NVLink off with software rather than have to power down and remove the bridge.
 

pendragon1

Fully [H]
Joined
Oct 7, 2000
Messages
27,199
there's a post on evga's forums that suggests trying a single monitor in the bottom gpu. could try that. and make sure both cards are running at the correct pcie speeds.
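One way to check the negotiated link on each card without opening a GUI is `nvidia-smi`. The sketch below assumes `nvidia-smi` is on PATH and uses its `--query-gpu` PCIe fields; the helper names (`parse_link_rows`, `mismatched`, `query_links`) are made up for illustration. Note that the current link generation can drop at idle as a power-saving measure, so it is worth checking under load.

```python
import csv
import subprocess

# Query fields as listed by `nvidia-smi --help-query-gpu`.
FIELDS = [
    "index",
    "pcie.link.gen.current",
    "pcie.link.gen.max",
    "pcie.link.width.current",
    "pcie.link.width.max",
]


def _to_int(value):
    """Convert a CSV cell to int, or None if the driver reports e.g. [N/A]."""
    try:
        return int(value)
    except ValueError:
        return None


def parse_link_rows(csv_text):
    """Turn header-less nvidia-smi CSV rows into one dict per GPU."""
    rows = []
    for rec in csv.reader(csv_text.strip().splitlines(), skipinitialspace=True):
        if len(rec) != len(FIELDS):
            continue  # skip blank or unexpected lines
        idx, gen_cur, gen_max, w_cur, w_max = (_to_int(v) for v in rec)
        rows.append({"gpu": idx, "gen": gen_cur, "gen_max": gen_max,
                     "width": w_cur, "width_max": w_max})
    return rows


def mismatched(rows):
    """True if the cards did not negotiate the same link width."""
    return len({r["width"] for r in rows}) > 1


def query_links():
    """Run nvidia-smi (assumed to be on PATH) and parse its output."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_link_rows(result.stdout)
```

Calling `query_links()` on the machine returns one entry per card, and `mismatched(...)` flags the case where one card is at x16 and the other at x8. Running `nvidia-smi nvlink --status` separately should show whether the bridge links are active.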
 

newbie3

n00b
Joined
Mar 25, 2021
Messages
10
You could be right about that. I just checked the speeds and they were not matched. So I looked at the manual and I misremembered. SLI is meant to be on PCIe_1 and PCIe_3. I was using PCIe_4.

[Attachment: 1619057567985.png]


So I moved things around. Set them in PCIe_1 and PCIe_3 as the manual suggests.

Once I booted up the first time, the second 3090 wasn't detected. Device Manager reported "Windows has stopped this device because it has reported problems" (Code 43).

So I removed the card from PCIe_1 and booted with a card in PCIe_3 only. Same issue. I swapped 3090s and used the same PCIe_3 slot. Same issue. A 3090 just won't work in PCIe_3 at all.

I moved it to PCIe_4 and everything was OK again.

Any thoughts?

Motherboard is ROG Zenith II Extreme Alpha
 

pendragon1

Fully [H]
Joined
Oct 7, 2000
Messages
27,199
try it in slots 2+4; if it works you won't notice a difference. also, is the bios up to date?
 

Armenius

Fully [H]
Joined
Jan 28, 2014
Messages
25,249
Is there anything in your configuration that could be disabling PCIe_3 (NVMe drives, etc.)? How many PCI-E lanes in total is your configuration using? Which Threadripper are you using? Can you see the status of connected PCI-E devices in your BIOS? Can you set the bandwidth for the PCIe_3 slot?

It's also possible that slot is simply not working. Do you have another PCI-E AIC you can test in that slot?
 

newbie3

n00b
Joined
Mar 25, 2021
Messages
10
Using the Threadripper 3990X. Latest BIOS. I'll try a few more things today.

It's starting to look like I've managed to cook a PCIe x16 slot. How do you even do that? With all other PCIe slots empty, it will boot with a 3090 in #3 and I'll get basic video, but Device Manager always shows a problem with the device and NVIDIA Control Panel won't recognize that a 3090 is installed. Both 3090s fail in that slot but they work everywhere else.

I just ordered a new $800 ROG motherboard. Hopefully the CPU isn't damaged.

My setup:
1600W PSU -> MB + 3 NVMe drives
1600W EVGA PSU -> 1st 3090
860W Corsair PSU -> 2nd 3090

1st and 2nd 3090 bridged with NVLink. The 3090's were on PCIe extension cables.

I'm no longer using extension cables so that won't be a factor.
 
Joined
Jan 16, 2013
Messages
2,229
Why would you not just run everything off of the 1600W PSU? It has more than enough power for 2 3090s...
 

Armenius

Fully [H]
Joined
Jan 28, 2014
Messages
25,249
It seems the M.2 slots will only affect PCIe_4, which is interesting considering you said you were running fine when using PCIe_1 and PCIe_4. The manual says that PCIe_4 will be limited to x4 if M.2_2 (SLOT3) is being used. It could very well be that a pin or pins in the CPU socket could be damaged, a contact on the CPU itself could be damaged, or the slot just didn't work as the motherboard was shipped.
I also agree with running everything off the single 1600W PSU. That is a complicated and unnecessary setup. In that configuration it has to "only" be using 1200W, tops.
 
Joined
Jan 16, 2013
Messages
2,229
Good info here. OP if you post a pic of the socket pins we can tell you if they're bent or damaged.
 

jmilcher

Supreme [H]ardness
Joined
Feb 3, 2008
Messages
4,974
Do not use two PSUs. You have more than enough power with 1600W from a quality unit. It may not be the direct issue, but it isn't helping anything.
 

newbie3

n00b
Joined
Mar 25, 2021
Messages
10
On why I'm not running everything off the 1600W PSU: the 3990X is using ~600W in the current config with all 128 threads firing on a render, and I'm planning to overclock further in the future. Add 350W per 3090 and I'm already at 1,300 watts before any other HDDs or RAID are factored in. Even at 1,300W I'm starting to push the margins for continuous usage. And I have extra PSUs, so it just made sense.
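The arithmetic behind that estimate, as a quick sketch. The wattage figures are the ones quoted in this post; the function name `load_summary` and the idea of expressing load as a fraction of the PSU rating are just for illustration, and transient spikes or PSU efficiency are not modeled.

```python
# Figures quoted in the post; nothing here is measured by this script.
CPU_W = 600      # 3990X under an all-thread render
GPU_W = 350      # per 3090
GPU_COUNT = 2
PSU_W = 1600     # rating of the main PSU


def load_summary(cpu_w, gpu_w, gpu_count, psu_w):
    """Sum the sustained draw and compare it with the PSU rating."""
    total = cpu_w + gpu_w * gpu_count
    return {
        "total_w": total,
        "headroom_w": psu_w - total,
        "load_fraction": total / psu_w,
    }


summary = load_summary(CPU_W, GPU_W, GPU_COUNT, PSU_W)
print(f"{summary['total_w']} W of {PSU_W} W "
      f"({summary['load_fraction']:.0%} load, "
      f"{summary['headroom_w']} W headroom)")
```

With these inputs the total is 1,300W, which leaves 300W of headroom on a 1600W unit before drives and any further overclock.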

I couldn't get a good photo of the pins. All the pins look OK but there was some chalky looking bits that looked like oxidization. So with nothing left to lose, I took some very fine sandpaper and gently cleaned it up. They look fine now. But that didn't solve the issue.
 
Joined
Jan 16, 2013
Messages
2,229
I would avoid using sandpaper on the pins. You can definitely still tank the entire socket!

Also, 1300W is still 100% fine for that 1600W PSU. You should stick to one PSU to avoid any potential issues if possible.
 

newbie3

n00b
Joined
Mar 25, 2021
Messages
10
yup. i use an eraser.
Just to make sure we're all on the same page... I'm talking about the pins inside the slot. In the socket.

Things are going downhill here. Now I can't get either of the two x16 slots to work. #2 and #4 are OK. #1 and #3 will boot and show video but the driver won't load. Tried both 3090s. Same.

#2 and #4 work fine but they're x8.

I've never seen this before. I wonder if the weight of the 3090 FE is a factor. There's no visible damage on the motherboard, but maybe some pins inside the slot collapsed and aren't making full contact. Kind of stuck until the replacement motherboard arrives Saturday.
 

pendragon1

Fully [H]
Joined
Oct 7, 2000
Messages
27,199
ah, i think i was misunderstanding and thinking of the fingers on the card, whoops.
that is odd but the 4.0 x8 should be fine.
try laying the system on its side maybe?
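For a rough sense of why 4.0 x8 should be fine, here is a sketch of the theoretical per-direction link bandwidth. It uses the standard PCIe figures (8 GT/s per lane for Gen 3, 16 GT/s for Gen 4, 128b/130b encoding) and ignores protocol overhead, so real throughput will be lower; the function name is made up for illustration.

```python
# Raw transfer rate per lane in GT/s for each PCIe generation.
GT_PER_S = {3: 8.0, 4: 16.0}
# Gen 3 and Gen 4 both use 128b/130b line encoding.
ENCODING = 128 / 130


def link_gbytes_per_s(gen, lanes):
    """Theoretical one-direction bandwidth in GB/s, before protocol overhead."""
    return GT_PER_S[gen] * ENCODING / 8 * lanes


for gen, lanes in [(3, 16), (4, 8), (4, 16)]:
    print(f"Gen {gen} x{lanes}: {link_gbytes_per_s(gen, lanes):.2f} GB/s")
```

Gen 4 x8 comes out the same as Gen 3 x16 (about 15.75 GB/s), and Gen 4 x16 is double that.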
 
Joined
Jan 16, 2013
Messages
2,229
Doubt the weight is a factor. They usually have reinforced slots on the $500+ boards. The $800 one you ordered sure as shit better have them.
 

Armenius

Fully [H]
Joined
Jan 28, 2014
Messages
25,249
Can't think of anything that would cause oxidization on the pins in the motherboard except maybe a voltage issue. Makes me think that there wasn't proper contact between the pins and the CPU to begin with, assuming the CPU installation procedure was followed, since that procedure is unusual because of how large this processor is.
I have had weight be an issue before, but the slot continued to work fine for the life of the motherboard despite some sagging. That said, the 3090 is a much beefier card. I would think about getting some support brackets just in case if you get a new motherboard.

How do the fingers on the video cards look?
 

newbie3

n00b
Joined
Mar 25, 2021
Messages
10
Well honestly I'm a bit pissed. This was all a software issue. This entire time. There's nothing wrong with the sockets, cards, or cpu. I didn't need to order a $800 mb. Just a colossal waste of time.

First, I have to say that I've already tried all of these steps before. But something about this time was different. I don't know how.

1. Installed both cards in #1 and #3. Of course, it boots up with drivers disabled and error 43. Just like the other 100 times.
2. Open Device Manager.
3. Right-click and uninstall the 3090 in #1.
4. Right-click and uninstall the 3090 in #3.
5. Run the NVIDIA driver installation. The installer stalls at installing the driver. OK. Reboot.
6. Run the NVIDIA driver installation again.
7. Run NVIDIA Control Panel. Finally, everything looks OK. Reboot. Still OK. And that's it.

My theory: Windows or NVIDIA keeps an error counter for each PCIe port. After some number of errors, Windows will disable that port. So even if you boot and the hardware is OK, the driver will just assume there's something wrong with it and not even try to start up.

I don't know why everything just worked this time. I've done the same steps in the past. Nothing else changed.

So what caused the errors in the first place that disabled the drivers?? Probably the PCIe extension cables and PCIe Gen 4. None of the cables truly seem to support Gen 4.

Before reinstalling the 3090's back on the cables, I forced Gen 3 on both #1 and #3 instead of AUTO. Everything has worked since then. I may try to get Gen 4 to work again later...

NVIDIA control panel reports both cards as x16 gen 3. Great.

OK - so now that SLI is enabled I open PassMark and run 3D Graphics Mark mostly just as a stress-test to confirm SLI is up and stable.

My score with 1x 3090 in PCIe #1, directly in the slot, was 25,632. Stock. No new thermal paste, tuning, or OC.
My score with 2x 3090s in PCIe #1 and #3: 23,197.

Edit:
It was 23,197. I started to tune the CPU again. I'm at 27,709 on the 3D Graphics Mark now. So an overall improvement. Rendering, encoding, and ray tracing are where I'll get the ROI.
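To put those three PassMark scores in relative terms, a small sketch follows. The scores are the ones reported in this post; the variable and function names are made up for illustration. Keep in mind the single-card run was at stock and the last run was after CPU tuning, so the comparison is not like-for-like.

```python
# PassMark 3D Graphics Mark scores as reported in the post.
SINGLE_STOCK = 25632   # 1x 3090 in PCIe #1, stock
DUAL_UNTUNED = 23197   # 2x 3090 in PCIe #1 and #3
DUAL_TUNED = 27709     # 2x 3090 after CPU tuning


def pct_change(new, old):
    """Relative change of `new` versus `old`, in percent."""
    return (new - old) / old * 100


print(f"dual untuned vs single: {pct_change(DUAL_UNTUNED, SINGLE_STOCK):+.1f}%")
print(f"dual tuned vs single:   {pct_change(DUAL_TUNED, SINGLE_STOCK):+.1f}%")
```

That works out to roughly 9.5% below the single-card score before tuning and roughly 8.1% above it afterwards.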
 

Armenius

Fully [H]
Joined
Jan 28, 2014
Messages
25,249
The cables are PCI-E version agnostic. I would migrate everything to a single power supply to eliminate that as a potential issue.
 

newbie3

n00b
Joined
Mar 25, 2021
Messages
10
Unfortunately, that's not the case. Gen 4 will not work if the cards are on the cables. But both 3090s work perfectly in Gen 4 when seated directly in the motherboard slots:

[Attachment: 1619232576660.png]


I just tried again a moment ago and confirmed it. The drivers only load when forced to Gen 3. Gen 4 with cables immediately causes problems.

This is a common experience with many others on other threads and forums. There are some who claim they got X card on Gen 4 to work with extension cables but it's hit and miss at best. I've tried 3 cable brands with 0 luck.

I haven't used a secondary PSU in the last 2 days so that hasn't been a factor.
 

Armenius

Fully [H]
Joined
Jan 28, 2014
Messages
25,249
Do you mean risers? I think that might be the piece of the puzzle that was missing throughout all of this. Using PCI-E risers definitely complicates things.
 