starting to play with 10gb - Mellanox and need help

TeleFragger

Ok so I'm new to this and I have a test lab at work.

I have 4 esxi boxes - will get to them later
a windows 10 pc (dell precision 5810)

Our guys were getting rid of a mess of hardware so I grabbed it and figured time to play around.

I stuck a mellanox card in the win10 box and drivers installed, albeit, MS drivers. but it seems to be good?

I also connected a cable to the switch and the light on the switch lit up but in win10 the NIC says cable unattached.

switch - Flextronics F-x430066 8 Port 4x SDR Infiniband
cable - Mellanox Mcc4l30-300 Microgigacn Latch 0.3 M Infiniband Cable
cards - HPE InfiniBand 4X DDR Conn-X PCI-E G2 Dual Port HCA (483514-B21)

so few questions...
  1. what is needed on win10 side other than what I have done?
  2. if I put another card in a secondary pc, and just attach both to the switch, just give static IP's and that is it?

ill just start with them...thx


mellanox.png
 
now that I'm reviewing what I posted....

can you use an SDR switch with DDR cards?

if not... can you just run a cable from one machine to another? I've seen that done in labs while googling. I'll have to check that out again....
 
good question... it seems smarter than I...


manual - https://www.bwi.com/document/24245

overview says...
This User’s Manual provides an overview of the Eight 4X InfiniBand Port Switch System based on Mellanox Technologies’ MT43132 InfiniScale switch device.
The switch platform comes pre-installed with all necessary firmware and configuration for standard operation in an InfiniBand fabric running an InfiniBand compliant Subnet Management software in the subnet. All that is required for normal operation is to follow the usual precautions for installation and connection to the fabric. Once connected, the Subnet Management software automatically configures and begins utilizing the switch.


so out of the manual... I get the green light with just the cable from pc to switch and nothing else to switch...
The GREEN LED indicator to the left of each port will light when the physical connection is established (that is, when the unit is powered on and a cable is plugged into the port with a functioning port plugged into the other end of the connector)
 
so when I run through the steps to check the firmware version, it reports back that the card is an mt26418, and when I google that, it doesn't look like the card I have...

firmware.png
 
I'm a super n00b on this so not sure. all I know is it came with the mst and mlxfwmanager commands...
 
Infiniband is not really designed as a general networking sort of tech, but rather as an extra low latency, extra low overhead direct serial connection meant for passing application data or storage data between nodes. IPoIB allows IP packets to be encapsulated in Infiniband packets for more general networking, but you are still running in Infiniband mode and need both an Infiniband switch and a subnet manager running. The subnet manager may run on the switch, or it may not. It depends on your switch.

Your switch is an Infiniband switch, and the cards are Infiniband. You *may* be able to run them in Ethernet mode and have the switch pass Ethernet traffic, but I don't know for sure if the switch has that capability or not. The NICs themselves do, I know that for certain. Either way, the whole setup was designed for Infiniband use and not general 10gb Ethernet like you are used to. It will be VERY complicated for you to set this up and honestly not worth your time or effort. What you can use this for is running 10GbE between a couple of your computers in Ethernet mode, say if you want a direct connection between a server and your desktop PC to have higher bandwidth for large file transfers. You can run the cable between two NICs in Ethernet mode and use it this way very easily. Just go into the driver properties and switch them from Infiniband to Ethernet, plug in your cable between the computers and you're off and running.



EDIT: also if you want a section of your network running Infiniband traffic you will need a gateway that can convert Infiniband packets to Ethernet packets and vice versa to connect your Infiniband network to the outside world. You could however have all the computers with Infiniband also have Ethernet, and use IB for your SAN storage and Ethernet for general internet and networking traffic.
 
in my infiniband network i use the oFed drivers.

one computer that's always on runs the subnet manager.

each nic is plugged in to a switch like yours but mine is 24 port.

each nic is assigned an address that is different than the main network.

main being 192.168.0.x infiniband network is 10.0.0.x

each infiniband nic is also given a different subnet than my main network.

main being 255.255.255.0 inf network 255.0.0.0

that's how i do it.
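
The point of the addressing scheme above is that the two networks never overlap. A quick way to sanity-check a plan like this is Python's standard `ipaddress` module. This is only a sketch: the network addresses are read off the ranges and masks quoted above, and `networks_overlap` is a helper name made up for illustration.

```python
import ipaddress


def networks_overlap(a: str, b: str) -> bool:
    """Return True if the two networks share at least one address."""
    return ipaddress.ip_network(a).overlaps(ipaddress.ip_network(b))


# Ranges and masks as described in the post (network/netmask notation).
main_lan = "192.168.0.0/255.255.255.0"  # regular Ethernet network
ib_lan = "10.0.0.0/255.0.0.0"           # IPoIB network

print(networks_overlap(main_lan, ib_lan))
```

If this printed `True`, traffic meant for one network could end up routed out the wrong interface.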
 
in my infiniband network i use the oFed drivers.

I see OFED for Linux, FreeBSD, GPUDirect and VMWare but only WinOF or WinOF-2 for Windows... what is the file name?

I installed MLNX_VPI_WinOF-5_50_50000_All_Win2016_x64
ill have to look for

one computer that's always on runs the subnet manager.

subnet manager.. got any info on that.. guessing you need to install it so ill search and see what I can come up with...


each nic is plugged in to a switch like yours but mine is 24 port.
each nic is assigned an address that is different than the main network.
main being 192.168.0.x infiniband network is 10.0.0.x
each infiniband nic is also given a different subnet than my main network.
main being 255.255.255.0 inf network 255.0.0.0
that's how i do it.

yeah that makes sense.. but if your setting the IP on the NIC on each side, whats the point of the Subnet manager?

I have done multi networks like your saying and yes do entirely different ranges to keep them separate and your head clear..

p.s. thanks for all your help. its appreciated..
 
in the installation package there will be a subnet manager, and after installation it will ask if you want to start it.

just say yes, but only on one computer.

i think i am using the 3.1 drivers

3.2 didn't work at all.
 
so I tried winof-2 drivers and it didn't go...

for WinOF 5.50 - that installed.... I went to archived and only goes back to 4.6 so not even close to 3.x
 
mine were connect x-2

where I'm really confused is .. the p/n googled says..
HPE InfiniBand 4X DDR Conn-X PCI-E G2 Dual Port HCA (483514-B21)

but the mst command states mt26418_pci_cr0 and my windows 10 box picture says Mellanox ConnectX IPoIB Adapter

I have no clue if this is a 2 or 3... or is it possible to be a 1? so the 2/3 use the same driver pack.. 3 and up use the -2... so ive read..
 
yeah that makes sense.. but if your setting the IP on the NIC on each side, whats the point of the Subnet manager?

Think of the subnet manager kind of like your network router, it enables traffic between the points of the infiniband network by managing how hops on the network are handled and manages failover and priorities. A subnet manager must be running at all times, so it is best run on a server or inside the switch itself. You can have more than 1 subnet manager running at the same time as long as you set them up with proper priorities so that 1 is the master subnet manager and the other only takes over if the master goes down. It really has nothing at all to do with IPs, and normally with Infiniband you are dealing with GUIDs and LIDs and not IPs
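
To make the master/standby idea concrete, here is a rough Python sketch of how the election works as I understand it: the subnet manager with the highest priority becomes master, and a tie goes to the lowest port GUID. The names, priorities and GUIDs below are all made up for illustration, and this is not OpenSM's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubnetManager:
    name: str       # hypothetical label, for illustration only
    priority: int   # assumed range 0..15, higher wins
    port_guid: int  # 64-bit port GUID, lower wins a priority tie


def elect_master(sms):
    """Pick the master: highest priority first, then lowest GUID."""
    return min(sms, key=lambda sm: (-sm.priority, sm.port_guid))


# Hypothetical fabric with two machines running a subnet manager.
candidates = [
    SubnetManager("server", priority=15, port_guid=0x0000000000000002),
    SubnetManager("desktop", priority=5, port_guid=0x0000000000000001),
]

print(elect_master(candidates).name)

# If the server goes down, the remaining SM takes over.
remaining = [sm for sm in candidates if sm.name != "server"]
print(elect_master(remaining).name)
```

In practice this is why you give the always-on box the higher priority: the standby only matters when the master disappears.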





All devices in a subnet have a Local Identifier (LID), a 16-bit address assigned by the Subnet Manager. All packets sent within a subnet use the LID as the destination address for forwarding and switching packets at the Link Level. The LIDs allow for up to 48,000 end nodes within a single subnet. When a subnet is reconfigured, new LIDs are assigned to the various endpoints within the subnet. Routing between different subnets is done on the basis of a Global Identifier (GID), a 128-bit address modeled after IPv6 addresses, which allows for InfiniBand’s essentially unlimited scalability. GIDs identify an end node, port, switch, or multicast group. Global Unique Identifiers (GUID) are 64-bit definitions for all the elements within a subnet, including chassis, HCAs, switches, routers, and ports. The GUID never changes, and is used as part of the address for creating a GID. GIDs and GUIDs are independent of LIDs and are therefore immune to subnet reconfiguration.
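
As a small worked example of the GUID-to-GID relationship described above, the sketch below builds a GID by putting a 64-bit subnet prefix in front of a 64-bit port GUID. It assumes the default link-local prefix (fe80::/64). The sample GUID is the port GUID that shows up in the OpenSM log further down this thread, and `gid_from_guid` is just an illustrative helper name.

```python
import ipaddress

# Assumed default subnet prefix (fe80::/64), expressed as the upper 64 bits.
LINK_LOCAL_PREFIX = 0xFE80_0000_0000_0000


def gid_from_guid(port_guid: int,
                  subnet_prefix: int = LINK_LOCAL_PREFIX) -> ipaddress.IPv6Address:
    """Combine a 64-bit subnet prefix and a 64-bit GUID into a 128-bit GID."""
    if not 0 <= port_guid < 2**64:
        raise ValueError("port GUID must fit in 64 bits")
    return ipaddress.IPv6Address((subnet_prefix << 64) | port_guid)


port_guid = 0x001635FFFFBF0BB5  # port GUID taken from the OpenSM log in this thread
print(gid_from_guid(port_guid))  # fe80::16:35ff:ffbf:bb5
```

The printed value matches the GID that OpenSM reports for that port in the same log, which is a nice confirmation that the GID is just prefix plus GUID.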


Does InfiniBand Support IP Traffic? What Is IPoIB?
Internet Protocol (IP) packets can be sent via an InfiniBand interface by encapsulating the IP packets in an InfiniBand packet via a network interface. This is known as IP over IB (IPoIB). As long as the InfiniBand network has the necessary driver installed, it creates an interface for each port using partition keys (PKEYs) and can then transport IP packets across the InfiniBand network seamlessly.
 
3.0 didn't work. says requires win7 or server 2008.. wont install on win10 1809... gotta reboot as im doin all kinds of upgrades to the machine im on .. hah.. which is also the one with the card
 
I have ConnectX-2's.
I use Windows' own drivers (Server 2016). Every now and then when I restart one machine, the other stays in the "cable unplugged" state. I guess this is a firmware issue, because other network cards do not exhibit any such behavior.
After a simple disable/enable of the network connection, it comes to life. I made scripts for this to automate things and put it in task scheduler, because I've no time for further investigations.
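
The actual scripts aren't shown, so the following is only a guess at what such a workaround could look like: a minimal Python sketch that disables and re-enables an adapter with `netsh interface set interface`. The adapter name "Ethernet 3" is hypothetical, the pause length is arbitrary, and the commands need an elevated prompt to actually work.

```python
import subprocess
import time


def bounce_commands(interface_name: str):
    """Build the netsh command lines that disable, then enable, an adapter."""
    base = ["netsh", "interface", "set", "interface", interface_name]
    return [base + ["admin=disabled"], base + ["admin=enabled"]]


def bounce(interface_name: str, run=subprocess.run, pause_s: float = 5.0):
    """Disable the adapter, wait a moment, then enable it again."""
    disable, enable = bounce_commands(interface_name)
    run(disable, check=True)
    time.sleep(pause_s)  # arbitrary settle time before re-enabling
    run(enable, check=True)


# Only print the commands here; call bounce("Ethernet 3") from an
# elevated session (or a scheduled task) to actually run them.
for cmd in bounce_commands("Ethernet 3"):  # hypothetical adapter name
    print(" ".join(cmd))
```

Hooking something like this to a Task Scheduler trigger at startup would match the workaround described above.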
 
im thinking mine are connectx-1 cards.. do they exist??? hah.. I don't know.. like shown above.. the mlxfwmanager fails so I think im not far enough along. didn't get to poke around more on it but hopefully next week.
 
so I just plugged a cable into my card and into the switch and I get the following... am I good? just give ip addresses to the card?

[Dec-08-2018 18:36:49:905][4CA8] 0x03 -> OpenSM 3.3.11 UMAD
[Dec-08-2018 18:36:49:905][4CA8] 0x80 -> OpenSM 3.3.11 UMAD
[Dec-08-2018 18:36:49:908][4CA8] 0x02 -> osm_vendor_init: 1000 pending umads specified
[Dec-08-2018 18:36:49:909][4CA8] 0x80 -> Entering DISCOVERING state
[Dec-08-2018 18:36:49:914][4CA8] 0x02 -> osm_vendor_bind: Binding to port 0x1635ffffbf0bb5
[Dec-08-2018 18:36:49:937][4CA8] 0x02 -> osm_vendor_bind: Binding to port 0x1635ffffbf0bb5
[Dec-08-2018 18:36:49:937][4CA8] 0x02 -> osm_opensm_bind: Setting IS_SM on port 0x001635ffffbf0bb5
[Dec-08-2018 18:36:49:938][5144] 0x01 -> osm_si_rcv_process: ERR 3610:
Bad LinearFDBTop value = 0xC000 on switch 0xb8cffff00472b
Forcing internal correction to 0x0
[Dec-08-2018 18:36:53:385][2AA4] 0x80 -> Entering MASTER state
[Dec-08-2018 18:36:53:387][2AA4] 0x02 -> osm_ucast_mgr_process: minhop tables configured on all switches
[Dec-08-2018 18:36:53:388][2AA4] 0x80 -> SUBNET UP
[Dec-08-2018 18:36:53:393][3C54] 0x01 -> log_trap_info: Received Generic Notice type:4 num:144 (CapabilityMask, NodeDescription, Link [Width|Speed] Enabled, SM priority changed) Producer:1 (Channel Adapter) from LID:1 TID:0x0000000000000002
[Dec-08-2018 18:36:53:393][3C54] 0x01 -> osm_get_port_by_mad_addr: ERR 7504: Lid is out of range: 0
[Dec-08-2018 18:36:53:393][3C54] 0x01 -> trap_rcv_process_request: ERR 3809: Failed to find source physical port for trap
[Dec-08-2018 18:36:53:393][3C54] 0x02 -> log_notice: Reporting Generic Notice type:4 num:144 (CapabilityMask, NodeDescription, Link [Width|Speed] Enabled, SM priority changed) from LID:1 GID:fe80::16:35ff:ffbf:bb5
[Dec-08-2018 18:36:53:399][2AA4] 0x02 -> osm_ucast_mgr_process: minhop tables configured on all switches
[Dec-08-2018 18:36:53:399][2AA4] 0x02 -> SUBNET UP
[Dec-08-2018 18:36:53:517][3C54] 0x02 -> log_notice: Reporting Generic Notice type:3 num:66 (New mcast group created) from LID:1 GID:ff12:401b:ffff::ffff:ffff
[Dec-08-2018 18:36:53:517][4574] 0x02 -> log_notice: Reporting Generic Notice type:3 num:67 (Mcast group deleted) from LID:1 GID:ff12:401b:ffff::ffff:ffff
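
For anyone wanting to answer the "am I good?" question without reading every line, here is a rough Python sketch that pulls the state changes, the ERR codes and the SUBNET UP marker out of a log like the one above. The regular expressions are my own guess at the line layout based on this excerpt only, and the sample is a few lines copied from it.

```python
import re

# Line layout inferred from the excerpt: [timestamp][thread] level -> message
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[(?P<thread>[0-9A-Fa-f]+)\]\s+"
    r"(?P<level>0x[0-9A-Fa-f]+)\s+->\s+(?P<msg>.*)$"
)
STATE_RE = re.compile(r"Entering (\w+) state")
ERR_RE = re.compile(r"ERR\s+([0-9A-Fa-f]{4})")


def summarize(log_text: str) -> dict:
    """Collect SM state changes, error codes and whether the subnet came up."""
    states, errors, subnet_up = [], [], False
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # continuation lines such as "Bad LinearFDBTop value ..."
        msg = match.group("msg")
        state = STATE_RE.search(msg)
        if state:
            states.append(state.group(1))
        errors.extend(ERR_RE.findall(msg))
        if "SUBNET UP" in msg:
            subnet_up = True
    return {"states": states, "errors": errors, "subnet_up": subnet_up}


SAMPLE = """\
[Dec-08-2018 18:36:49:909][4CA8] 0x80 -> Entering DISCOVERING state
[Dec-08-2018 18:36:49:938][5144] 0x01 -> osm_si_rcv_process: ERR 3610:
Bad LinearFDBTop value = 0xC000 on switch 0xb8cffff00472b
[Dec-08-2018 18:36:53:385][2AA4] 0x80 -> Entering MASTER state
[Dec-08-2018 18:36:53:388][2AA4] 0x80 -> SUBNET UP
[Dec-08-2018 18:36:53:393][3C54] 0x01 -> osm_get_port_by_mad_addr: ERR 7504: Lid is out of range: 0
"""

print(summarize(SAMPLE))
```

Going from DISCOVERING to MASTER and then SUBNET UP suggests the subnet manager itself started, while the ERR lines are the parts worth looking up.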




so i have to get a second computer with the card to try...

 
Using them with a hypervisor? I'm surprised you didn't opt for an Intel X520-DA2 and just be done with it. I mean, sure Mellanox cards can work very well, but not without a lot of work and not right out of the box.
 
As said.. all parts given . Ive got like 10 cards and 10 cables and 4 switches. B4 xmas and buy stuff?????
 
Mellanox cards themselves are plug and play for what they are. Windows even comes with drivers because these cards are so prevalent across industries. The issue is not with the brand, but the type of card. Mellanox Ethernet cards just plug in and function great with no special config. Mellanox Infiniband cards plug in and work fine on the NIC end, but an Infiniband network in general requires additional setup simply due to the nature of what Infiniband is: it is not Ethernet, and it is not designed for general "everything runs normal across this network" type of traffic.
 
yeah playing around is getting old.. I have the one in my win10 box and it gets a link light on the switch when cable is in and when opensm is launched, both lights light up...
dropped one of these into a Lenovo P710 with esxi 6.7 and it sees it but I cant get a light when cable is plugged into the card... ive got too much work to do, to play around right now..

but is there a secret on the esxi side that I'm missing?

10g1.png



10g2.png
 
mlx4_en is an Ethernet driver, when you want to be using an Infiniband driver (mlx4_ib). Your switch probably cannot bring a link up when it is trying to negotiate Ethernet
 
well then with that.. since there is an Ethernet driver, can I connect this cable from nic to nic on win10? what would be needed to do that if so? just connect the cable and ...? tried it and didn't get link up
 
In Windows I believe the driver is an all-in-one type and includes both EN and IB. What you need to do is open Device Manager, go to the hardware properties of the NIC, and select Ethernet. That will enable NIC-to-NIC communication over Ethernet. You do not need a subnet manager for this; the subnet manager is just for the Infiniband protocol. Set the IPs of both NICs in Windows to a different subnet than the main one you use to connect to the router. For example, if your basic network is 192.168.1.x, then just use 192.168.2.x for both NICs. Set a folder to be shared, map it on the other Windows PC, and you're set.
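
To illustrate the addressing advice, here is a small Python sketch that takes the main network and a candidate network for the direct link, refuses the candidate if it collides with the main one, and otherwise hands back the first two host addresses for the two NICs. The /24 masks are an assumption (the post only gives 192.168.1.x and 192.168.2.x), and `plan_direct_link` is a made-up helper name.

```python
import ipaddress
from itertools import islice


def plan_direct_link(main_lan: str, link_lan: str):
    """Return two host addresses for a NIC-to-NIC link on its own subnet."""
    main = ipaddress.ip_network(main_lan)
    link = ipaddress.ip_network(link_lan)
    if main.overlaps(link):
        raise ValueError(f"{link} overlaps the main network {main}")
    # First two usable host addresses, one per NIC.
    return tuple(str(host) for host in islice(link.hosts(), 2))


# Assumed /24 masks for the example ranges from the post.
print(plan_direct_link("192.168.1.0/24", "192.168.2.0/24"))
```

Both NICs get the same mask and neither needs a default gateway, since the link only carries traffic between the two machines.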
 
ive been going crazy on that too... in windows 10 I have the driver tab as you see way up in post 1.. so I don't have the pic I see others posting where you can select Ethernet or IPoIB.... trying to figure that out as well...
running 5.50 windows driver...
 
I forgot to go get a screenshot of what the driver page should look like yesterday. Ill try to remember to do it when I get home today. I think you might be looking in the wrong driver place, that might be the issue.
 
ok I'm gonna ditch all of this. cant figure out how to get it to work..
I even put 2 cards in 1 machine and everything said all was well.. assigned 10.10.1.1 through 10.10.1.4 to the four NIC ports... attached 4 cables to the switch just to not miss anything.... ran opensm and gave it time. I couldn't ping any of the ports... none... all the connections still said cable unplugged, etc..

its before xmas so I'm gonna have to wait but I'm still going to start looking for hardware. I see people doing point to point.. get 3 dual nic cards and go that route which should work for me..

so I'm looking at connectx-2 and connectx-3 cards but I see EN and VPI ... since I am going to be using 2 in windows boxes and 1 in esxi 6.7 (to which what I saw mellanox said connectx-4 or 5 only for 6.7???)

ive got $80 in amazon gift cards and I see a few cards I could get, then ebay for a cable or 2... and try and start small nodes first...

thoughts on what I should get to play around?


my end goal is simple...
server 2016 - houses family file shares of pics, movies, etc and I also do video/photo edits
gamer - use to play games and edit videos, photos and transfer all to server 2016
esxi 6.7 - used as a virtual lab but would love to do freenas and have vms stored there - more for a to do it type thing

plex server / OTA dvr - files really kept on server 2016 so not sure I need 10g here
 
I forgot to go get a screenshot of what the driver page should look like yesterday. Ill try to remember to do it when I get home today. I think you might be looking in the wrong driver place, that might be the issue.

well ill hold off from ditching my idea to see what you show...
 
so I'm looking at connectx-2 and connectx-3 cards but I see EN and VPI

ConnectX- # -EN = Ethernet cards
VPI = Infiniband cards (capable of running Infiniband or Ethernet depending on mode they are in)



I am wondering since you have ConnectX-1 cards, maybe they are not VPI models and do not support ethernet?
 
ConnectX-IB are only infiniband cards, they cant work with Ethernet. However those are not the cards you have. I went and looked back at the first post and you have ConnectX-2 cards branded for HP.
http://www.mellanox.com/pdf/products/oem/HP_Reference_Guide.pdf

I appreciate all your help going back and forth so please don't think I'm complaining..hah..

I looked at the back of a card and it shows the 3rd one under HCA products
HP IB 4X DDR Conn-X PCI-e G2 Dual Port HCA 483514-B21 487505-001 MHGH29-XTC

so after more googling I found out that while under network adapters it shows Mellanox ConnectX IPoIB Adapter 1 and 2... I see farther down..

under system devices it says
Mellanox ConnectX VPI (MT26418) PCIe 2.0 5GT/s, IB DDR /10GigE Network Adapter...

and all 3 are showing drivers installed.. and 5.50..
 
so check this out.. playing around more.. I got farther...

so I have HP cards and the fw is 2.8 ...


and it is in IB mode...

flint.PNG



hca.PNG
 

That is telling you that you downloaded Mellanox Technologies branded firmware, while the card has HP branded firmware installed. I'm not sure if it is safe to force a flash of the MT firmware or not; perhaps talk to either Mellanox or HP and see if you can get firmware with the right PSID for your card.



As for it being in Infiniband mode, that should read out differently once we figure out the Windows driver thing and get it switched there into Ethernet.
However, I also know it is possible to switch modes within that command line tool you are using, I have done it before on ConnectX-3 cards.
 