starting to play with 10gb - Mellanox and need help

Discussion in 'Networking & Security' started by TeleFragger, Dec 6, 2018.

  1. TeleFragger

    TeleFragger Gawd

    Messages:
    787
    Joined:
    Nov 10, 2005
    Ok so I'm new to this and I have a test lab at work.

    I have 4 esxi boxes - will get to them later
    a windows 10 pc (dell precision 5810)

    Our guys were getting rid of a mess of hardware so I grabbed it and figured time to play around.

    I stuck a Mellanox card in the win10 box and the drivers installed, albeit MS drivers, but it seems to be good?

    I also connected a cable to the switch and the light on the switch lit up but in win10 the NIC says cable unattached.

    switch - Flextronics F-x430066 8 Port 4x SDR Infiniband
    cable - Mellanox Mcc4l30-300 Microgigacn Latch 0.3 M Infiniband Cable
    cards - HPE InfiniBand 4X DDR Conn-X PCI-E G2 Dual Port HCA (483514-B21)

    so few questions...
    1. what is needed on win10 side other than what I have done?
    2. if I put another card in a secondary pc, and just attach both to the switch, just give static IP's and that is it?

    ill just start with them...thx


    mellanox.png
     
  2. TeleFragger

    TeleFragger Gawd

    Messages:
    787
    Joined:
    Nov 10, 2005
    now that I'm reviewing what I posted....

    can you use an SDR switch with DDR cards?

    if not... can you just take a cable from one machine to another? ive seen that in labs by googling. ill have to check that out again....
     
  3. Master_shake_

    Master_shake_ [H]ardness Supreme

    Messages:
    8,036
    Joined:
    Apr 9, 2012
    is your switch managed or dumb?
     
  4. TeleFragger

    TeleFragger Gawd

    Messages:
    787
    Joined:
    Nov 10, 2005
    good question... it seems smarter than I...


    manual - https://www.bwi.com/document/24245

    overview says...
    This User’s Manual provides an overview of the Eight 4X InfiniBand Port Switch System based on Mellanox Technologies’ MT43132 InfiniScale switch device.
    The switch platform comes pre-installed with all necessary firmware and configuration for standard operation in an InfiniBand fabric running an InfiniBand compliant Subnet Management software in the subnet. All that is required for normal operation is to follow the usual precautions for installation and connection to the fabric. Once connected, the Subnet Management software automatically configures and begins utilizing the switch.


    so out of the manual... I get the green light with just the cable from pc to switch and nothing else to switch...
    The GREEN LED indicator to the left of each port will light when the physical connection is established (that is, when the unit is powered on and a cable is plugged into the port with a functioning port plugged into the other end of the connector)
     
  5. TeleFragger

    TeleFragger Gawd

    Messages:
    787
    Joined:
    Nov 10, 2005
    so when I run through the steps to check the firmware version, it reports back that it is a mt26418, and when I google that, that is not the card I have...

    firmware.png
     
  6. Master_shake_

    Master_shake_ [H]ardness Supreme

    Messages:
    8,036
    Joined:
    Apr 9, 2012
    do the mellanox drivers come with a subnet manager?
     
  7. TeleFragger

    TeleFragger Gawd

    Messages:
    787
    Joined:
    Nov 10, 2005
    I'm a super n00b on this so not sure. all I know is it came with the mst and mlxfwmanager commands...
     
  8. EniGmA1987

    EniGmA1987 Limp Gawd

    Messages:
    161
    Joined:
    May 2, 2017
    Infiniband is not really designed as a general networking sort of tech, but rather an extra low latency, extra low overhead direct serial connection meant for passing application data or storage data between nodes. IPoIB allows IP packets to be encapsulated in Infiniband for more general networking, but you are still running in Infiniband mode and need both an Infiniband switch and a subnet manager running. The subnet manager may run on the switch, or it may not; it depends on your switch.

    Your switch is an Infiniband switch, and the cards are Infiniband. You *may* be able to run them in Ethernet mode and have the switch pass Ethernet traffic, but I don't know for sure if the switch has that capability or not. The NICs themselves do; I know that for certain. Either way, the whole setup was designed for Infiniband use and not general 10Gb Ethernet like you are used to. It will be VERY complicated for you to set this up and honestly not worth your time or effort. What you can use this for is running 10GbE between a couple of your computers in Ethernet mode, say if you want a direct connection between a server and your desktop PC to have higher bandwidth for large file transfers. You can run the cable between two NICs in Ethernet mode and use it this way very easily. Just go into the driver properties and switch them from Infiniband to Ethernet, plug in your cable between the computers and you're off and running.



    EDIT: also if you want a section of your network running Infiniband traffic you will need a gateway that can convert Infiniband packets to Ethernet packets and vice versa to connect your Infiniband network to the outside world. You could however have all the computers with Infiniband also have Ethernet, and use IB for your SAN storage and Ethernet for general internet and networking traffic.
     
    Last edited: Dec 6, 2018
    MrGuvernment and MixManSC like this.
  9. Master_shake_

    Master_shake_ [H]ardness Supreme

    Messages:
    8,036
    Joined:
    Apr 9, 2012
    in my infiniband network i use the oFed drivers.

    one computer that's always on runs the subnet manager.

    each nic is plugged in to a switch like yours but mine is 24 port.

    each nic is assigned an address that is different than the main network.

    main being 192.168.0.x infiniband network is 10.0.0.x

    each infiniband nic is also given a different subnet than my main network.

    main being 255.255.255.0 inf network 255.0.0.0

    that's how i do it.
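
    A minimal sketch of the addressing described above, assuming the two ranges are 192.168.0.0 with mask 255.255.255.0 (main) and 10.0.0.0 with mask 255.0.0.0 (InfiniBand). The labels and the helper name `which_network` are made up for illustration; it only reports which of the two networks an address falls in.

```python
import ipaddress

# Ranges quoted in the post; the ".0" network addresses are an assumption.
NETWORKS = {
    "main": ipaddress.ip_network("192.168.0.0/255.255.255.0"),
    "infiniband": ipaddress.ip_network("10.0.0.0/255.0.0.0"),
}


def which_network(address):
    """Return the label of the network that contains address, or None."""
    ip = ipaddress.ip_address(address)
    for name, net in NETWORKS.items():
        if ip in net:
            return name
    return None


# Example addresses are hypothetical hosts on each side.
print(which_network("192.168.0.20"))
print(which_network("10.0.0.5"))
```

    Keeping the two ranges disjoint is what lets each machine pick the right interface for each destination without extra routes.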
     
    TeleFragger likes this.
  10. TeleFragger

    TeleFragger Gawd

    Messages:
    787
    Joined:
    Nov 10, 2005
    I see OFED for Linux, FreeBSD, GPUDirect and VMWare but only WinOF or WinOF-2 for Windows... what is the file name?

    I installed MLNX_VPI_WinOF-5_50_50000_All_Win2016_x64
    ill have to look for

    subnet manager.. got any info on that.. guessing you need to install it so ill search and see what I can come up with...


    yeah that makes sense.. but if you're setting the IP on the NIC on each side, what's the point of the subnet manager?

    I have done multi networks like you're saying and yes do entirely different ranges to keep them separate and your head clear..

    p.s. thanks for all your help. its appreciated..
     
  11. Master_shake_

    Master_shake_ [H]ardness Supreme

    Messages:
    8,036
    Joined:
    Apr 9, 2012
    in the installation package it will have a subnet manager and after installation it will ask if you want to start it.

    just say yes but only on one computer.

    i think i am using the 3.1 drivers

    3.2 didn't work at all.
     
    TeleFragger likes this.
  12. TeleFragger

    TeleFragger Gawd

    Messages:
    787
    Joined:
    Nov 10, 2005
    so I tried winof-2 drivers and it didn't go...

    for WinOF 5.50 - that installed.... I went to archived and only goes back to 4.6 so not even close to 3.x
     
  13. Master_shake_

    Master_shake_ [H]ardness Supreme

    Messages:
    8,036
    Joined:
    Apr 9, 2012
    i have the file at home i'll have it up around 3pm EST
     
    TeleFragger likes this.
  14. TeleFragger

    TeleFragger Gawd

    Messages:
    787
    Joined:
    Nov 10, 2005
    thanks... also.. is this a connectx-2 or 3 card?
     
  15. Master_shake_

    Master_shake_ [H]ardness Supreme

    Messages:
    8,036
    Joined:
    Apr 9, 2012
  16. TeleFragger

    TeleFragger Gawd

    Messages:
    787
    Joined:
    Nov 10, 2005
    where I'm really confused is .. the p/n googled says..
    HPE InfiniBand 4X DDR Conn-X PCI-E G2 Dual Port HCA (483514-B21)

    but the mst command states mt26418_pci_cr0 and my windows 10 box picture says Mellanox ConnectX IPoIB Adapter

    I have no clue if this is a 2 or 3... or is it possible to be a 1? so the 2/3 use the same driver pack.. 3 and up use the -2... so ive read..
     
  17. EniGmA1987

    EniGmA1987 Limp Gawd

    Messages:
    161
    Joined:
    May 2, 2017
    Think of the subnet manager kind of like your network router, it enables traffic between the points of the infiniband network by managing how hops on the network are handled and manages failover and priorities. A subnet manager must be running at all times, so it is best run on a server or inside the switch itself. You can have more than 1 subnet manager running at the same time as long as you set them up with proper priorities so that 1 is the master subnet manager and the other only takes over if the master goes down. It really has nothing at all to do with IPs, and normally with Infiniband you are dealing with GUIDs and LIDs and not IPs
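
    A rough sketch of the master/standby idea, assuming the usual InfiniBand rule that the subnet manager with the highest priority becomes master and that a tie is broken by the lowest port GUID. The host names, priorities and GUIDs below are invented for illustration, and this is only a model of the election, not how OpenSM is implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubnetManager:
    name: str      # hypothetical host label
    priority: int  # higher value wins the election
    guid: int      # port GUID, used only to break ties


def elect_master(candidates):
    """Pick the master: highest priority first, then lowest GUID."""
    if not candidates:
        return None
    return min(candidates, key=lambda sm: (-sm.priority, sm.guid))


# Two made-up subnet managers on the same fabric.
sms = [
    SubnetManager("server", priority=15, guid=0x0002C90300000001),
    SubnetManager("desktop", priority=5, guid=0x0002C90300000002),
]

master = elect_master(sms)
# If the master goes away, the remaining subnet manager takes over.
standby = elect_master([sm for sm in sms if sm != master])
print(master.name, standby.name)
```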





     
    Last edited: Dec 6, 2018
    TeleFragger likes this.
  18. Master_shake_

    Master_shake_ [H]ardness Supreme

    Messages:
    8,036
    Joined:
    Apr 9, 2012
    Last edited: Dec 6, 2018
  19. TeleFragger

    TeleFragger Gawd

    Messages:
    787
    Joined:
    Nov 10, 2005
    3.0 didn't work. says requires win7 or server 2008.. wont install on win10 1809... gotta reboot as im doin all kinds of upgrades to the machine im on .. hah.. which is also the one with the card
     
  20. tedych

    tedych Limp Gawd

    Messages:
    332
    Joined:
    Jan 18, 2013
    I have ConnectX-2's.
    I use Windows' own drivers (Server 2016). Every now and then when I restart one machine, the other would stay in "cable unplugged" state. I guess this is a firmware issue because other network cards do not exhibit any such behavior.
    After a simple disable/enable of the network connection, it comes to life. I made scripts for this to automate things and put it in task scheduler, because I've no time for further investigations.
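
    The actual scripts are not shown in this thread, so the following is only a guess at what such a disable/enable automation could look like. It assumes the PowerShell cmdlets `Disable-NetAdapter` and `Enable-NetAdapter` are available and that the adapter is called "Ethernet 2" (a made-up name).

```python
import subprocess


def build_bounce_commands(adapter_name):
    """Build the two PowerShell invocations that disable, then re-enable, an adapter."""
    quoted = adapter_name.replace("'", "''")  # escape single quotes for PowerShell
    return [
        ["powershell", "-NoProfile", "-Command",
         f"Disable-NetAdapter -Name '{quoted}' -Confirm:$false"],
        ["powershell", "-NoProfile", "-Command",
         f"Enable-NetAdapter -Name '{quoted}'"],
    ]


def bounce_adapter(adapter_name):
    """Run the commands in order; needs Windows and an elevated prompt."""
    for cmd in build_bounce_commands(adapter_name):
        subprocess.run(cmd, check=True)


# Only print the commands here; call bounce_adapter() on the real machine.
for cmd in build_bounce_commands("Ethernet 2"):
    print(" ".join(cmd))
```

    A script like this could be run from Task Scheduler at startup, which matches the approach described above.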
     
    TeleFragger likes this.
  21. TeleFragger

    TeleFragger Gawd

    Messages:
    787
    Joined:
    Nov 10, 2005
    im thinking mine are connectx-1 cards.. do they exist??? hah.. I don't know.. like shown above.. the mlxfwmanager fails so I think im not far enough along. didn't get to poke around more on it but hopefully next week.
     
  22. TeleFragger

    TeleFragger Gawd

    Messages:
    787
    Joined:
    Nov 10, 2005
    so I just plugged a cable into my card and into the switch and I get the following... am I good? just give ip addresses to the card?

    [Dec-08-2018 18:36:49:905][4CA8] 0x03 -> OpenSM 3.3.11 UMAD
    [Dec-08-2018 18:36:49:905][4CA8] 0x80 -> OpenSM 3.3.11 UMAD
    [Dec-08-2018 18:36:49:908][4CA8] 0x02 -> osm_vendor_init: 1000 pending umads specified
    [Dec-08-2018 18:36:49:909][4CA8] 0x80 -> Entering DISCOVERING state
    [Dec-08-2018 18:36:49:914][4CA8] 0x02 -> osm_vendor_bind: Binding to port 0x1635ffffbf0bb5
    [Dec-08-2018 18:36:49:937][4CA8] 0x02 -> osm_vendor_bind: Binding to port 0x1635ffffbf0bb5
    [Dec-08-2018 18:36:49:937][4CA8] 0x02 -> osm_opensm_bind: Setting IS_SM on port 0x001635ffffbf0bb5
    [Dec-08-2018 18:36:49:938][5144] 0x01 -> osm_si_rcv_process: ERR 3610:
    Bad LinearFDBTop value = 0xC000 on switch 0xb8cffff00472b
    Forcing internal correction to 0x0
    [Dec-08-2018 18:36:53:385][2AA4] 0x80 -> Entering MASTER state
    [Dec-08-2018 18:36:53:387][2AA4] 0x02 -> osm_ucast_mgr_process: minhop tables configured on all switches
    [Dec-08-2018 18:36:53:388][2AA4] 0x80 -> SUBNET UP
    [Dec-08-2018 18:36:53:393][3C54] 0x01 -> log_trap_info: Received Generic Notice type:4 num:144 (CapabilityMask, NodeDescription, Link [Width|Speed] Enabled, SM priority changed) Producer:1 (Channel Adapter) from LID:1 TID:0x0000000000000002
    [Dec-08-2018 18:36:53:393][3C54] 0x01 -> osm_get_port_by_mad_addr: ERR 7504: Lid is out of range: 0
    [Dec-08-2018 18:36:53:393][3C54] 0x01 -> trap_rcv_process_request: ERR 3809: Failed to find source physical port for trap
    [Dec-08-2018 18:36:53:393][3C54] 0x02 -> log_notice: Reporting Generic Notice type:4 num:144 (CapabilityMask, NodeDescription, Link [Width|Speed] Enabled, SM priority changed) from LID:1 GID:fe80::16:35ff:ffbf:bb5
    [Dec-08-2018 18:36:53:399][2AA4] 0x02 -> osm_ucast_mgr_process: minhop tables configured on all switches
    [Dec-08-2018 18:36:53:399][2AA4] 0x02 -> SUBNET UP
    [Dec-08-2018 18:36:53:517][3C54] 0x02 -> log_notice: Reporting Generic Notice type:3 num:66 (New mcast group created) from LID:1 GID:ff12:401b:ffff::ffff:ffff
    [Dec-08-2018 18:36:53:517][4574] 0x02 -> log_notice: Reporting Generic Notice type:3 num:67 (Mcast group deleted) from LID:1 GID:ff12:401b:ffff::ffff:ffff
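
    For reading a log like the one above, here is a small sketch that looks for the "SUBNET UP" message and collects the ERR codes. The regular expression is inferred from the lines pasted here and is not an official description of the OpenSM log format. The second helper shows how the GID in the notices lines up with the port GUID from the bind lines, assuming the default link-local prefix fe80::/64.

```python
import ipaddress
import re

# Matches lines such as:
# [Dec-08-2018 18:36:53:388][2AA4] 0x80 -> SUBNET UP
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[(?P<thread>[0-9A-Fa-f]+)\]\s+"
    r"(?P<level>0x[0-9A-Fa-f]+)\s+->\s+(?P<msg>.*)$"
)
ERR_RE = re.compile(r"ERR (\d+)")


def summarize(log_text):
    """Report whether SUBNET UP was logged and which ERR codes appeared."""
    subnet_up = False
    errors = []
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # wrapped continuation lines carry no header
        msg = match.group("msg")
        if msg.strip() == "SUBNET UP":
            subnet_up = True
        errors.extend(ERR_RE.findall(msg))
    return {"subnet_up": subnet_up, "errors": errors}


def link_local_gid(port_guid):
    """Combine the fe80::/64 prefix with a 64-bit port GUID."""
    return str(ipaddress.IPv6Address((0xFE80 << 112) | port_guid))


# A few lines copied from the log above.
SAMPLE = """\
[Dec-08-2018 18:36:49:938][5144] 0x01 -> osm_si_rcv_process: ERR 3610:
Bad LinearFDBTop value = 0xC000 on switch 0xb8cffff00472b
[Dec-08-2018 18:36:53:388][2AA4] 0x80 -> SUBNET UP
[Dec-08-2018 18:36:53:393][3C54] 0x01 -> osm_get_port_by_mad_addr: ERR 7504: Lid is out of range: 0
"""

print(summarize(SAMPLE))
print(link_local_gid(0x001635FFFFBF0BB5))
```

    "SUBNET UP" only says the subnet manager finished sweeping the fabric it could see; it does not by itself show that a second host is connected or reachable.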




    so i have to get a second computer with the card to try...

     
  23. Mr. Baz

    Mr. Baz 2[H]4U

    Messages:
    2,815
    Joined:
    Aug 17, 2001
    Using them with a hypervisor? I'm surprised you didn't opt for an Intel X520-DA2 and just be done with it. I mean, sure Mellanox cards can work very well, but not without a lot of work and not right out of the box.
     
  24. TeleFragger

    TeleFragger Gawd

    Messages:
    787
    Joined:
    Nov 10, 2005

    As said.. all parts given . Ive got like 10 cards and 10 cables and 4 switches. B4 xmas and buy stuff?????
     
  25. EniGmA1987

    EniGmA1987 Limp Gawd

    Messages:
    161
    Joined:
    May 2, 2017

    Mellanox cards themselves are plug and play for what they are. Windows even comes with drivers because these cards are so prevalent across industries. The issue is not with the brand, but the type of card. Mellanox Ethernet cards just plug in and function great with no special config. Mellanox Infiniband cards plug in and work fine on the NIC end, but an Infiniband network in general requires additional setup simply due to the nature of what Infiniband is: it is not Ethernet and it is not designed for general "everything runs normal across this network" type of traffic.
     
    MrGuvernment and TeleFragger like this.
  26. TeleFragger

    TeleFragger Gawd

    Messages:
    787
    Joined:
    Nov 10, 2005
    yeah playing around is getting old.. I have the one in my win10 box and it gets a link light on the switch when cable is in and when opensm is launched, both lights light up...
    dropped one of these into a Lenovo P710 with esxi 6.7 and it sees it but I cant get a light when cable is plugged into the card... ive got too much work to do, to play around right now..

    but is there a secret on the esxi side that I'm missing?

    10g1.png


    10g2.png
     
  27. EniGmA1987

    EniGmA1987 Limp Gawd

    Messages:
    161
    Joined:
    May 2, 2017
    mlx4_en is an Ethernet driver, when you want to be using an Infiniband driver (mlx4_ib). Your switch probably cannot bring a link up when it is trying to negotiate Ethernet
     
    TeleFragger likes this.
  28. TeleFragger

    TeleFragger Gawd

    Messages:
    787
    Joined:
    Nov 10, 2005
  29. TeleFragger

    TeleFragger Gawd

    Messages:
    787
    Joined:
    Nov 10, 2005
    well then with that.. since there is an Ethernet driver, can I connect this cable from nic to nic on win10? what would be needed to do that if so? just connect cable and ... .? tried it and didn't get link up
     
  30. EniGmA1987

    EniGmA1987 Limp Gawd

    Messages:
    161
    Joined:
    May 2, 2017

    In Windows I believe the driver is an all-in-one type and includes both EN and IB. What you need to do is open Device Manager, go to the hardware properties of the NIC, and select Ethernet. Then it will enable NIC to NIC communication over Ethernet. You do not need a subnet manager for doing this; the subnet manager is just for the Infiniband protocol. Set the IP of both NICs in Windows to a different subnet than the main one you use to connect to the router. For example if your basic network is 192.168.1.x then just use 192.168.2.x for both NICs. Set a folder to be shared, map it on the other Windows PC and you're set.
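
    As a sketch of the addressing step only, assuming /24 masks on both networks (the post gives just the 192.168.1.x and 192.168.2.x ranges): the helper below, whose name is made up, refuses a link subnet that collides with the main one and hands out the first two host addresses for the two NICs.

```python
import ipaddress


def plan_direct_link(main_cidr, link_cidr):
    """Return (address_a, address_b, netmask) for a NIC to NIC link."""
    main = ipaddress.ip_network(main_cidr)
    link = ipaddress.ip_network(link_cidr)
    if main.overlaps(link):
        raise ValueError(f"{link} overlaps the main network {main}")
    hosts = iter(link.hosts())
    first, second = next(hosts), next(hosts)
    return str(first), str(second), str(link.netmask)


# Ranges from the example in the post; the /24 prefix is an assumption.
print(plan_direct_link("192.168.1.0/24", "192.168.2.0/24"))
```

    The two addresses go on the two directly connected NICs; no gateway is needed on that link since nothing is routed through it.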
     
    TeleFragger likes this.
  31. TeleFragger

    TeleFragger Gawd

    Messages:
    787
    Joined:
    Nov 10, 2005
    ive been going crazy on that too... in windows 10 I have the driver tab as you see way up in post 1.. so I don't have the pic I see others posting where you can select Ethernet or IPoIB.... trying to figure that out as well...
    running 5.50 windows driver...
     
  32. EniGmA1987

    EniGmA1987 Limp Gawd

    Messages:
    161
    Joined:
    May 2, 2017
    I forgot to go get a screenshot of what the driver page should look like yesterday. Ill try to remember to do it when I get home today. I think you might be looking in the wrong driver place, that might be the issue.
     
    TeleFragger likes this.
  33. TeleFragger

    TeleFragger Gawd

    Messages:
    787
    Joined:
    Nov 10, 2005
    ok I'm gonna ditch all of this. cant figure out how to get it to work..
    I even put 2 cards in 1 machine and all seemed well.. assigned 10.10.1.1 through 10.10.1.4 to all the nic ports... attached 4 cables to the switch just to not miss anything.... ran opensm and gave it time. I couldn't ping any of the ports... none... all cables still said unplugged, etc..

    its before xmas so I'm gonna have to wait but I'm still going to start looking for hardware. I see people doing point to point.. get 3 dual nic cards and go that route which should work for me..

    so I'm looking at connectx-2 and connectx-3 cards but I see EN and VPI ... since I am going to be using 2 in windows boxes and 1 in esxi 6.7 (to which what I saw mellanox said connectx-4 or 5 only for 6.7???)

    ive got $80 on amazon gift cards and is see a few cards I could get then ebay for a cable or 2... and try and start small nodes first...

    thoughts on what I should get to play around?


    my end goal is simple...
    server 2016 - houses family file shares of pics, movies, etc and I also do video/photo edits
    gamer - use to play games and edit videos, photos and transfer all to server 2016
    esxi 6.7 - used as a virtual lab but would love to do freenas and have vms stored there - more for a to do it type thing

    plex server / OTA dvr - files really kept on server 2016 so not sure I need 10g here
     
  34. TeleFragger

    TeleFragger Gawd

    Messages:
    787
    Joined:
    Nov 10, 2005
    well ill hold off from ditching my idea to see what you show...
     
  35. EniGmA1987

    EniGmA1987 Limp Gawd

    Messages:
    161
    Joined:
    May 2, 2017
    ConnectX- # -EN = Ethernet cards
    VPI = Infiniband cards (capable of running Infiniband or Ethernet depending on mode they are in)



    I am wondering since you have ConnectX-1 cards, maybe they are not VPI models and do not support ethernet?
     
    Last edited: Dec 18, 2018
    TeleFragger likes this.
  36. TeleFragger

    TeleFragger Gawd

    Messages:
    787
    Joined:
    Nov 10, 2005
  37. EniGmA1987

    EniGmA1987 Limp Gawd

    Messages:
    161
    Joined:
    May 2, 2017
  38. TeleFragger

    TeleFragger Gawd

    Messages:
    787
    Joined:
    Nov 10, 2005
    I appreciate all your help going back and forth so please don't think I'm complaining..hah..

    I looked at the back of a card and it shows the 3rd one under HCA products
    HP IB 4X DDR Conn-X PCI-e G2 Dual Port HCA 483514-B21 487505-001 MHGH29-XTC

    so after more googling I found out that while under network adapters it shows Mellanox ConnectX IPoIB adapter 1 and 2... I see farther down..

    under system devices it says
    Mellanox ConnectX VPI (MT26418) PCIe 2.0 5GT/s, IB DDR /10GigE Network Adapter...

    and all 3 are showing drivers installed.. and 5.50..
     
  39. TeleFragger

    TeleFragger Gawd

    Messages:
    787
    Joined:
    Nov 10, 2005
    so check this out.. playing around more.. I got farther...

    so I have HP cards and the fw is 2.8 ...


    and it is in IB mode...

    flint.PNG


    hca.PNG
     
  40. EniGmA1987

    EniGmA1987 Limp Gawd

    Messages:
    161
    Joined:
    May 2, 2017
    That is telling you that you downloaded Mellanox Technologies branded firmware, and the card has HP branded firmware installed. I'm not sure if it is safe to force a flash of the MT firmware or not; perhaps talk to either Mellanox or HP and see if you can get the right PSID branded firmware for your card.



    As for it being in Infiniband mode, that should read out differently once we figure out the Windows driver thing and get it switched there into Ethernet.
    However, I also know it is possible to switch modes within that command line tool you are using, I have done it before on ConnectX-3 cards.
     
    TeleFragger likes this.