Infiniband Unified Fabric Manager install question

derekcat

Hey there!
Not sure if anyone would know since this seems to be pretty specific, but I'm trying to get UFM set up. I've gotten through everything except starting the service, where I get this error:
[user@host] sudo service ufmd start
ufmd monitor
ufm: Not valid IP address for fabric interface ib0. Exit
As best I can tell, I'm not using IPoIB, and I can't find any option that would seem to switch between IP and pure IB modes in gv.cfg.

Any thoughts? Google had zero results for this error and the Mellanox docs don't seem to talk about it, so I'm at a bit of a loss.

Any help would be appreciated!
Thank you!
 
If someone here doesn't know, I think the guys over at the servethehome forum would. There are a few users who are on both forums, so you might get lucky here too. Yes, I do get around! :D
 
I don't think you can turn IPoIB off. What are you trying to run? Most people use IB for the RDMA protocol. That uses TCP/IP to set up connections. It makes a TCP connection to the remote host to negotiate the parameters of the RDMA connection. Do you have an IP address assigned to ib0?
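
For anyone wanting to answer that last question from a script, here is a minimal sketch. It assumes a Linux host with the iproute2 `ip` tool installed and parses the one-line output of `ip -o -4 addr show dev ib0`. The helper names and the sample line (using a documentation-range address) are made up for illustration; they are not part of UFM or any Mellanox tooling.

```python
import re
import subprocess

# Illustrative sample of `ip -o -4 addr show dev ib0` output (address is a placeholder).
SAMPLE = (
    "4: ib0    inet 192.0.2.10/24 brd 192.0.2.255 scope global ib0"
    "\\       valid_lft forever preferred_lft forever"
)

def ipv4_addresses(ip_output: str) -> list:
    """Return every IPv4 address (CIDR form) found in `ip -o -4 addr` output."""
    return re.findall(r"\binet\s+(\d+\.\d+\.\d+\.\d+/\d+)", ip_output)

def interface_ipv4(dev: str = "ib0") -> list:
    """Query one device with `ip`; an empty list means no IPv4 address is assigned."""
    result = subprocess.run(
        ["ip", "-o", "-4", "addr", "show", "dev", dev],
        capture_output=True, text=True, check=False,
    )
    return ipv4_addresses(result.stdout)
```

Calling `interface_ipv4("ib0")` on the affected machine and getting an empty list back would match the situation where ib0 has no address assigned.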
 
SamirD: Thank you for the tip! I'll check over there too if we can't figure it out.

zandor: Hmmm, curious. The things I've read so far made it sound like you have to choose between IPoIB and native InfiniBand. But I'm very new to the InfiniBand world, so I may have misunderstood.
Ahh interesting, that makes sense, though it looks like it doesn't have to be on the ib0 interface, as the other servers I'm working with (compute nodes) don't have ib0 configured with an IP.
For this machine, all I want it to do is run UFM so I can better understand and manage the IB environment I'm working in.
 
So after getting a reply via the NVIDIA forum ( https://forums.developer.nvidia.com/t/cant-start-fresh-ufm-install/208643 ) and from Mellanox support, I applied a throwaway IP (10.254.254.7) to ib0, started the service again, and it worked!
So it looks like you were correct and I should've applied "some" IP address to ib0 (which I configured in /etc/sysconfig/network-scripts/ifcfg-ib0 in datagram mode).
On the IP configuration note, I used section 13.8.7 here: https://access.redhat.com/documenta...7/html/networking_guide/sec-configuring_ipoib
The file looks like:
```
BOOTPROTO=none
DEVICE=ib0
IPADDR=10.254.254.7
MTU=65520
NETWORK=10.254.254.0
BROADCAST=10.254.254.255
PREFIX=24
ONBOOT=yes
STARTMODE=auto
TYPE=InfiniBand
USERCTL=no
CONNECTED_MODE=no
```
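
As a rough sanity check on a file like the one above, the sketch below parses the key=value pairs and cross-checks them. Two things here are assumptions rather than anything stated in the thread: the helper names are invented for illustration, and the MTU limits reflect the usual description of IPoIB, where datagram mode sits 4 bytes under the InfiniBand link MTU (2044 or 4092) and only connected mode allows up to 65520. How a given system reacts to an oversized MTU in datagram mode is not covered.

```python
import ipaddress

IFCFG_TEXT = """\
BOOTPROTO=none
DEVICE=ib0
IPADDR=10.254.254.7
MTU=65520
NETWORK=10.254.254.0
BROADCAST=10.254.254.255
PREFIX=24
ONBOOT=yes
TYPE=InfiniBand
CONNECTED_MODE=no
"""

# Assumed IPoIB limits (see the note above); adjust for your fabric.
DATAGRAM_MAX_MTU = 4092
CONNECTED_MAX_MTU = 65520

def parse_ifcfg(text):
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings

def check_ifcfg(settings):
    """Return a list of human-readable warnings; empty means nothing was flagged."""
    warnings = []
    iface = ipaddress.ip_interface(f"{settings['IPADDR']}/{settings['PREFIX']}")
    net = iface.network
    if settings.get("NETWORK", str(net.network_address)) != str(net.network_address):
        warnings.append(f"NETWORK should be {net.network_address}")
    if settings.get("BROADCAST", str(net.broadcast_address)) != str(net.broadcast_address):
        warnings.append(f"BROADCAST should be {net.broadcast_address}")
    connected = settings.get("CONNECTED_MODE", "no").lower() == "yes"
    limit = CONNECTED_MAX_MTU if connected else DATAGRAM_MAX_MTU
    if "MTU" in settings and int(settings["MTU"]) > limit:
        mode = "connected" if connected else "datagram"
        warnings.append(f"MTU {settings['MTU']} exceeds assumed {mode}-mode limit {limit}")
    return warnings

for warning in check_ifcfg(parse_ifcfg(IFCFG_TEXT)):
    print(warning)
```

Under those assumptions, the address, network, and broadcast values agree with each other, and the only item flagged is the 65520 MTU combined with CONNECTED_MODE=no.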

Launched fine after that and I can access it via IP/ufm_web, and it's seeing a lot of IB switches and nodes! Not sure if anything is missing yet, but looking good so far!
Thank you again for the help!
 
I'm glad you got it working. Looks like the "Not valid IP address" error message was to be taken literally.

Now that I think about it I'm not sure RDMA actually requires IP. I've never actually worked with the OFED API, just dealt with a bunch of apps that use it. All the RDMA apps I've used required IP to set up a connection, but that doesn't mean it's a requirement of the API. Perhaps connections could be set up using a different network or statically configured or...?
 
Haha apparently! Still very strange that it's a requirement.

I suspect you're correct. Looking around our infrastructure with UFM, no switches have IPs defined, but all of the switches and servers connected via native InfiniBand are visible.
So as long as the software knew which InfiniBand node it wanted to talk to, it should be able to?
 