infiniband performance?

As I understand it, IPoIB is far from optimal due to its intensive CPU usage...
Is this still an issue, or is there efficient hardware offload, or are CPUs fast enough these days?

As a little sidetrack, is anyone running nfs-rdma on fairly up-to-date hardware with any performance figures to share?
 
I haven't kept up with IB but why would you want to run IP over it? TCP/IP wasn't really designed for those speeds. Native Infiniband (or Myrinet which is what I set up a few times) would be so much more efficient...

Just curious what you're doing.
 
What kind of application is this? I'm looking at setting up a high-speed network with used IB gear. (I know a few people around here have looked at that; xphil3 has an InfiniBand switch, although I haven't heard whether it's running.)
 
I have a pair of machines that had PCI-E 10Gb IPoIB host channel adapters, and was hitting ~170-180MB/s with file copies in Windows. The 20Gb HCAs hit the same numbers. At that time I thought I was CPU-bound with an E2180 in one machine. I have since updated the CPU, and newer drivers have been released, but I haven't had time to reinstall the IB HCAs. Linux does give better performance due to RDMA, but you're probably limited by your disks' write performance in most cases. With a pair of 10GBASE-CX4 Ethernet NICs in the same machines, I was hitting ~250MB/s transfers for some time, but regularly saw ~190MB/s. I never bothered with ramdisks or any other nonsense; I only needed it to back up one array to another, and my numbers were real-world performance.

As far as trying out IB cheaply: all you need is an IB CX4 cable, a pair of HCAs, drivers, and a subnet manager. Normally one of the switches will act as a subnet manager, but you can just run one in software. Make sure you get real InfiniBand cables, not cables meant for 4x SATA or SAS; they have different ends. You should be able to get a pair of IB HCAs and a cable for around $100 on eBay if you can wait a week or so for a deal. When looking for a cable, keep in mind the best deals will be on the ~4-7m cables, which no one really needs. You'll pay a premium for short (0.5-1m) cables. Don't buy a used one that looks like it's been tangled up; just move on to a good used one or a new one.
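
Once the cards and cable are in, bringing the link up on Linux looks roughly like this (a sketch, assuming the OFED/RDMA packages are installed; the tools are the usual ones from infiniband-diags and perftest):

# on one of the two machines, start the software subnet manager as a daemon
sudo opensm -B
# on either machine, check that the port state has gone from Initializing to Active
ibstat
# quick point-to-point bandwidth sanity check
ib_send_bw               # run with no arguments on one machine
ib_send_bw 192.168.1.2   # point the other machine at it; any reachable IP works for
                         # the handshake, the test traffic itself runs over the IB link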

You can find deals on 10GbE adapters and cables on eBay, but they're few and far between. I was persistent, and over a few weeks I was able to get 2 NICs and a pack of 5 cables cheap.
 
I haven't kept up with IB but why would you want to run IP over it? TCP/IP wasn't really designed for those speeds. Native Infiniband (or Myrinet which is what I set up a few times) would be so much more efficient...

Just curious what you're doing.
AFAIK there is a limited selection of programs that work at the native IB level...
Myrinet might be better in some setups, but the price is many times higher :(
What am I doing? Trying to find an acceptable replacement for GbE...

What kind of application is this? I'm looking at setting up a high-speed network with used IB gear. (I know a few people around here have looked at that; xphil3 has an InfiniBand switch, although I haven't heard whether it's running.)
nfs-rdma? It's NFS taking advantage of RDMA, boosting performance many times over in common setups...
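
In practice, on a Linux pair it amounts to something like this (a rough sketch; the export path and the address are placeholders, and 20049 is the usual NFS/RDMA port):

# server (with the NFS server already running): load the RDMA transport and listen on the RDMA port
sudo modprobe svcrdma
echo "rdma 20049" | sudo tee /proc/fs/nfsd/portlist
# client: load the client-side transport and mount the export over RDMA
sudo modprobe xprtrdma
sudo mount -t nfs -o rdma,port=20049 10.4.12.1:/export /mnt/ib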

I have a pair of machines that had PCI-E 10Gb IPoIB host channel adapters, and was hitting ~170-180MB/s with file copies in Windows. The 20Gb HCAs hit the same numbers. At that time I thought I was CPU-bound with an E2180 in one machine. I have since updated the CPU, and newer drivers have been released, but I haven't had time to reinstall the IB HCAs. Linux does give better performance due to RDMA, but you're probably limited by your disks' write performance in most cases. With a pair of 10GBASE-CX4 Ethernet NICs in the same machines, I was hitting ~250MB/s transfers for some time, but regularly saw ~190MB/s. I never bothered with ramdisks or any other nonsense; I only needed it to back up one array to another, and my numbers were real-world performance.

As far as trying out IB cheaply: all you need is an IB CX4 cable, a pair of HCAs, drivers, and a subnet manager. Normally one of the switches will act as a subnet manager, but you can just run one in software. Make sure you get real InfiniBand cables, not cables meant for 4x SATA or SAS; they have different ends. You should be able to get a pair of IB HCAs and a cable for around $100 on eBay if you can wait a week or so for a deal. When looking for a cable, keep in mind the best deals will be on the ~4-7m cables, which no one really needs. You'll pay a premium for short (0.5-1m) cables. Don't buy a used one that looks like it's been tangled up; just move on to a good used one or a new one.

You can find deals on 10GbE adapters and cables on eBay, but they're few and far between. I was persistent, and over a few weeks I was able to get 2 NICs and a pack of 5 cables cheap.
170MB/s... SMB1 or SMB2? Windows to Windows?
Performance with the new CPU?
InfiniHost III Lx/Ex?
The two-adapters-no-switch setup is what I had in mind as switches cost a bit too much just for testing stuff...
Yes, 10GbE adapters at bargain prices are not that common, IB adapters on the other hand...

sounds like you need a few of these... 1.4GB/s sequential write speed...

http://www.newegg.com/Product/Product.aspx?Item=N82E16820227517
hmm... why?
 
170MB/s... SMB1 or SMB2? Windows to Windows?
Performance with the new CPU?
InfiniHost III Lx/Ex?
The two-adapters-no-switch setup is what I had in mind as switches cost a bit too much just for testing stuff...
Yes, 10GbE adapters at bargain prices are not that common, IB adapters on the other hand..

SMB2. The setup was a pair of 64-bit Windows Vista Pro machines; I haven't had any need/reason to upgrade those to Win7. The machines had large SATA Areca arrays (12x 1TB, 10x 1TB) and plenty of cache (2GB). Both machines were S775-based, with an E2180 in the lower-end one and an E6750 in the other. The E2180 was replaced; I'd have to check what is in it now. Both the HCAs and RAID cards were in x8 slots. Both machines were using Intel S3210SHLC boards and had ~4GB of memory.

MHEA28-XTC. I'm almost positive that's what I used, as I had 40-50 of them at last check.
MHGA28-XTC cards for the 20Gb DDR setup.

As far as the 10GBASE-CX4 cards go, I had a NetXen card, as well as a Chelsio (maybe?). I'm not at home for a while, but I can check and get back in a week or so.

I did not get to try out any RDMA with Linux, but RDMA is where the performance of IB is.
 
Performance is a bit lower than I would have expected :(
Connected Mode, I presume? MTU 65k?

MHEA28-XTC is what one usually finds at around $50 on eBay...

Do you have any experience (or seen any real-world stats) with the ConnectX or ConnectX-2 series, which feature TCP/UDP/IP stateless offload?
 
Performance is a bit lower than I would have expected :(
Connected Mode, I presume? MTU 65k?

MHEA28-XTC is what one usually finds at around $50 on eBay...

Do you have any experience (or seen any real-world stats) with the ConnectX or ConnectX-2 series, which feature TCP/UDP/IP stateless offload?

I didn't have anything that could use RDMA, so that is a big part of it, I'm sure. Looking at my past posts, I'm almost positive I was running the 2.0.2 or 2.1 OpenFabrics software, which was from Feb or Sept 2009. Newer versions have been released (2.2, 2.2.1, 2.3rc5, 2.3rc6) which I haven't had experience with. The OpenFabrics roadmap shows some IPoIB fixes and features due out: DHCP (which I wasn't using) in the next release, and IPoIB Connected Mode to be added in 2011.

I haven't owned any ConnectX cards.

I may have time to play around with this again next week, but I'll most likely be busy with holiday stuff until the week after. If you're looking to buy on eBay, find one of the sellers listing them for ~$40 or best offer and submit an offer for a pair of them. You should be able to get a good deal and combine shipping.
 
I have two PCs, each with a RAID 0 array capable of a 600 MB/s sustained read rate with large (e.g. 10GB) files.
In effect, these two PCs form a RAID 10 system (mirrored RAID 0).

File transfers between the two systems easily saturate Gbit Ethernet.
Even at a sustained (and verified) 120 MB/s saturating the link, it takes about 25 minutes to copy 150GB between the two PCs.

However, I have been hoping that baseline 10 Gbit/s InfiniBand would cope with the 600 MB/s RAID 0 transfers and drop the time to copy 150GB down to about 4 minutes, for example.
Sort of like the benefit when we all went from 100 Mbit to Gbit Ethernet.
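
(The arithmetic: 150GB at 100-120 MB/s works out to roughly 21-25 minutes, while 150GB at 600 MB/s is 250 seconds, a little over 4 minutes.)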

I see reasonably priced ($99) InfiniBand cards available on eBay.
Questions:
(1) Can a DIRECT cable connection be made between two InfiniBand HCAs?
Sort of like using an Ethernet crossover cable (since InfiniBand switches are very expensive).

(2) How is InfiniBand set up in Win 7 or Linux?
Same as or similar to a network connection?
Does drag-and-drop of files work, or must the command line be used?
 
Got a pair of MHEA28-XTC cards for less than $80; will do some testing within the next couple of days...
 
Questions:
(1) Can a DIRECT cable connection be made between two InfiniBand HCAs?
Sort of like using an Ethernet crossover cable (since InfiniBand switches are very expensive).

Yes, so long as one of the clients is running a subnet manager application.

(2) How is InfiniBand set up in Win 7 or Linux?
Same as or similar to a network connection?
Does drag-and-drop of files work, or must the command line be used?

On Windows, getting the machines set up to drag and drop files takes this:

1) Install the hardware, go to http://www.openfabrics.org/ for the latest drivers, and install them.
2) Run the subnet manager application on one of the clients to bring the fabric up.
3) Set up the IPoIB network connection with a static IP on both sides and share out a folder.
4) Hit the Start menu, type in \\10.10.10.1 (or whatever your IP is), and click on the folder.

Everything else will be transparent.

You can have the subnet manager run on startup on both machines, and if one is already running, the second will go into standby mode.
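
If you'd rather script step 3, something along these lines works from an elevated cmd prompt (the adapter name, addresses, and share path below are just placeholders):

netsh interface ip set address name="Local Area Connection 3" static 10.10.10.1 255.255.255.0
net share ibshare=D:\transfer /grant:everyone,full

The other machine then reaches it at \\10.10.10.1\ibshare.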
 
Hi all,
Thanks for all the good info in this thread so far.
I've got myself a couple of Mellanox MHEA28-XTC cards and a 3m cable. I'm running two machines with IPoIB directly connected (no switch): Ubuntu 10.10 (64-bit) with opensm on one end, Windows 7 (64-bit) on the other with OFED. Both machines are pingable, with statically configured IPs. The Ubuntu machine is a 3GHz Athlon X2, the Win7 machine an i7 950. So the setup is working.

Using iperf with the stock settings (I haven't tweaked anything yet), I was getting 0.8 Gbps.
Using iperf with "-w 65535", that increased to 1.2 Gbps.
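
For reference, the runs were basically these (the address is a placeholder; the extra -P parallel-streams flag is just a further tweak I haven't tried yet):

iperf -s -w 65535                        # listening side
iperf -c 10.4.12.1 -w 65535              # the run that gave 1.2 Gbps
iperf -c 10.4.12.1 -w 65535 -P 4 -t 30   # next thing to try: 4 parallel streams, 30 s run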

Finally, my question. Does anyone have some links or HOWTOs for optimising this setup for maximum throughput? I've got about 10 drives in the Linux box which I'm going to stripe/RAID5 for performance.

Rgds,
Dave.
 
OK, initially I wasn't able to run netperf on the Linux box, but then I updated the firmware on both cards (5.1 up to 5.3), and after that netperf ran OK.

root@raid:~# netperf -H 10.4.12.1
TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.4.12.1 (10.4.12.1) port 0 AF_INET : demo
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.00    7239.95

That's about 7 Gbps, if I'm reading it right.
As compared to the data through my gigabit Ethernet:

bytes  bytes   bytes    secs.    10^6bits/sec

  8192  16384  16384    10.00     580.92

iperf still shows about 1.3 Gbps, though. I guess I'll have to configure the RAID drives and see if I can get decent real-life throughput with that.

Still, it's nice to have an Infiniband Fabric at home. Interesting (geeky) conversation piece. :)

Cheers,
Dave.

--EDIT--
FYI, I found the easiest way to upgrade the firmware was to get the MFT from Mellanox onto the Windows 7 box, run a cmd prompt as administrator, then follow the Mellanox instructions from there. I had to provide the -skip_is parameter to the flint tool, which then worked fine; after rebooting the machine, the card came up with firmware 5.3.
That way I didn't have to go through the pain of compiling the firmware burner on Ubuntu.
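
For anyone repeating this, the sequence was roughly the following (the device name and firmware image name will differ on your cards, so treat them as placeholders):

mst status  (lists the MST device name for the HCA)
flint -d mt25204_pci_cr0 query  (shows the firmware currently on the card)
flint -d mt25204_pci_cr0 -i fw-25204.bin -skip_is burn  (burns the new image; reboot afterwards)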
 
When I used some of the bandwidth testing tools I didn't see anything as nice as the real world performance.

I have a couple IB switches and dozens of cards and cables I need to get rid of, I've just been lazy lol.
 
For those of you attempting to get InfiniBand drivers working on Ubuntu 10.10, there's a very quick way to enable the InfiniBand fabric, as it does not come up without a little help in the form of some config file changes and package installations. The blog article is here
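
The gist of it (package names are the usual Ubuntu ones and the ib0 addressing is only an example, so treat this as a sketch rather than the blog's exact steps):

sudo apt-get install opensm infiniband-diags ibverbs-utils
# make sure the IPoIB and userspace-access modules load at boot
echo ib_ipoib | sudo tee -a /etc/modules
echo ib_umad | sudo tee -a /etc/modules
# then give ib0 a static address in /etc/network/interfaces, e.g.:
#   auto ib0
#   iface ib0 inet static
#       address 10.4.12.2
#       netmask 255.255.255.0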

Also, can anyone give me some pointers as to why my speed has gone down to 25 Mbps, when my previous Linux install was giving me 7200 Mbps? It's the same machine; the Ubuntu box was completely cleaned and ib0 brought up using the procedure in the blog article. The remote machine has not changed.
I don't recall having to do any major tweaks to the previous install to get the speed up to 7 Gbps, apart from putting the link into connected mode and increasing the MTU to 65520.
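
For reference, the connected mode / MTU change amounts to this on Linux (assuming the interface is ib0; it doesn't survive a reboot unless you script it somewhere):

echo connected | sudo tee /sys/class/net/ib0/mode
sudo ip link set dev ib0 mtu 65520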

Thanks,
Dave.
 
Figured I'd post my findings today (using the latest firmware and drivers available for the Mellanox MHEA28-XTC cards):

-Storage Server 2008 R2 SP1 x64 and Windows 7 SP1 x64 (server and client respectively).
-Server's RAID array gets ~800MB/sec reads and ~800MB/sec writes
-Client's RAID array gets ~200MB/sec reads and ~200MB/sec writes
-iperf tests only got around 2Gbps, nowhere near 10Gbps, even after making sure the MTU was increased
-Was not able to smoothly stream (i.e. play) a 1920x1080 uncompressed 30-second clip of a video project I am working on from the server to the client. The bit rate for that is about 180MB/sec.
-A 1280x720 uncompressed 30-second clip, however, did play back smoothly, although that is only around 67MB/sec. Most of my clients are looking for 1080p, not 720p, so this is unacceptable for my work.
-CPU usage on the server was between 60-70% during file transfers and 30-40% on the client. The server has an Athlon II X2 260 and the client a Phenom II X6 1090T, both with 16GB of RAM.

So overall I am upset with it; to eBay the cards and cable go....
 
It really sounds like the server's processor is a huge bottleneck. InfiniBand should be leagues ahead of 2Gbps; that's what I get on FC.

Seriously, I would upgrade the CPU, then try it again imo.
 
Hmm, maybe get a Phenom II X4 955 (3.2GHz)?

I only have Sempron 140s in my HTPC and my wife's computer, and Opterons in my ESXi server, so nothing readily available to test with.
 
Yeah, a dual-core may hold a 10Gbps infiniband connection back a bit. If you can afford it, definitely go for the processor you listed.
 
You're not going to get much better than that when encapsulating Ethernet traffic over InfiniBand. If you want 10Gbit speeds, get a 10Gbit Ethernet card.
 
Blue Fox -

Yeah, I understand that would be the preferred method; I was just trying to explore other avenues (i.e. FC, InfiniBand) before saving up for true 10Gbit Ethernet. I don't like spending a lot of money if I don't have to :)
 
Oh, I didn't realize you were trying to get Ethernet traffic to go through the InfiniBand. Yes, Blue Fox is correct; you're going to need 10Gb Ethernet cards if you want that kind of speed, rather than running it over InfiniBand.
 
Cool thanks for the heads up, I'll continue doing what I've been doing then:

-Copy 1080p footage from my Compact Flash cards to my Workstation
-Edit, apply post production, burn to blu-ray/dvd
-Copy all of the work to the SAN for storage

For most projects I have plenty of space on my workstation, but the longer, hour-long projects take up TBs of space :)
 
Have you considered running remote desktop services (aka terminal server) on your SAN box and RDPing into your virtual "workstation"? You would need a much beefier CPU, but it would solve the network connectivity issue.
 
We have designed a 72-bay SAN with 2 x LSI 9265 RAID controllers and a Mellanox QDR adapter. There is a Supermicro IB switch connected to 8 Supermicro blade servers, each with 64GB of RAM and a 120GB SSD to boot from. The server blades have QDR IB modules and run Win 2008 Enterprise, and two of them are clustered for a SQL DB. I would like to know the ideal OS for the SAN and the protocol to use for optimal storage-network throughput and throughput between the blades: SRP over IB, iSCSI Extensions for RDMA (iSER), NFS over RDMA, RoCE, or SMB2. What storage throughput can we expect out of the entire solution, 20GB++? Please help.
 
Thanks all for this discussion. Since QDR switches are still a little bit expensive, I will start with a direct connection using just the cards. This is not the only place where I have seen that CPU performance plays a crucial role in the overall speed when using TCP over InfiniBand.
 