ZFS / NFS - Achieving 1Gbps tput

packetboy

I read so many posts where people bad-mouth NFS because they can't seem to get the throughput they expect from it. I've been doing hardcore protocol analysis for almost 20 years now. That being said, I've done very little NFS analysis, and heck, I don't even feel comfortable with all the ins and outs of 1000BaseT. My suspicion, though, has been that folks having NFS performance problems haven't taken the time to understand all the variables at play and make sure they have things set optimally.

So today I set out to see exactly what NFS file copy performance I could get between a new Ubuntu box I was setting up and one of my ZFS servers....here's the network layout:

Ubuntu (10.10)
| -Cat 6 cable
Linksys SRW2024 Switch
| - Cat 6 cable
ZFS (Solaris 10 u9)

I started this exercise because when I did a simple dd file test I wasn't getting anywhere near the ~100MB/s (1Gbps) I was expecting:

Code:
root@kvm330:~# dd if=/mnt/cytel/temporary/llb_sda.dd of=/dev/null
^C22057633+0 records in
22057632+0 records out
11293507584 bytes (11 GB) copied, 169.848 s, 66.5 MB/s
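(One thing to keep in mind with dd as a test tool: with no bs= argument it defaults to 512-byte reads. The NFS client does its own readahead, so it may not change much here, but for a pure throughput test it's common to use a much larger block size, e.g.:)

Code:
# same read test, but with 1MB blocks instead of dd's 512-byte default
root@kvm330:~# dd if=/mnt/cytel/temporary/llb_sda.dd of=/dev/null bs=1M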

So then I started checking things....Initial configuration of Ubuntu box:

Code:
root@kvm330:~# ifconfig
eth1      Link encap:Ethernet  HWaddr 00:e0:81:4b:b8:ea  
          inet addr:192.168.2.110  Bcast:192.168.2.255  Mask:255.255.255.0
          inet6 addr: fe80::2e0:81ff:fe4b:b8ea/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:5651497 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1708607 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:8530745609 (8.5 GB)  TX bytes:123711656 (123.7 MB)
          Interrupt:26 


root@kvm330:~# ethtool -k eth1
Offload parameters for eth1:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off
ntuple-filters: off
receive-hashing: off

root@kvm330:~# ethtool -a eth1
Pause parameters for eth1:
Autonegotiate:	on
RX:		on
TX:		on

root@kvm330:~# ethtool -g eth1
Ring parameters for eth1:
Pre-set maximums:
RX:		511
RX Mini:	0
RX Jumbo:	0
TX:		511
Current hardware settings:
RX:		200
RX Mini:	0
RX Jumbo:	0
TX:		511

The glaring issue was that TX and RX flow control were enabled on the NIC, yet I had flow control *disabled* on the switch port it was connected to. If you research this setting, you'll find folks making the case that it's really better to just let packets drop and let TCP retransmit as necessary...recovery from congestion typically happens faster with TCP than with Ethernet-level flow control. I'd like to understand this better, but for now I've opted to disable it.

So here I am disabling TX and RX flow control on the Ubuntu eth1:

Code:
root@kvm330:~# ethtool -A eth1 autoneg off rx off tx off 
root@kvm330:~# 

root@kvm330:~# ethtool -a eth1
Pause parameters for eth1:
Autonegotiate:	off
RX:		off
TX:		off

And here's another test...I'm switching to a different file so the result isn't skewed by the client's file cache.

Code:
root@kvm330:~# dd if=/mnt/cytel/temporary/ymco_freebsd.img of=/dev/null
^C14314145+0 records in
14314144+0 records out
7328841728 bytes (7.3 GB) copied, 83.4662 s, 87.8 MB/s
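(If you don't have a pile of spare large files around, another way to get a cold-cache read on the Linux side is to flush the page cache between runs:)

Code:
# flush dirty pages, then drop the page cache, dentries and inodes
root@kvm330:~# sync
root@kvm330:~# echo 3 > /proc/sys/vm/drop_caches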

So how's that for a 30% improvement from one simple (but important) adapter setting?

So next notice how jumbo frames ARE enabled on the ZFS server but are NOT enabled on the Ubuntu server:

Code:
# ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
	inet 127.0.0.1 netmask ff000000 
e1000g0: flags=1001000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,FIXEDMTU> mtu 9000 index 2
	inet 192.168.2.3 netmask ffffff00 broadcast 192.168.2.255
	ether 0:30:48:dc:e0:6a 

See the Ubuntu ifconfig output above: MTU is still 1500.

So now let's enable jumbos on the Ubuntu server:

Code:
root@kvm330:~# ifconfig eth1 mtu 9000

root@kvm330:~# ifconfig
eth1      Link encap:Ethernet  HWaddr 00:e0:81:4b:b8:ea  
          inet addr:192.168.2.110  Bcast:192.168.2.255  Mask:255.255.255.0
          inet6 addr: fe80::2e0:81ff:fe4b:b8ea/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
          RX packets:2187784 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1109459 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:19320371115 (19.3 GB)  TX bytes:82271651 (82.2 MB)
          Interrupt:26 
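(Note that setting the MTU with ifconfig doesn't survive a reboot. On Ubuntu the usual place to make it stick is /etc/network/interfaces; the stanza below just reuses this box's existing addressing:)

Code:
# /etc/network/interfaces -- eth1 stanza with a persistent jumbo MTU
auto eth1
iface eth1 inet static
        address 192.168.2.110
        netmask 255.255.255.0
        mtu 9000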

Hey...let's run snoop on the ZFS server during the test to actually make sure we are using jumbos...yup...note the Len=8948 segments:

Code:
# snoop -r -d e1000g0  -c 10 ether host 00:E0:81:4B:B8:EA
Using device e1000g0 (promiscuous mode)
 192.168.2.3 -> 192.168.2.110 RPC R XID=3047194340 Success
192.168.2.110 -> 192.168.2.3  TCP D=2049 S=978 Ack=990785284 Seq=1341721328 Len=0 Win=24576 Options=<nop,nop,tstamp 236608 106894330>
 192.168.2.3 -> 192.168.2.110 TCP D=978 S=2049 Ack=1341721328 Seq=990776336 Len=8948 Win=53688 Options=<nop,nop,tstamp 106894330 236587>
 192.168.2.3 -> 192.168.2.110 TCP D=978 S=2049 Ack=1341721328 Seq=990785284 Len=8948 Win=53688 Options=<nop,nop,tstamp 106894330 236587>
192.168.2.110 -> 192.168.2.3  TCP D=2049 S=978 Ack=990803180 Seq=1341721328 Len=0 Win=24576 Options=<nop,nop,tstamp 236608 106894330>
 192.168.2.3 -> 192.168.2.110 TCP D=978 S=2049 Ack=1341721328 Seq=990794232 Len=8948 Win=53688 Options=<nop,nop,tstamp 106894330 236587>
 192.168.2.3 -> 192.168.2.110 TCP D=978 S=2049 Ack=1341721328 Seq=990803180 Len=8948 Win=53688 Options=<nop,nop,tstamp 106894330 236587>
192.168.2.110 -> 192.168.2.3  TCP D=2049 S=978 Ack=990821076 Seq=1341721328 Len=0 Win=24576 Options=<nop,nop,tstamp 236609 106894330>
 192.168.2.3 -> 192.168.2.110 TCP D=978 S=2049 Ack=1341721328 Seq=990812128 Len=8948 Win=53688 Options=<nop,nop,tstamp 106894330 236587>
 192.168.2.3 -> 192.168.2.110 TCP D=978 S=2049 Ack=1341721328 Seq=990821076 Len=8948 Win=53688 Options=<nop,nop,tstamp 106894330 236587>
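(A quicker sanity check from the Ubuntu side is a don't-fragment ping with a jumbo-sized payload; 8972 bytes of ICMP data plus 28 bytes of headers comes out to exactly 9000, so if anything in the path is still stuck at 1500 this fails immediately:)

Code:
# 8972 data + 8 ICMP + 20 IP = 9000 bytes; -M do sets the don't-fragment bit
root@kvm330:~# ping -M do -s 8972 -c 3 192.168.2.3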

And the result:

Code:
root@kvm330:~# dd if=/mnt/cytel/temporary/ymco_freebsd.img of=/dev/null
^C4029089+0 records in
4029088+0 records out
2062893056 bytes (2.1 GB) copied, 19.2819 s, 107 MB/s


Nice!
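(If you're still chasing a gap after this, it's worth running a raw TCP test with iperf between the two boxes so you can tell whether the remaining bottleneck is the network itself or NFS/disk on either end. Assuming iperf is installed on both sides:)

Code:
# on the ZFS server:  iperf -s
# on the Ubuntu box:
root@kvm330:~# iperf -c 192.168.2.3 -t 30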
 
For optimal network speed, use high quality NICs; I've had good experience with Intel. I can easily get over 100MB/s without jumbo frames on Intel NICs. Jumbo frames help reduce CPU load and give a slight performance increase for large data transfers, but they're slower for lots of small I/O (for example, a database that reads/writes in chunks smaller than 8K).
Another thing that's important when you're using jumbo frames is to make sure that all the devices using jumbo frames are running in their own separate VLAN.
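(If you go that route, a tagged VLAN interface with its own 9000-byte MTU is one way to handle the host side of it on Linux; the VLAN ID below is just a placeholder, and the switch ports obviously need to carry that VLAN with jumbo frames enabled:)

Code:
# create VLAN 10 on top of eth1, then give the tagged interface a jumbo MTU
root@kvm330:~# vconfig add eth1 10
root@kvm330:~# ifconfig eth1.10 mtu 9000 up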
 