NFS and async

Red Squirrel

[H]F Junkie
Joined
Nov 29, 2009
Messages
9,211
Been fighting with NFS performance issues for a while and just found that if I set the async option in /etc/exports it might boost performance. How do I go about making it a global option? I don't want to have to specify it for every single host and every single share each time. Is there any disadvantage to async? I read that data could be corrupt if power goes out but that's with no UPS.
 

Phantum

[H]ard|Gawd
Joined
Jul 25, 2001
Messages
1,716
Been fighting with NFS performance issues for a while and just found that if I set the async option in /etc/exports it might boost performance. How do I go about making it a global option? I don't want to have to specify it for every single host and every single share each time. Is there any disadvantage to async? I read that data could be corrupt if power goes out but that's with no UPS.

If you're using Linux with NFS v2 or v3, exports were async by default only in old nfs-utils releases (up to 1.0.0); later releases default to sync. If your kernel has /proc baked in then you should be able to see whether each export is 'sync' or 'async' in '/proc/fs/nfs/exports'.
 

Red Squirrel

[H]F Junkie
Joined
Nov 29, 2009
Messages
9,211
When I cat /proc/fs/nfs/exports I get all my exports but sync is the default option. It looks like I'd have to specify it for each entry there too. I'm not sure what version of NFS I'm using, whatever is default in CentOS 6.5. Is there a way to check?
 

Phantum

[H]ard|Gawd
Joined
Jul 25, 2001
Messages
1,716
Oh and yes, there should be separate stanzas for each export (in blocks) so you'll have to set them all (or some or however you want) to 'sync' or 'async'. I suppose you could script it (if you've got lots of exports, with bash or Python) but that might take more time than doing each one manually and this is the sort of thing (personally) that I'd do manually to ensure it's done properly.
 

Phantum

[H]ard|Gawd
Joined
Jul 25, 2001
Messages
1,716
What I ended up doing was building an entirely separate pool (because I use FreeBSD/ZFS) with SSDs for SAN storage. This pool contains a single dataset (with atime turned off and sync on) that is shared and exported via NFS over 1m direct-attach 10GbE to my ESXi box.
 

Red Squirrel

[H]F Junkie
Joined
Nov 29, 2009
Messages
9,211
nfsstat -m does not do anything, no error or text either. So there's no way to make async global? I guess I can look at scripting it and having my own custom config file. I never liked the idea that you have to set the option on a per host basis either, so I can probably fix that if I write an app that reads a custom config file and generates the nfs one.
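
A minimal sketch of the kind of generator described above, assuming Python and a made-up in-script share list instead of a real config file format. The function name `render_exports`, the paths, and the host addresses are all hypothetical; only the `path host(options)` line layout is the standard /etc/exports form.

```python
# Hypothetical helper: expand a compact share list into /etc/exports lines.
DEFAULT_OPTIONS = ["rw", "async"]  # assumed "global" defaults, adjust to taste


def render_exports(shares, hosts, defaults=DEFAULT_OPTIONS):
    """Return /etc/exports text with the same options applied to every host.

    shares: dict mapping export path -> list of extra options for that share
    hosts:  list of client specs (hostnames, IPs, or networks)
    """
    lines = []
    for path, extra in sorted(shares.items()):
        # Merge defaults with per-share extras, dropping duplicates in order.
        options = list(dict.fromkeys(list(defaults) + list(extra)))
        clients = " ".join(f"{host}({','.join(options)})" for host in hosts)
        lines.append(f"{path} {clients}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up paths and hosts, purely for illustration.
    shares = {"/data/vms": ["no_root_squash"], "/data/media": []}
    hosts = ["10.1.1.10", "10.1.1.11"]
    print(render_exports(shares, hosts), end="")
```

The generated text would be written to /etc/exports and picked up with `exportfs -ra`. The options are still emitted per host, so this only hides the repetition; it does not make async a true global setting.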
 

Phantum

[H]ard|Gawd
Joined
Jul 25, 2001
Messages
1,716
Hmmm well you might be able to see the version by doing 'grep nfs /proc/mounts' or 'rpcinfo -p localhost'

And no, I don't believe so... But hey I've been wrong before. If I'm not mistaken NFS exports are case-by-case so settings are outlined per block or stanza (similar but different than smb). Seems kinda silly to me because smb can have a global config and then more granular control per share in each respective stanza.
 

Red Squirrel

[H]F Junkie
Joined
Nov 29, 2009
Messages
9,211
Playing around, it seems I have v3 and v4 installed at the same time? Here is the result of nfsstat -s

Code:
[root@isengard ~]# nfsstat -s
Server rpc stats:
calls  badcalls  badclnt  badauth  xdrcall
58354882  0  0  0  0 

Server nfs v3:
null  getattr  setattr  lookup  access  readlink 
949  0% 45675056  1% 1015842  0% 30674625  0% 24322821  0% 3115  0%
read  write  create  mkdir  symlink  mknod 
1724777136 48% 1702942303 47% 3735159  0% 72789  0% 56  0% 0  0%
remove  rmdir  rename  link  readdir  readdirplus
3479734  0% 32437  0% 321503  0% 28086  0% 5165  0% 1660786  0%
fsstat  fsinfo  pathconf  commit 
4681820  0% 1592  0% 764  0% 7395521  0%

Server nfs v4:
null  compound 
127  0% 802043329 99%

Server nfs v4 operations:
op0-unused  op1-unused  op2-future  access  close  commit 
0  0% 0  0% 0  0% 11485062  0% 1117251  0% 313954  0%
create  delegpurge  delegreturn  getattr  getfh  link 
1431  0% 0  0% 1072221  0% 207562118 12% 10671917  0% 0  0%
lock  lockt  locku  lookup  lookup_root  nverify 
555  0% 0  0% 554  0% 9032368  0% 0  0% 0  0%
open  openattr  open_conf  open_dgrd  putfh  putpubfh 
1795445  0% 0  0% 491  0% 300  0% 799837301 48% 0  0%
putrootfh  read  readdir  readlink  remove  rename 
543  0% 583352485 35% 473742  0% 1851  0% 2941  0% 3405  0%
renew  restorefh  savefh  secinfo  setattr  setcltid 
1949457  0% 129096  0% 132512  0% 36  0% 26999  0% 203  0%
setcltidconf verify  write  rellockowner bc_ctl  bind_conn 
203  0% 0  0% 25543294  1% 554  0% 0  0% 0  0%
exchange_id  create_ses  destroy_ses  free_stateid getdirdeleg  getdevinfo 
0  0% 0  0% 0  0% 0  0% 0  0% 0  0%
getdevlist  layoutcommit layoutget  layoutreturn secinfononam sequence 
0  0% 0  0% 0  0% 0  0% 0  0% 0  0%
set_ssv  test_stateid want_deleg  destroy_clid reclaim_comp
0  0% 0  0% 0  0% 0  0% 0  0%

[root@isengard ~]#

That actual info is hard to read given the terrible text layout though. I don't know why a lot of Linux commands don't format the output in a more readable format.
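
One way around the layout, sketched in Python: pair each header row with the value row beneath it and print one operation per line. This assumes the two-line header/value layout shown in the output above; `parse_nfsstat` and `format_report` are hypothetical helper names, not part of nfs-utils, and the sample text is a short excerpt of the output above.

```python
def parse_nfsstat(text):
    """Return {section: [(op, count, percent), ...]} from nfsstat-style text."""
    sections = {}
    current = None   # title of the section being read
    names = None     # pending header row waiting for its value row
    for raw in text.splitlines():
        tokens = raw.split()
        if not tokens:
            names = None
            continue
        if raw.rstrip().endswith(":"):
            # Section titles such as "Server nfs v3:" end with a colon.
            current = raw.strip().rstrip(":")
            sections[current] = []
            names = None
        elif current is None:
            continue  # skip shell prompts or anything before the first section
        elif names is None:
            names = tokens
        else:
            if len(tokens) == 2 * len(names):
                # Value row made of count/percent pairs.
                for name, count, pct in zip(names, tokens[0::2], tokens[1::2]):
                    sections[current].append((name, int(count), pct))
            elif len(tokens) == len(names):
                # Value row with plain counts (the rpc stats section).
                for name, count in zip(names, tokens):
                    sections[current].append((name, int(count), ""))
            names = None
    return sections


def format_report(sections):
    """Render one operation per line with thousands separators."""
    lines = []
    for title, rows in sections.items():
        lines.append(title)
        for name, count, pct in rows:
            lines.append(f"  {name:<14}{count:>14,} {pct:>4}")
    return "\n".join(lines)


SAMPLE = """\
Server rpc stats:
calls  badcalls  badclnt  badauth  xdrcall
58354882  0  0  0  0

Server nfs v3:
null  getattr  setattr  lookup  access  readlink
949  0% 45675056  1% 1015842  0% 30674625  0% 24322821  0% 3115  0%
read  write  create  mkdir  symlink  mknod
1724777136 48% 1702942303 47% 3735159  0% 72789  0% 56  0% 0  0%
"""

if __name__ == "__main__":
    print(format_report(parse_nfsstat(SAMPLE)))
```

Feeding it the real output (for example via `nfsstat -s > stats.txt`) should work the same way as long as the layout matches the assumption.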


Oh and here is output of the commands you said:

Code:
[root@isengard ~]# rpcinfo -p localhost
  program vers proto  port  service
  100000  4  tcp  111  portmapper
  100000  3  tcp  111  portmapper
  100000  2  tcp  111  portmapper
  100000  4  udp  111  portmapper
  100000  3  udp  111  portmapper
  100000  2  udp  111  portmapper
  100011  1  udp  875  rquotad
  100011  2  udp  875  rquotad
  100011  1  tcp  875  rquotad
  100011  2  tcp  875  rquotad
  100005  1  udp  60704  mountd
  100005  1  tcp  36424  mountd
  100005  2  udp  48241  mountd
  100005  2  tcp  43839  mountd
  100005  3  udp  56315  mountd
  100005  3  tcp  55864  mountd
  100003  2  tcp  2049  nfs
  100003  3  tcp  2049  nfs
  100003  4  tcp  2049  nfs
  100227  2  tcp  2049  nfs_acl
  100227  3  tcp  2049  nfs_acl
  100003  2  udp  2049  nfs
  100003  3  udp  2049  nfs
  100003  4  udp  2049  nfs
  100227  2  udp  2049  nfs_acl
  100227  3  udp  2049  nfs_acl
  100021  1  udp  43151  nlockmgr
  100021  3  udp  43151  nlockmgr
  100021  4  udp  43151  nlockmgr
  100021  1  tcp  56664  nlockmgr
  100021  3  tcp  56664  nlockmgr
  100021  4  tcp  56664  nlockmgr
  100024  1  udp  54864  status
  100024  1  tcp  34533  status
[root@isengard ~]#
[root@isengard ~]#
[root@isengard ~]#
[root@isengard ~]# grep nfs /proc/mounts
sunrpc /var/lib/nfs/rpc_pipefs rpc_pipefs rw,relatime 0 0
nfsd /proc/fs/nfsd nfsd rw,relatime 0 0
[root@isengard ~]#
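
For what it's worth, the rpcinfo listing above can be boiled down to "which NFS versions does this server offer" with a few lines of Python. This is only a sketch that assumes the five-column `rpcinfo -p` layout shown above; `nfs_versions` is a hypothetical helper name and the sample is an excerpt of that output. Note that it shows what the server offers, not which version a given client actually negotiated; that is visible on the client side (for example in the client's /proc/mounts).

```python
def nfs_versions(rpcinfo_text):
    """Map each transport (tcp/udp) to the sorted NFS versions registered."""
    versions = {}
    for line in rpcinfo_text.splitlines():
        fields = line.split()
        # Expected columns: program, vers, proto, port, service
        if len(fields) == 5 and fields[4] == "nfs" and fields[1].isdigit():
            versions.setdefault(fields[2], set()).add(int(fields[1]))
    return {proto: sorted(found) for proto, found in versions.items()}


SAMPLE = """\
  program vers proto  port  service
  100005  3  tcp  55864  mountd
  100003  2  tcp  2049  nfs
  100003  3  tcp  2049  nfs
  100003  4  tcp  2049  nfs
  100227  2  tcp  2049  nfs_acl
  100003  2  udp  2049  nfs
  100003  3  udp  2049  nfs
  100003  4  udp  2049  nfs
"""

if __name__ == "__main__":
    print(nfs_versions(SAMPLE))  # {'tcp': [2, 3, 4], 'udp': [2, 3, 4]}
```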
 

Phantum

[H]ard|Gawd
Joined
Jul 25, 2001
Messages
1,716
That's interesting... I'm seeing NFS v2, 3 & 4 but I'm not sure how exactly just yet or why. I'll sleep on it and do some sleuthing tomorrow.
 

Red Squirrel

[H]F Junkie
Joined
Nov 29, 2009
Messages
9,211
This is a rather default install too, as in, everything installed from repositories, nothing weird as far as I can recall. I was also mistaken, it's actually CentOS 6.7, not 6.5. My kernel is 2.6.32-358 which is probably behind. Uptime is 653 days. I am due for a reboot, but this is not only a file server but a VM store as well, so it's kinda a huge ordeal to reboot this box. Rebooting this box basically involves literally rebooting my entire network. One of these days I need to look into some kind of redundant storage as it's kinda bad to have all my eggs in one basket like that.

I plan to add a dual or quad port nic in the server at some point and remove a fibre channel HBA that is no longer used, and I also want to recable the back of my rack, so I'll probably save a day to do all that stuff at once so I only need to reboot once. Makes for a nice project some time next winter or something. :p Should probably throw in more ram for good measure while I'm in there. If I recall there's only 8GB. (Is that an issue? It's dedicated to storage only, and using md raid, so I don't imagine it needs a lot of ram.)
 

cantalup

Gawd
Joined
Feb 8, 2012
Messages
758
... Is there any disadvantage to async? I read that data could be corrupt if power goes out but that's with no UPS.

async is a lot faster than sync.

async is risky if the NFS server crashes or is crippled: writes it had already acknowledged can be lost or corrupted.

I would not use async when the NFS server is mostly serving writes,
and
would use async when the NFS server is mostly serving read-only files/data.

Just my suggestion. Make sure you are using a good UPS :D.

my simple suggestion...
 

cantalup

Gawd
Joined
Feb 8, 2012
Messages
758
That's interesting... I'm seeing NFS v2, 3 & 4 but I'm not sure how exactly just yet or why. I'll sleep on it and do some sleuthing tomorrow.

In my understanding:
the NFS server is serving both v3 and v4 clients :).....

The NFS server itself is v4 indeed.
 

Phantum

[H]ard|Gawd
Joined
Jul 25, 2001
Messages
1,716
Aha! It's a compatibility layer thing. It's showing three different versions in rpcinfo because v3/4 are able to communicate with clients running an older version. So while yes, it does indeed appear that the server is running v4 (or 3 maybe), it's also able to speak with v2.
 

Phantum

[H]ard|Gawd
Joined
Jul 25, 2001
Messages
1,716
async is a lot faster than sync.

async is risky if the NFS server crashes or is crippled: writes it had already acknowledged can be lost or corrupted.

I would not use async when the NFS server is mostly serving writes,
and
would use async when the NFS server is mostly serving read-only files/data.

Just my suggestion. Make sure you are using a good UPS :D.

my simple suggestion...

async is much faster than sync, true, but only for writes. sync is the default in most cases because the developers made a safety-first decision for the users. If the user knows better or has a setup with a UPS, then it is advisable to set the async option, as you would retain high reads and gain high writes (safely).
 

Phantum

[H]ard|Gawd
Joined
Jul 25, 2001
Messages
1,716
This is a rather default install too, as in, everything installed from repositories, nothing weird as far as I can recall. I was also mistaken it's actually Cent OS 6.7 not 6.5. My kernel is 2.6.32-358 which is probably behind. Uptime is 653 days. I am due for a reboot, but this is not only a file server but a VM store as well, so it's kinda a huge ordeal to reboot this box. This box rebooting basically involves literally rebooting my entire network. One of these days I need to look into some kind of redundant storage as it's kinda bad to have all my eggs in one basket like that.

I plan to add a dual or quad port nic in the server at some point and remove a fibre channel HBA that is no longer used and I also want to recable the back of my rack so I'll probably save a day to do all that stuff at once so I only need to reboot once. Make for a nice project some time next winter or something. :p Should probably throw in more ram for good measure while I'm in there. If I recall there's only 8GB. (is that an issue? It's dedicated to storage only, and using md raid so I don't imagine it needs lot of ram)

Jeez man! Don't get me wrong, that's one hell of an uptime and the system is certainly rock solid stable, but kernels are 4+ now. I'm in the same boat in that taking my SAN storage down means shutting down most of my network, but the core runs on either bare metal or has dedicated local storage for this exact case. Also, the latest CentOS version is 7.2, so maybe upgrade in-place when you get the chance. Not super critical but just saying. And no, the RAM should be fine. Linux with mdadm on extX isn't nearly as memory hungry as ZFS.
 

cantalup

Gawd
Joined
Feb 8, 2012
Messages
758
Aha! It's a compatibility layer thing. It's showing three different versions in rpcinfo because v3/4 are able to communicate with clients running an older version. So while yes, it does indeed appear that the server is running v4 (or 3 maybe), it's also able to speak with v2.
This should be for compatibility; v4 can serve v4/3/2 NFS clients too...


async is much faster than sync, true, but only for writes. sync is the default in most cases because the developers made a safety-first decision for the users. If the user knows better or has a setup with a UPS, then it is advisable to set the async option, as you would retain high reads and gain high writes (safely).

Since the OP talked about writing to the NFS server, I based my assumption on writes to the NFS server :).
 

Darakian

Supreme [H]ardness
Joined
Apr 12, 2004
Messages
4,698
I've never used ceph (though I have heard good things). The reason I bring it up is that the OP seems to want performance and the best way to get it is to go clustered.
 

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
4,068
NFS with sync writes can cut write performance on a slow disk array to as little as 10% of the async values.
If you need fast NFS with a secure sync write behaviour, use ZFS with a dedicated Slog.

This gives you fast async writes to a regular disk based data pool over the ZFS write cache and secure sync write logging to a dedicated device that is optimized for low latency and high write iops, like a Dram based ZeusRAM, an NVMe like an Intel P7500/3600/3700 or a SATA SSD like an Intel S3700.

If you use NVMe or SSD only storage pools on Solaris/OmniOS (where ZFS and NFS come from), performance degradation is quite minimal even without an additional Slog device.
 

Red Squirrel

[H]F Junkie
Joined
Nov 29, 2009
Messages
9,211
I actually looked at Gluster but it just sounds way too complicated, there's too many different parts to it and it all seems like a nightmare to setup and troubleshoot if something goes wrong. Maybe one of these days I'll play with it though, then write some kind of wrapper program to make it easier to manage. I want to do the same with kvm/qemu actually. My goal is to eventually switch to kvm instead of vmware, I just went vmware because it was fast and easy to setup. Setting up vlans in kvm did not look too easy, and it actually looked like making any changes required restarting the network service, which is not exactly something you want to be doing on a production server where the storage depends on the network being up.

So I guess to get back to OP there's no way to do async globally then, so I'll just write a script to generate the config files for me and just specify it for each block. And yes this server is both for reading and writing as it's pretty much the core of all my storage. I have about 19TB or so. I definitely will try to do a reboot at some point though, as that might solve a lot of issues too by updating kernel. Did not realize they were at 4.x now, I'm REALLY behind lol. I have 4 hour of battery backup for my whole rack and it's rare we get outages that last more than an hour. We do get the occasional 3-4 hour one due to construction though, changing lines to new poles etc. My area is a perpetual construction zone in summer.
 

cantalup

Gawd
Joined
Feb 8, 2012
Messages
758
NFS with sync writes can cut write performance on a slow disk array to as little as 10% of the async values.
If you need fast NFS with a secure sync write behaviour, use ZFS with a dedicated Slog.

This gives you fast async writes to a regular disk based data pool over the ZFS write cache and secure sync write logging to a dedicated device that is optimized for low latency and high write iops, like a Dram based ZeusRAM, an NVMe like an Intel P7500/3600/3700 or a SATA SSD like an Intel S3700.

If you use NVMe or SSD only storage pools on Solaris/OmniOS (where ZFS and NFS come from), performance degradation is quite minimal even without an additional Slog device.

If you are running an NFS v4 server with v4 clients, you will get better performance..
I think this thread is not about ZFS :p...
btw I am using ZoL and NFS v4 on the server and all clients :D...

Stay away from anything < v4...
2010 presentation -> https://www.redhat.com/promo/summit...-weeds/thurs/sdickson-1020-nfs/Summit2010.pdf

Features that I love most in v4:
Open and Close operations
NT style ACL support, useful for some clients.
Can be changed to secured NFS (performance will be lower than non-secure).

The presentation covers GlusterFS too. I prefer ceph :D.

On Linux, stay away from v2 and v3 as much as possible.
 

cantalup

Gawd
Joined
Feb 8, 2012
Messages
758
I actually looked at Gluster but it just sounds way too complicated, there's too many different parts to it and it all seems like a nightmare to setup and troubleshoot if something goes wrong. Maybe one of these days I'll play with it though, then write some kind of wrapper program to make it easier to manage. I want to do the same with kvm/qemu actually. My goal is to eventually switch to kvm instead of vmware, I just went vmware because it was fast and easy to setup. Setting up vlans in kvm did not look too easy, and it actually looked like making any changes required restarting the network service, which is not exactly something you want to be doing on a production server where the storage depends on the network being up.

So I guess to get back to OP there's no way to do async globally then, so I'll just write a script to generate the config files for me and just specify it for each block. And yes this server is both for reading and writing as it's pretty much the core of all my storage. I have about 19TB or so. I definitely will try to do a reboot at some point though, as that might solve a lot of issues too by updating kernel. Did not realize they were at 4.x now, I'm REALLY behind lol. I have 4 hour of battery backup for my whole rack and it's rare we get outages that last more than an hour. We do get the occasional 3-4 hour one due to construction though, changing lines to new poles etc. My area is a perpetual construction zone in summer.


a bit OOT:
gluster and ceph are complicated :D. There is no way to make them very simple because so much configuration is involved.
Once the system is running, you will be happy with the performance, aka clustering!!!
 

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
4,068
If you are running an NFS v4 server with v4 clients, you will get better performance..
I think this thread is not about ZFS :p...
btw I am using ZoL and NFS v4 on the server and all clients :D...

Sync write is slow because every single committed write must be written to stable disk before the next write can occur - it does not matter if you use iSCSI, SMB, NFS3 or NFS4. This is where ZFS and its idea of a dedicated high performance Slog device can improve performance with sync enabled dramatically over the values that you can get directly from your disks.
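
The cost described here can be felt on a local disk with a small Python sketch. It is only an analogy under stated assumptions: it forces each block to stable storage with `os.fsync` on a local file instead of going through an NFS export, and the helper name `timed_write`, the block size, and the block count are made up for illustration. Timings depend entirely on the disk, so no numbers are claimed.

```python
import os
import tempfile
import time


def timed_write(path, blocks, block_size=4096, sync_each=False):
    """Write `blocks` blocks to `path` and return the elapsed seconds.

    With sync_each=True every block is flushed and fsync'ed before the next
    one is written, which mimics the "commit before continuing" behaviour.
    """
    payload = b"\0" * block_size
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(blocks):
            f.write(payload)
            if sync_each:
                f.flush()
                os.fsync(f.fileno())
        # One final flush + fsync so both variants end with data on disk.
        f.flush()
        os.fsync(f.fileno())
    return time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cached = timed_write(os.path.join(tmp, "cached.bin"), 500)
        synced = timed_write(os.path.join(tmp, "synced.bin"), 500, sync_each=True)
        print(f"one fsync at the end:   {cached:.4f}s")
        print(f"fsync after each block: {synced:.4f}s")
```

On spinning disks the second figure is typically far larger, and a low-latency log device or a battery-backed cache is what narrows the gap.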
 

cantalup

Gawd
Joined
Feb 8, 2012
Messages
758
Sync write is slow because every single committed write must be written to stable disk before the next write can occur - it does not matter if you use iSCSI, SMB, NFS3 or NFS4. This is where ZFS and its idea of a dedicated high performance Slog device can improve performance with sync enabled dramatically over the values that you can get directly from your disks.

NFS4 has better write performance. Dare to look at the presentation and try Linux NFS v4 with a v4 client? You cannot compare it with ZFS, which has a totally different purpose.

I am trying to be objective in this thread.
I could push for clustering. Clustering with "gluster" and "ceph" is very fast and reliable, and ZFS alone cannot compete with off-the-shelf Linux clustering, but that is not the objective of this thread.
Side note: proxmox+ZFS(ZoL)+ceph is a killer machine, for example, as a general rule.

ZFS versus NFS is not comparable, in general.

It is fine to promote ZFS, but please stay objective.

I love ZFS (ZoL) because BTRFS is not mature enough for RAID6/RAIDZ2-like setups, and I love to give details based on the roles.
 

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
4,068
This is not a ZFS vs NFS item and ZFS itself is absolutely not involved in performance problems with NFS sync writes.
Sync write is more a problem around data security when using a write cache for better performance and low iops of
disks and arrays. The only relevant aspect is the concept of a ZIL/Slog device that is part of ZFS and was developed only to
improve sync write performance.

ZFS is currently quite unique with the approach of using a dedicated logdevice for sync writes.
NFS4, clustering, Ceph or other technologies can improve overall performance but there is absolutely no relation to sync
write problems.

If you want to improve a low write performance with sync writes, you have only these options
- disable sync (reduced data security)
- enable sync with a "secure write cache or logging device" like ZFS with a ZIL or a hardware raid with a cache+BBU but this is not as safe or fast
- use very fast disks like NVMe or fast SSDs that are fast even with sync enabled.
 

Darakian

Supreme [H]ardness
Joined
Apr 12, 2004
Messages
4,698
I actually looked at Gluster but it just sounds way too complicated, there's too many different parts to it and it all seems like a nightmare to setup and troubleshoot if something goes wrong. Maybe one of these days I'll play with it though, then write some kind of wrapper program to make it easier to manage. I want to do the same with kvm/qemu actually. My goal is to eventually switch to kvm instead of vmware, I just went vmware because it was fast and easy to setup. Setting up vlans in kvm did not look too easy, and it actually looked like making any changes required restarting the network service, which is not exactly something you want to be doing on a production server where the storage depends on the network being up.
You should try it. It's not very hard at all.
Install it
give it a directory (brick) to use
on one system create a volume
join other system to volume
mount
 

cantalup

Gawd
Joined
Feb 8, 2012
Messages
758
This is not a ZFS vs NFS item and ZFS itself is absolutely not involved in performance problems with NFS sync writes.
Sync write is more a problem around data security when using a write cache for better performance and low iops of
disks and arrays. The only relevant aspect is the concept of a ZIL/Slog device that is part of ZFS and was developed only to
improve sync write performance.

ZFS is currently quite unique with the approach of using a dedicated logdevice for sync writes.
NFS4, clustering, Ceph or other technologies can improve overall performance but there is absolutely no relation to sync
write problems.

If you want to improve a low write performance with sync writes, you have only these options
- disable sync (reduced data security)
- enable sync with a "secure write cache or logging device" like ZFS with a ZIL or a hardware raid with a cache+BBU but this is not as safe or fast
- use very fast disks like NVMe or fast SSDs that are fast even with sync enabled.

nfs4 on linux has better performance.. try for yourself....
but ZFS with ZIL is a lot better.
nfs3 <nfs4 on writing performance with sync enabled
 

cantalup

Gawd
Joined
Feb 8, 2012
Messages
758
You should try it. It's not very hard at all.
Install it
give it a directory (brick) to use
on one system create a volume
join other system to volume
mount

totally true when you know what you are doing :)

a bit hard in the beginning... and easy once you know it better
gluster or ceph :)

I am using ceph because proxmox prefers ceph..
 

Darakian

Supreme [H]ardness
Joined
Apr 12, 2004
Messages
4,698
I've heard a lot of good things about ceph, but never used it personally (thus why I'm not talking about it), but glusterfs is much easier than NFS ever was (IMO I guess).
 

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
4,068
NFS is easy as there are more or less no more options than to turn it on or off for a filesystem (Solaris ZFS) or any folder (other OS) to give shared storage access. Due to its simplicity it is quite on par regarding performance with block-based access methods like FC/iSCSI.

If it's too slow then you have problems with disk io, especially when you enable a secure write behaviour (this is not related to any higher level software layer, this is on disk driver level) where you must disable all the write caching options that give performance. But this is the same when you switch to any other sort of sharing with or without clustering.

Clustering or multipath IO is a way to increase performance or availability on the access path level. This is not related to a file service protocol (ok, SMB3 can do). NFS becomes complicated only if you want to do something that it is not built for, like authentication/authorisation in v3, or in v4 in combination with Windows and ACLs, or if you want multipath on the NFS level (you can do that on the ip or block-based level).

Clustering or high availability can increase performance but I would never combine that with the word easy when comparing with simple NFS. The basic sync write performance problem is there as well, as this is a problem on the level where you write a single datablock to a disk or array, not at a sharing or clustering level.

Sync write is a secure but slow write, no matter who or what is requesting the write. You can only address this with faster cached writes, where you must protect the cache against a power failure, or with very fast disks that are fast enough for your needs without a write cache (with a write cache in RAM they will be faster as well).
 