OpenSolaris-derived ZFS NAS/SAN (OmniOS, OpenIndiana, Solaris and napp-it)

Thanks _Gea for the tips, I will try that.

It seems that this problem only affects pools originally created on Nexenta and later imported to OI. But even after running zpool upgrade, I still cannot remove the ZIL device.

I will try clearing the disk buffer, thanks.
 
Hmm, clearing the buffer does not help.

And I'm not able to reproduce the problem on another server (I created a pool in Nexenta, imported it to OI, and after that the ZIL device removed fine, no problem). So I'm not sure whether it is related to the system change.

If anyone has an idea how to solve this problem other than destroying and recreating the pool, please help me :)
Maybe I will try exporting the pool and importing it into Nexenta or OmniOS, but that will be complicated on a production storage server.

Thanks.


I have never had or heard of such problems with current OI.
Do you have plain OI or napp-it? (napp-it optionally uses disk buffering for better performance with many disks;
delete the buffer with menu disks - delete disk buffer, which is needed if you did not remove the ZIL with menu disks - remove.)

What you can also try: reboot, or check zpool status at the CLI.
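Roughly, the CLI steps would look like this (a sketch only; pool and device names are placeholders, use the ones zpool status reports under "logs"):

Code:
zpool status -v tank          # identify the log vdev listed under "logs"
zpool upgrade tank            # bring a pool imported from Nexenta to the current version
zpool remove tank c0t...d0    # remove the ZIL/log device by the name shown in status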
 
Having issues setting ACLs through Windows as root.

I browse to the share \\servernick\sharename

Security > Edit > Add > Advanced > Find Now, and select the local OI user I wish to add.

I get "access denied" when hitting OK. I connect to the share as root.

Trying to make certain users read-only on specific directories.

What am I missing here?
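For reference, what I am trying to achieve would look roughly like this from the Solaris CLI (a sketch only; the share path and user name are placeholders):

Code:
# give a specific local user read-only access to a directory tree,
# with the ACE inherited by new files and folders
/bin/chmod -R A+user:bob:read_set:fd:allow /tank/share/docs
# verify the resulting ACL
/bin/ls -V /tank/share/docs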
 
I am testing napp-it on the latest stable OmniOS on an old HP ML360 G5 (Broadcom GB NICs, 12GB RAM) with 16 SAS hard drives, used as both a datastore for ESXi 5 and a backup repository for Veeam 6.5. This is going to be put in a co-location facility for disaster recovery.

2 x ESXi boxes (with NFS mounts)
1 x napp-it + OmniOS (serving NFS and iSCSI)

I installed OmniOS on an 8GB USB stick.

When testing LAN replication with Veeam from another ESXi host to my test ESXi box with the OmniOS NFS datastore, the NFS datastore kept getting disconnected (and thus my replication failed) after a good amount of data transfer, and it seems that the NFS box cannot handle the high throughput or something. Some replications succeeded while others did not; it's unpredictable and inconsistent.

What can possibly go wrong? In the napp-it GUI, I click System > Log but it's not showing any info now. It was showing log messages a couple of days ago, but not anymore.

The Veeam 6.5 box is a VM using a vmxnet NIC. I'm going to try E1000 and see if it helps.
 
I am testing napp-it on the latest stable OmniOS on an old HP ML360 G5 (Broadcom GB NICs, 12GB RAM) with 16 SAS hard drives, used as both a datastore for ESXi 5 and a backup repository for Veeam 6.5. This is going to be put in a co-location facility for disaster recovery.

2 x ESXi boxes (with NFS mounts)
1 x napp-it + OmniOS (serving NFS and iSCSI)

I installed OmniOS on an 8GB USB stick.

When testing LAN replication with Veeam from another ESXi host to my test ESXi box with the OmniOS NFS datastore, the NFS datastore kept getting disconnected (and thus my replication failed) after a good amount of data transfer, and it seems that the NFS box cannot handle the high throughput or something. Some replications succeeded while others did not; it's unpredictable and inconsistent.

What can possibly go wrong? In the napp-it GUI, I click System > Log but it's not showing any info now. It was showing log messages a couple of days ago, but not anymore.

The Veeam 6.5 box is a VM using a vmxnet NIC. I'm going to try E1000 and see if it helps.

This sounds very similar to the issue I have, except mine is an All-in-One setup, and yet the SAN VM will become disconnected with an "All Paths Down" error in ESXi. It will run fine for a day or so, then start throwing those errors every so often; usually ESXi reconnects, but during that time all VMs of course halt.

I have tried both vmxnet3 and e1000 NICs on the OmniOS VM. Tried several different network tuning tips to solve the issue. Tried the out-of-the-box settings too, of course. Put both the SAN and ESXi IPs in both machines' hosts files to eliminate a possible DNS issue.
I have rebuilt the ESXi machine and the OmniOS VM. No solution.
 
This sounds very similar to the issue I have, except mine is an All-in-One setup, and yet the SAN VM will become disconnected with an "All Paths Down" error in ESXi. It will run fine for a day or so, then start throwing those errors every so often; usually ESXi reconnects, but during that time all VMs of course halt.

I have tried both vmxnet3 and e1000 NICs on the OmniOS VM. Tried several different network tuning tips to solve the issue. Tried the out-of-the-box settings too, of course. Put both the SAN and ESXi IPs in both machines' hosts files to eliminate a possible DNS issue.
I have rebuilt the ESXi machine and the OmniOS VM. No solution.

I want to add that I am using both NFS (for the ESXi datastore) and iSCSI (a Windows 2008 R2 VM iSCSI initiator for the Veeam repository).

I tried the E1000 NIC in the Veeam VM but the throughput is slow, about 20MB/s. I think there's something up with the networking that can't handle the high bandwidth with vmxnet, where I was maxing out the 1Gb link with transfer speeds of 100MB/s.

So I am thinking about just sticking to one protocol and using iSCSI. Let me know what you find out, thanks! I'll report back later.
 
With ESXi I have only been using NFS.

From some research, there might be some NFS settings that need to be tweaked under ESXi Advanced Settings. Use Oracle's ZFS/vSphere NFS best-practice settings in the PDF below. I'll try them and report back.

Go to page 22 and try some of the recommended settings.
http://www.oracle.com/technetwork/s...mentation/bestprac-zfssa-vsphere5-1940129.pdf

At the bottom of this page, there are also recommended settings for ESXi and NFS. I know it's old, but just use it as a reference.
http://communities.vmware.com/thread/197850?start=0&tstart=0
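For anyone following along, the kind of ESXi 5 advanced NFS options those documents describe can be listed and changed from the ESXi shell, for example (the values below are only illustrative; use the ones the PDF actually recommends):

Code:
esxcli system settings advanced list -o /NFS/HeartbeatFrequency
esxcli system settings advanced set -o /NFS/HeartbeatFrequency -i 12
esxcli system settings advanced set -o /NFS/HeartbeatTimeout -i 5
esxcli system settings advanced set -o /NFS/HeartbeatMaxFailures -i 10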
 
From some research, there might be some NFS settings that need to be tweaked under ESXi Advanced Settings. Use Oracle's ZFS/vSphere NFS best-practice settings in the PDF below. I'll try them and report back.

Go to page 22 and try some of the recommended settings.
http://www.oracle.com/technetwork/s...mentation/bestprac-zfssa-vsphere5-1940129.pdf

At the bottom of this page, there are also recommended settings for ESXi and NFS. I know it's old, but just use it as a reference.
http://communities.vmware.com/thread/197850?start=0&tstart=0

Thanks, yep, I have set all my ESXi settings per the Oracle doc; they did not help :(
 
Has anyone tried enabling LZ4 ZFS compression in OpenIndiana, or have any idea when it will appear in an update?

http://wiki.illumos.org/display/illumos/LZ4+Compression
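Once it is available, enabling it should just be the usual feature flag plus property set, something like this (pool and dataset names are placeholders; the pool must support the lz4_compress feature first):

Code:
zpool set feature@lz4_compress=enabled tank
zfs set compression=lz4 tank/datastore
zfs get compression,compressratio tank/datastore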

Also, VMware has an article about NFS disconnects with NetApp where they suggest setting NFS.MaxQueueDepth to 64 from a very high default. But I think this will limit your IO. I haven't tried it, though.

http://kb.vmware.com/selfservice/mi...nguage=en_US&cmd=displayKC&externalId=2016122
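If anyone wants to try it, the setting from that KB is applied on the host like this (untested on my side; 64 is just the value the article suggests):

Code:
esxcli system settings advanced set -o /NFS/MaxQueueDepth -i 64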

I also read somewhere that one of the Supermicro motherboards with a built-in Intel NIC whose model ends in LM, I think, is incompatible with ESXi 5, but you are using an all-in-one.

If I can't find a fix, I'm going to test and use iSCSI instead.
 
Also, VMware has an article about NFS disconnects with NetApp where they suggest setting NFS.MaxQueueDepth to 64 from a very high default. But I think this will limit your IO. I haven't tried it, though.

http://kb.vmware.com/selfservice/mi...nguage=en_US&cmd=displayKC&externalId=2016122

I also read somewhere that one of the Supermicro motherboards with a built-in Intel NIC whose model ends in LM, I think, is incompatible with ESXi 5, but you are using an all-in-one.

If I can't find a fix, I'm going to test and use iSCSI instead.

Yep, found that one too and tried it :) Please keep it up, because I am sure you will find something I have not run across; it also helps to see that others are having similar issues. I think I posted some of these to the STH forums; I probably should have done it here as well.

One data point I have noticed is that when I start getting the "All Paths Down" error on ESXi, it happens about once an hour or so; however, if I restart the SAN VM (not ESXi) I usually get a day or so without any errors. But once the errors start they continue to happen rather often (every hour or so). This behavior makes me think the issue is more on the SAN VM side than on the ESXi side.
 
Is ZFS on Linux still not recommended? Ideally I would use Ubuntu, but when I set up my server ZFS on Linux was not recommended. Has anyone made the switch from OI to Ubuntu? If it still isn't recommended, I guess I'll switch to OmniOS.
 
I am testing napp-it on the latest stable OmniOS on an old HP ML360 G5 (Broadcom GB NICs, 12GB RAM) with 16 SAS hard drives, used as both a datastore for ESXi 5 and a backup repository for Veeam 6.5. This is going to be put in a co-location facility for disaster recovery.

2 x ESXi boxes (with NFS mounts)
1 x napp-it + OmniOS (serving NFS and iSCSI)

I installed OmniOS on an 8GB USB stick.

When testing LAN replication with Veeam from another ESXi host to my test ESXi box with the OmniOS NFS datastore, the NFS datastore kept getting disconnected (and thus my replication failed) after a good amount of data transfer, and it seems that the NFS box cannot handle the high throughput or something. Some replications succeeded while others did not; it's unpredictable and inconsistent.

What can possibly go wrong? In the napp-it GUI, I click System > Log but it's not showing any info now. It was showing log messages a couple of days ago, but not anymore.

The Veeam 6.5 box is a VM using a vmxnet NIC. I'm going to try E1000 and see if it helps.

My opinion is: some of you guys are over-extending experimental systems. It's fine for testing and fun, but I wouldn't expect those things to work, or expect to find fixes.

I think if you run ZFS on bare metal you will still find problems. Stacking it into a big system seems like trouble to me. There's a reason for virtualization, and giving yourself more trouble is not it.
 
My opinion is: some of you guys are over-extending experimental systems. It's fine for testing and fun, but I wouldn't expect those things to work, or expect to find fixes.

I think if you run ZFS on bare metal you will still find problems. Stacking it into a big system seems like trouble to me. There's a reason for virtualization, and giving yourself more trouble is not it.

Curious, what about either of our systems do you consider experimental? I admit my setup is a bit on the experimental side in that I have an All-in-One setup; however, my loads are VERY low compared to my hardware.

13 Seagate Constellation ES.2 SAS drives in a mirror-stripe (1 standby) config with 2 LSI 2308 cards, on a Supermicro X9DR7 motherboard with 2 Xeon E5-2640s and 32GB RAM, in a Supermicro 846A-R1200B chassis.

I am planning to try Solaris 11.1 to see if that cures my issue; maybe it has a better network stack, as I suspect the issue is there. If that does not cure it, I might move from VMware to Xen. I'd rather not do that because I fear the guest migration may be painful.
 
I started my server with a mirrored array of 40GB drives and found that was insufficient. I have replaced them one at a time with 500GB drives and have autoexpand on, but can't seem to get it to expand. Is there a safe and not too painful way to facilitate this in napp-it?

Thanks
 
......
When testing LAN replication with Veeam from another ESXi host to my test ESXi box with the OmniOS NFS datastore, the NFS datastore kept getting disconnected (and thus my replication failed) after a good amount of data transfer, and it seems that the NFS box cannot handle the high throughput or something. Some replications succeeded while others did not; it's unpredictable and inconsistent.

What can possibly go wrong? In the napp-it GUI, I click System > Log but it's not showing any info now. It was showing log messages a couple of days ago, but not anymore.
.......


I did experience the same issue.
I used one VMkernel interface for both management and storage. OI/Omni had only one e1000 NIC.

During high load, the vSphere host/storage showed a "disconnected" state.

My fix:
For the ESXi host: I created a separate management VMkernel in one VLAN and another VMkernel for storage in a different VLAN, and set a different physical NIC teaming policy for each VLAN.
For the storage VM: one vNIC for storage, one vNIC for management, one vNIC for ZFS replication.

Use vmxnet3 if you can. The E1000 vNIC will drop frames under heavy load. SSH to the ESXi host, run esxtop, then press 'N' and read the DROP column.

HTH.
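In case it helps, the ESXi side of that separation can be scripted roughly like this (a sketch only; the vSwitch, portgroup, VLAN and IP values are examples, not recommendations):

Code:
# dedicated portgroup + VMkernel interface just for NFS/iSCSI traffic
esxcli network vswitch standard portgroup add --portgroup-name=Storage --vswitch-name=vSwitch0
esxcli network vswitch standard portgroup set --portgroup-name=Storage --vlan-id=20
esxcli network ip interface add --interface-name=vmk1 --portgroup-name=Storage
esxcli network ip interface ipv4 set --interface-name=vmk1 --ipv4=10.0.20.10 --netmask=255.255.255.0 --type=static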

-----------------------

Is ZFS on Linux still not recommended? Ideally I would use Ubuntu, but when I set up my server ZFS on Linux was not recommended. Has anyone made the switch from OI to Ubuntu? If it still isn't recommended, I guess I'll switch to OmniOS.

IMHO, unless you need something specific to Linux, OpenSolaris is recommended.
ZFS on Linux doesn't support NFSv4 ACLs, only standard Unix permissions (700, 755, ...).
ZFS on Linux has to go through the Solaris Porting Layer (SPL), which makes it slower.

I did a simple test:
Create a ramdisk on Ubuntu, then make a zpool using the ramdisk. The VM vNIC is vmxnet3.
Repeat the same steps on Omni: zpool on a ramdisk, vmxnet3.
On another VM: mount NFS, issue copy commands.

Result: IO on Omni could reach ~6500 Mbit/s; Ubuntu is ~4500 Mbit/s.
The CPU is a Xeon E5-2620. All VMs run on the same server, same VLAN.
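For reference, the setup looked roughly like this (a sketch from memory; pool names, device names and sizes are placeholders):

Code:
# on OmniOS / OpenIndiana
ramdiskadm -a rd0 4g
zpool create ramtest /dev/ramdisk/rd0
zfs set sharenfs=on ramtest

# on Ubuntu (ZoL), a file on tmpfs serves as the ramdisk
mount -t tmpfs -o size=4g tmpfs /mnt/ram
truncate -s 3500m /mnt/ram/disk0
zpool create ramtest /mnt/ram/disk0
zfs set sharenfs=on ramtest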

-------------------

Did anyone try CIFS guest access in this scenario:

1. Share zfs dataset, guest is allowed: zfs set sharesmb="name=public,guestok=true" poolX/share
2. Create an unix group: groupadd grpB
3. Create an unix user: useradd usrA -G grpB
4. Set password for that user: passwd usrA
5. Create SMB group: smbadm create grpB
6. Add user to SMB group: smbadm add-member -m usrA grpB
7. Add smb mapping: idmap add -d wingroup:grpB unixgroup:grpB
8. Also map guest account to that user: idmap add -d winname:Guest unixuser:usrA
9. Share ACL is everyone@:full_set

If folder ACL of poolX/share is everyone@:full_set, guest can access successfully. ("/bin/chmod -R A=everyone@:full_set:fd:allow poolX/share")
If folder ACL of poolX/share is usrA:full_set, guest cannot access. ("/bin/chmod -R A=user:usrA:full_set:fd:allow poolX/share")

Why guest acc (map to usrA) can't access the folder, which usrA has full permission?
Any help is appreciated.
Thanks.
 
Last edited:
I am not sure if there is a Windows name if you enable anonymous guest access.
So you need at least everyone@=read
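As a quick test you could add an everyone@ read ACE next to the usrA ACE, roughly (a sketch, using the path from your example):

Code:
/bin/chmod A+everyone@:read_set:fd:allow poolX/share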
 
I started my server with a mirrored array of 40GB drives and found that was insufficient. I have replaced them one at a time with 500GB drives and have autoexpand on, but can't seem to get it to expand. Is there a safe and not too painful way to facilitate this in napp-it?

Thanks

40 GB should be sufficient for a pure storage server.
If you need more, either reinstall, or read http://hardforum.com/showthread.php?t=1680337 for ESXi and the basics.
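If the question is about a data pool rather than the system pool: after the last mirror member has been replaced, the vdev sometimes still needs to be expanded manually, roughly like this (a sketch; pool and disk names are placeholders):

Code:
zpool set autoexpand=on tank
zpool online -e tank c2t1d0
zpool online -e tank c2t2d0
zpool list tank    # SIZE should now reflect the larger disks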
 
Gea,

sorry if this isn't the right place, but I've noticed that once in a while my FS will kill all SMB share access... It seems to happen randomly, either under heavy access for extended periods or after the server has been on for a very long time (months).

I haven't been able to nail down the cause. The server itself remains completely responsive, as does your console (napp-it), so I just go in and reboot the server and all is well.

Is there somewhere I can check to see if any log might have info as to why this is happening?

Thanks!
 
I did experience the same issue.
I used 1 vkernel interface for both management + storage. OI/Omni had only 1 e1000 nic.

During high load, vsphere host / storage showed "disconnected" state.

My fix:
For esxi host: i create separate management vkernel in a vlan, another vkernel for storage in different vlan. And i set different physical nic teaming policy for each vlan.
For storage vm: 1 vnic for storage, 1 vnic for management, 1 vnic for zfs replication.

Use vmxnet3 if you can. E1000 vnic will drop frames under heavy load. SSH to esxi host, type esxtop, then press 'N', read the column DROP.

HTH.
I thought about trying this as a possible solution for me; however, do you think it would matter in my case, since the storage device is actually a VM within ESXi (All-in-One)?
 
I thought about trying this as a possible solution for me; however, do you think it would matter in my case, since the storage device is actually a VM within ESXi (All-in-One)?

If you are in doubt whether the virtual ESXi NIC is the problem, you can try either e1000 or the faster VMXnet3, or a physical NIC that you pass through.
 
Gea,

sorry if this isn't the right place, but I've noticed that once in a while my FS will kill all SMB share access... It seems to happen randomly, either under heavy access for extended periods or after the server has been on for a very long time (months).

I haven't been able to nail down the cause. The server itself remains completely responsive, as does your console (napp-it), so I just go in and reboot the server and all is well.

Is there somewhere I can check to see if any log might have info as to why this is happening?

Thanks!

You can check the system log, but tell us about your OS and whether you are in
workgroup or domain mode. Does an SMB service stop/start solve the problem?
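To check that and the log on OI/OmniOS, roughly (a sketch):

Code:
svcs -xv smb/server                              # state of the kernel CIFS service
svcadm restart svc:/network/smb/server:default   # does a restart bring the shares back?
tail -100 /var/adm/messages                      # system log around the time SMB stopped responding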
 
If you are in doubt whether the virtual ESXi NIC is the problem, you can try either e1000 or the faster VMXnet3, or a physical NIC that you pass through.

Well, I have tried VMXnet3 and e1000; both have the same disconnect issues with the ESXi datastore.
 
Just checked my esxtop with the 'N' view.

The only interface with any %DRPRX is my SAN, and it is really high at ~40%.

I never checked this before when I was running the VMXnet3 interface on it; I will switch this weekend and see what that changes. However, my problem with disconnects did not change when I went from VMXnet3 to e1000.
 
I did experience the same issue.
I used one VMkernel interface for both management and storage. OI/Omni had only one e1000 NIC.

During high load, the vSphere host/storage showed a "disconnected" state.

My fix:
For the ESXi host: I created a separate management VMkernel in one VLAN and another VMkernel for storage in a different VLAN, and set a different physical NIC teaming policy for each VLAN.
For the storage VM: one vNIC for storage, one vNIC for management, one vNIC for ZFS replication.

Use vmxnet3 if you can. The E1000 vNIC will drop frames under heavy load. SSH to the ESXi host, run esxtop, then press 'N' and read the DROP column.

HTH.

Thanks dualathlon! I will definitely try this. In my quick setup and testing, I had only one NIC, with both management and storage data going through the same vSwitch/portgroup/network.

While doing research, I did run into and think about assigning a separate VMkernel for the NFS storage network, but totally forgot about it.

So, from what you are saying, anything involving high throughput for NFS requires its own VMkernel interface?

Thanks!
 
Is ZFS on Linux still not recommended? Ideally I would use Ubuntu, but when I set up my server ZFS on Linux was not recommended. Has anyone made the switch from OI to Ubuntu? If it still isn't recommended, I guess I'll switch to OmniOS.
I made the switch from OpenIndiana to ZoL 0.6.1 under Ubuntu recently, with 3 servers in different roles (one home server, one NAS/SAN for video production and one backup server for a business). If you look at the ZoL bug tracker, it's easy to see the implementation is not mature yet; however, I have had only minor issues so far, so I don't regret making the switch.
 
So I switched my OmniOS VM from e1000 to vmxnet3 and the dropped packets reported in esxtop went from ~45% to 0. Great; however, I still get a lot of "Device or filesystem with identifier [XXXX] has entered the All Paths Down state" errors.

Does anyone think moving my SAN VM from OmniOS to Solaris 11.1 might help? I'm starting to run out of ideas here. My other thought is to move from ESXi to Xen; of course that would require much more work than migrating my SAN OS, since I would have to convert all my VMs.
 
I have performance problems with Nexenta. After one year of problem-free operation, I now have daily problems with heavy writes, which make the zpool slow.

Nexenta is used only as an iSCSI target for virtual servers. I checked all zvols used for iSCSI targets with the DTrace script zfsio.d (https://github.com/kdavyd/dtrace/blob/master/zfsio.d), which shows disk IO per zvol. But there are no heavy writes on any zvol.

But if I check iostat -x 1, there are many write operations per second (many more than the sum of write operations from the zfsio DTrace script).

If I check which process is making disk writes, it's "zpool-<poolname>".

The zpool is 57% full.

Do you have any idea where the heavy disk writes come from? How can I find out what causes them?

iostat sample (per second)
Code:
device    r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b
sd1      25.0  170.0  253.5 1862.5  0.0  0.5    2.7   0  31
sd2      48.0  316.0  448.5 3346.0  0.0  0.6    1.7   0  36
sd3      68.0  315.0  385.0 3346.0  0.0  0.6    1.5   0  32
sd4      30.0  334.0  188.0 3531.5  0.0  0.6    1.7   0  34
sd6      33.0  282.0  141.0 3531.5  0.0  0.6    2.1   0  41
sd8      28.0  172.0  149.5 1862.5  0.0  0.5    2.7   0  28
sd19     39.0  281.0  364.0 3857.5  0.0  0.6    1.8   0  35
sd20     30.0  278.0  217.5 3857.5  0.0  0.7    2.1   0  36
sd21     27.0  149.0  217.0 1927.0  0.0  0.5    2.7   0  28
sd22     33.0  149.0  159.5 1927.0  0.0  0.5    3.0   0  34
sd23      0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0

zpool list
Code:
NAME             SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
storage-d2-a    4.53T  2.62T  1.91T    57%  1.00x  ONLINE  -
syspool         18.5G  1.81G  16.7G     9%  1.00x  ONLINE  -

zpool status
Code:
  pool: storage-d2-a
 state: ONLINE
 scan: none requested
config:

        NAME                       STATE     READ WRITE CKSUM
        storage-d2-a               ONLINE       0     0     0
          mirror-0                 ONLINE       0     0     0
            c0t5000C50035F27643d0  ONLINE       0     0     0
            c0t5000C500362C678Fd0  ONLINE       0     0     0
          mirror-1                 ONLINE       0     0     0
            c0t5000C5003631A8A5d0  ONLINE       0     0     0
            c0t50014EE20595F221d0  ONLINE       0     0     0
          mirror-2                 ONLINE       0     0     0
            c0t50014EE25AEB2D0Ad0  ONLINE       0     0     0
            c0t50014EE2B035B0DDd0  ONLINE       0     0     0
          mirror-3                 ONLINE       0     0     0
            c0t5000C5003F3560DCd0  ONLINE       0     0     0
            c0t5000C500362C601Fd0  ONLINE       0     0     0
          mirror-4                 ONLINE       0     0     0
            c0t5000C500362C5E05d0  ONLINE       0     0     0
            c0t5000C500362BFF84d0  ONLINE       0     0     0
        logs
          c0t50015179594D70EEd0    ONLINE       0     0     0

errors: No known data errors

Thanks for any ideas.
 
Tried CTRL+F5, tried rebooting, tried a different computer... Same result.


Edit: Rebooted OmniOS - works now.
 
I have performance problems with Nexenta. After one year of problem-free operation, I now have daily problems with heavy writes, which make the zpool slow.

Nexenta is used only as an iSCSI target for virtual servers. I checked all zvols used for iSCSI targets with the DTrace script zfsio.d (https://github.com/kdavyd/dtrace/blob/master/zfsio.d), which shows disk IO per zvol. But there are no heavy writes on any zvol.

But if I check iostat -x 1, there are many write operations per second (many more than the sum of write operations from the zfsio DTrace script).

If I check which process is making disk writes, it's "zpool-<poolname>".

The zpool is 57% full.

Do you have any idea where the heavy disk writes come from? How can I find out what causes them?

zfsio.d... I don't think that shows zvol activity; pretty sure that is file-only. Do a

wget --no-check-certificate https://raw.github.com/kdavyd/dtrace/master/kd_collect.sh

chmod +x kd_collect.sh and then ./kd_collect.sh. The script will create /perflogs and set it to gzip-9. In that directory it will create a number of different .out files that should give you more actionable info.

However, it is entirely possible that your free space map is fragmented. This typically starts happening around 60% full and gets worse as you fill up the filesystem more and more. When ZFS commits a write it searches for 1024K of contiguous free space; as you fill the filesystem and copy-on-write rewrites accumulate, this map gets really fragmented.

As an example, if you were running into this problem and you zfs send/received all your zvols to an identical system, performance on the new system would be much better, because you would have 43% of your space free and would not yet have fragmented the free space map.
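i.e. something along the lines of (pool and zvol names are placeholders):

Code:
zfs snapshot tank/vol1@migrate
zfs send tank/vol1@migrate | ssh newhost zfs receive tank2/vol1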
 
Thank you very much madrebel. Very useful scripts.

Code:
2013 Jul  2 14:35:10 storage-d2-a   1424 ms, 36 wMB 2 rMB 1897 wIops 275 rIops 23+1 dly+thr; dp_wrl 205 MB .. 207 MB; res_max: 206 MB; dp_thr: 207
2013 Jul  2 14:35:15 storage-d2-a   1143 ms, 33 wMB 1 rMB 2966 wIops 97 rIops 54+1 dly+thr; dp_wrl 207 MB .. 245 MB; res_max: 211 MB; dp_thr: 245
2013 Jul  2 14:35:20 storage-d2-a   1404 ms, 32 wMB 3 rMB 2142 wIops 274 rIops 0+0 dly+thr; dp_wrl 223 MB .. 245 MB; res_max: 123 MB; dp_thr: 245
2013 Jul  2 14:35:25 storage-d2-a   1317 ms, 30 wMB 3 rMB 2173 wIops 333 rIops 0+0 dly+thr; dp_wrl 218 MB .. 223 MB; res_max: 133 MB; dp_thr: 223
2013 Jul  2 14:35:30 storage-d2-a   1285 ms, 31 wMB 5 rMB 2129 wIops 326 rIops 21+0 dly+thr; dp_wrl 218 MB .. 233 MB; res_max: 207 MB; dp_thr: 233
2013 Jul  2 14:35:35 storage-d2-a   1283 ms, 28 wMB 2 rMB 1965 wIops 328 rIops 0+0 dly+thr; dp_wrl 233 MB .. 234 MB; res_max: 171 MB; dp_thr: 233
2013 Jul  2 14:35:40 storage-d2-a   1419 ms, 28 wMB 2 rMB 1858 wIops 242 rIops 0+0 dly+thr; dp_wrl 234 MB .. 237 MB; res_max: 175 MB; dp_thr: 237
2013 Jul  2 14:35:45 storage-d2-a   1651 ms, 24 wMB 5 rMB 1502 wIops 450 rIops 0+0 dly+thr; dp_wrl 214 MB .. 237 MB; res_max: 124 MB; dp_thr: 237
2013 Jul  2 14:35:50 storage-d2-a   1575 ms, 32 wMB 2 rMB 1853 wIops 264 rIops 0+0 dly+thr; dp_wrl 213 MB .. 214 MB; res_max: 162 MB; dp_thr: 214
2013 Jul  2 14:35:54 storage-d2-a   1493 ms, 32 wMB 5 rMB 1880 wIops 458 rIops 39+1 dly+thr; dp_wrl 213 MB .. 231 MB; res_max: 214 MB; dp_thr: 231
2013 Jul  2 14:35:59 storage-d2-a   1290 ms, 26 wMB 1 rMB 1703 wIops 113 rIops 0+0 dly+thr; dp_wrl 222 MB .. 231 MB; res_max: 151 MB; dp_thr: 231
2013 Jul  2 14:36:04 storage-d2-a   1316 ms, 26 wMB 2 rMB 2057 wIops 398 rIops 0+0 dly+thr; dp_wrl 220 MB .. 222 MB; res_max: 144 MB; dp_thr: 222
2013 Jul  2 14:36:09 storage-d2-a   1484 ms, 25 wMB 1 rMB 1642 wIops 163 rIops 0+0 dly+thr; dp_wrl 209 MB .. 220 MB; res_max: 167 MB; dp_thr: 219

My spa_sync seems to be really slow and there are many write operations, but I still cannot find what is causing this. Maybe you are right about fragmentation. Is there any way I can confirm that this is due to fragmentation and not some other reason?

Thank you.
 
Off the top of my head, no idea if there is a way to check free space map fragmentation. I will see if I can find out.

I do know the free space map is high on the list within Nexenta. Pretty much all COW filesystems have the problem, ZFS less than, say, WAFL, but it still gets pretty bad. As I understand it, these free space map issues would be addressed by the new code written to implement block pointer rewrite.

Here is a link that explains the problem:

http://blog.delphix.com/uday/2013/02/19/78/
 
Thanks for the reply. I read that interesting article.

I wonder how to make such nice metaslab tables, where I can see the amount of fragmentation as in that article :)
 
Hmmm, they are using the zdb -mm <poolname> command for the metaslab usage mappings. That's interesting. So maybe this will be the best way to find out how fragmented my pool is.
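Something like this, then (the output can be huge, so redirect it to a file):

Code:
zdb -mm storage-d2-a > /tmp/metaslabs.txt
# each metaslab section lists its free space and its free segments;
# many tiny segments and little contiguous free space = a fragmented space map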
 