OpenSolaris derived ZFS NAS/ SAN (OmniOS, OpenIndiana, Solaris and napp-it)

schleicher · Aug 13, 2015

schleicher said:
About stability, which i like to share:

I tested in the past OmniOS 151012 and VMWare ESXi 6, when it was out and it was NOT stable with NFS datastores. Then i switched back to ESXi 5.5U2 with the latest patch level and all was fine. Meanwhile i switched to OmniOS 151014 - 170cea2 from April, everything also fine with NFS Datastores. Last week i switched to the latest Patch Level ESXi 6.0.0b - ESXi-6.0.0-20150704001-standard (Build 2809209), NFS also working very stable, there was a bug fix regarding NFS in KB2111983:

"Slow NFS storage performance is observed on virtual machines running on VSA provisioned NFS storage. Delayed acknowledgements from the ESXi host for the NFS Read responses might cause this performance issue.

This patch resolves this issue by disabling delayed acknowledgements for NFS connections."

After a while everything was fine, i tested the latest OmiOS build 151014 - 7648372 from 27 July, and i was loosing again connection to NFS datastores. So this combination is NOT stable.

Since EXSi 5.1 i had various issues regarding NFS in combinations with OmniOS and ESXi, its a bit annoying to find out, what is working stable and what not. With latest ESXi 6 and OmniOS 151014 from
april, all is good for me.

I had to correct my experience, somehow i was confused, too many testing in the last time

the nfs problems i have are present in every version of OmniOS 151014 (170cea2 from April and 7648372 from 27 July) so its only stable with the latest ESXi 6 and OmniOS 151012. I guess there must be something wrong with 151014 with neworking or NFS. I can remember gea posted something about problems on some mailing list or so, cannot find the link anymore...

jmk396 · Aug 13, 2015

_Gea said:
Joining an AD is not a system wide join. It works only for the Solaris CIFS server to use the AD users with their Windows security id's.The CIFS servers store them as extended ZFS attributes so Solaris can act similar to an NTFS filesystem regarding permissions.

Other services like Apache or Proftp must care about a directory service for their own.

Thanks for the information. It looks like Linux distributions can use realmd/sssd that integrate into PAM to allow domain users to login to the system even if a local account doesn't exist. I think that's what I was trying to ask...

_Gea · Aug 13, 2015

You use PAM on Solaris as well.

ps
PAM was invented by Sun on Solaris
https://en.wikipedia.org/wiki/Pluggable_authentication_module

nostradamus99 · Aug 15, 2015

schleicher said:
I had to correct my experience, somehow i was confused, too many testing in the last time

the nfs problems i have are present in every version of OmniOS 151014 (170cea2 from April and 7648372 from 27 July) so its only stable with the latest ESXi 6 and OmniOS 151012. I guess there must be something wrong with 151014 with neworking or NFS. I can remember gea posted something about problems on some mailing list or so, cannot find the link anymore...

I've also been testing some more today no luck for me using OmniOS :
https://forums.servethehome.com/ind...unstable-consistent-apd-on-vm-power-off.6099/ (starts about half way down)

Solaris 11.2 has worked for me with version 5.5 and 6.0.

_Gea · Aug 15, 2015

I have no idea why you have such NFS problems with OmniOS 151014 (April and July update)
while OpenIndiana and Solaris work.

Maybe you have more luck with your new hardware otherwise Solaris can be an option.
If you stay with v28/5 pools are moveable.

btw
I have just uploaded a new ZFS appliance for ESXi
- based on OpenIndiana 151014 July update
- now includes ESXi tools from 6.0.0.b

with an additional BE with former OmniOS 151014/ April + ESXi tools 6.0.0

This VM is provided as a thin provisioned OVA template.
Import it within a few minutes via vsphere and menu File >> Deploy OVA Template

nostradamus99 · Aug 16, 2015

@Gea,
Well I wish I could run all my VM's on that HP Microserver but it will only take 16GB. I'm going to use it as a learning tool for work and backup my production machine.
But if time permits I'll test some stuff out.

Nice job creating a new OVF, saves 10's of GB on diskspace !

schleicher · Aug 16, 2015

_Gea said:
I have no idea why you have such NFS problems with OmniOS 151014 (April and July update)
while OpenIndiana and Solaris work.

Maybe its hardware or setup related, i never found out whats causing these problems...

If i had to set on something i would say network drivers or NFS locking for some reason.

@nostradamus99: Do you have one or two network adapters in your OmniOS VM?

nostradamus99 · Aug 16, 2015

I have 2, 1 for NFS and 1 for CIFS. (NFS nic is 100% virtual, no real nic is attached)

Hardware used
MSI Z77MA-G45
Core i5-3550
32GB RAM
Intel 82571EB Network cards
M1015 flashed to IT using P19 passed through to VM in question.

I've documented everything to the best of my ability in this thread : https://forums.servethehome.com/ind...unstable-consistent-apd-on-vm-power-off.6099/

For now I can go 3 routes:
- SSD NFS datastore under Solaris
- Try iSCSI under OmniOS (using my SSD as block based storage)
- Local SSD as datastore and make backups to a volume under ZFS.

My preference goes to option 3 for now because that way if I decide to update/upgrade/rebuild my storage I don't have to shutdown all my VM's (which causes problems with other people around me

)

@schleicher
What hardware are you using?

nostradamus99 · Aug 16, 2015

Also anyone here using Solaris 11.2 and seeing slow download speeds when installing packages?
(tried e1000/vmxnet3) I'm on 50/50 MBit fibre connection. (other downloads go at 5 MB/s)

Does Oracle limit bandwidth or something? (speeds vary from 4 kb/s to 512kb/s)

schleicher · Aug 16, 2015

nostradamus99 said:
I have 2, 1 for NFS and 1 for CIFS. (NFS nic is 100% virtual, no real nic is attached)

I have also two virtual nics, maybe that is the problem, one exclusive for the ESXi - OmniOS - NFS Connection and one for accessing NFS CFIS and other services from my network, with different subnets on each one. Running the VNICs as e1000 since i experienced stabiltiy problems with vmnet3 in the past.

nostradamus99 said:
@schleicher
What hardware are you using?

Supermicro X9SCM-F
INTEL 1155 XEON E3-1230V2, 4x 3.30GHz
32 GB RAM
LSI 9201-16i
LSI 9200-8e

fastedd27 · Aug 16, 2015

I recently came home from a work trip to find my server seemily up and running ( able to navigate on OI desktop) but no shares was available. I proceeded with a reboot and system would hang at boot up screen. I assumed that I had some sort of boot disk issue and installed OI on a fresh SSD. I was able to import 2 of my 3 pools without issue but the 3rd does not. I have used both -f and -fF with each the system would run for a while and I would return to find the system rebooted and pool not imported. When I tried with the readonly option... it did return a result saying that it was in the middle of a scrub and would stay there stuck at 85% with no eta saying that it started several days ago. When I reboot...the pool would be gone again. Im going to include some details I assume will help but please if anyone needs more please let me know what and also how to obtain them as I'm still somewhat green on all this.

----------------------------------------------------------------------------------------------------------------------------------
System Specs
Memory size: 65527 Megabytes

System Configuration: Supermicro X8DAH
BIOS Configuration: American Megatrends Inc. 2.1 12/30/2011
BMC Configuration: IPMI 2.0 (KCS: Keyboard Controller Style)

==== Processor Sockets ====================================

Version Location Tag
-------------------------------- --------------------------
Intel(R) Xeon(R) CPU L5639 @ 2.13GHz CPU 1
Intel(R) Xeon(R) CPU L5639 @ 2.13GHz CPU 2

All drives spread evenly across (3) 9201-16i cards

---------------------------------------------------------------------------------------------------------------------------------

Pool VER RAW SIZE/ USABLE ALLOC RES FRES AVAIL zfs [df -h/df -H] DEDUP FAILM EXP REPL ALT GUID HEALTH SYNC ENCRYPT ACTION ATIME
MEDIA01 - 32.5T/ 21.3TB 30.2T - - 1.21T [1.3T /1.4T] 1.00x wait off off - 5527406130588050241 ONLINE standard n.a. clear errors -
MEDIA02 - 43.5T/ 28.5TB 38.8T - - 2.68T [ /] 1.00x wait off off - 9858510776374733655 ONLINE standard n.a. clear errors -
MEDIA03 - 10.9T/ 7.8TB 6.79T - - 2.84T [2.9T /3.2T] 1.00x wait off off - 5893503720527931427 ONLINE standard n.a. clear errors -
rpool - 111G/ 109.3GB 22.3G - - 53.1G [54G /57G] 1.00x wait off off - 8457584574581494171 ONLINE standard n.a. clear errors off
----------------------------------------------------------------------------------------------------------------------------

pool: MEDIA01
state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
the pool may no longer be accessible by software that does not support
the features. See zpool-features(5) for details.
scan: scrub repaired 0 in 97h16m with 0 errors on Sun Aug 16 04:16:41 2015
config:

NAME STATE READ WRITE CKSUM CAP Product /napp-it IOstat mess
MEDIA01 ONLINE 0 0 0
raidz2-0 ONLINE 0 0 0
c4t5000CCA242C0D963d0 ONLINE 0 0 0 6 TB HGST HDN726060AL S:0 H:0 T:0
c4t5000CCA242C17D69d0 ONLINE 0 0 0 6 TB HGST HDN726060AL S:0 H:0 T:0
c4t5000CCA242C1AF4Ed0 ONLINE 0 0 0 6 TB HGST HDN726060AL S:0 H:0 T:0
c4t5000CCA242C1BB7Ed0 ONLINE 0 0 0 6 TB HGST HDN726060AL S:0 H:0 T:0
c4t5000CCA242C1C96Ad0 ONLINE 0 0 0 6 TB HGST HDN726060AL S:0 H:0 T:0
c4t5000CCA242C1D049d0 ONLINE 0 0 0 6 TB HGST HDN726060AL S:0 H:0 T:0

errors: No known data errors

pool: MEDIA02
state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
the pool may no longer be accessible by software that does not support
the features. See zpool-features(5) for details.
scan: scrub in progress since Mon Aug 10 03:00:03 2015
33.1T scanned out of 38.8T at 1/s, (scan is slow, no estimated time)
17.7M repaired, 85.42% done
config:

NAME STATE READ WRITE CKSUM CAP Product /napp-it IOstat mess
MEDIA02 ONLINE 0 0 0
raidz2-0 ONLINE 0 0 0
c4t5000039FF3C55631d0 ONLINE 0 0 0 2 TB TOSHIBA DT01ACA2 S:0 H:0 T:0
c4t5000039FF3C928B1d0 ONLINE 0 0 0 2 TB TOSHIBA DT01ACA2 S:0 H:0 T:0
c4t5000039FF3D1671Ed0 ONLINE 0 0 0 2 TB TOSHIBA DT01ACA2 S:0 H:0 T:0
c4t5000CCA369CAF94Bd0 ONLINE 0 0 0 2 TB Hitachi HDS72302 S:0 H:0 T:0
c4t5000CCA369F55A60d0 ONLINE 0 0 0 2 TB Hitachi HDS72302 S:0 H:0 T:0
c4t5000039FF3C1FF35d0 ONLINE 0 0 0 2 TB TOSHIBA DT01ACA2 S:0 H:0 T:0
raidz2-1 ONLINE 0 0 0
c4t5000039FF3E00C08d0 ONLINE 0 0 0 2 TB TOSHIBA DT01ACA2 S:0 H:0 T:0
c4t5000039FF3E00F0Fd0 ONLINE 0 0 0 2 TB TOSHIBA DT01ACA2 S:0 H:0 T:0
c4t5000CCA369D805FAd0 ONLINE 0 0 0 2 TB Hitachi HDS72302 S:0 H:0 T:0
c4t5000CCA369EF2F97d0 ONLINE 0 0 0 2 TB Hitachi HDS72302 S:0 H:0 T:0
c4t50014EE25C53FD73d0 ONLINE 0 0 0 2 TB WDC WD2002FAEX-0 S:0 H:0 T:0
c4t5000039FF3C2117Dd0 ONLINE 0 0 0 2 TB TOSHIBA DT01ACA2 S:0 H:0 T:0
raidz2-2 ONLINE 0 0 0
c4t5000039FF3D166F6d0 ONLINE 0 0 0 2 TB TOSHIBA DT01ACA2 S:0 H:0 T:0
c4t5000039FF3D9CE96d0 ONLINE 0 0 0 2 TB TOSHIBA DT01ACA2 S:0 H:0 T:0
c4t5000CCA369E22C4Bd0 ONLINE 0 0 0 2 TB Hitachi HDS72302 S:0 H:0 T:0
c4t5000CCA369E2519Cd0 ONLINE 0 0 0 2 TB Hitachi HDS72302 S:0 H:0 T:0
c4t5000CCA37DC74659d0 ONLINE 0 0 0 2 TB Hitachi HDS72302 S:0 H:0 T:0
c4t5000CCA37DC7B829d0 ONLINE 0 0 0 2 TB Hitachi HDS72302 S:0 H:0 T:0
raidz2-3 ONLINE 0 0 0
c4t5000039FF3C1C2FEd0 ONLINE 0 0 0 2 TB TOSHIBA DT01ACA2 S:0 H:0 T:0
c4t5000039FF3C1C6E1d0 ONLINE 0 0 0 2 TB TOSHIBA DT01ACA2 S:0 H:0 T:0
c4t5000CCA369D1B2D1d0 ONLINE 0 0 0 2 TB Hitachi HDS72302 S:0 H:0 T:0
c4t5000CCA369E23AF0d0 ONLINE 0 0 0 2 TB Hitachi HDS72302 S:0 H:0 T:0
c4t5000039FFAD3E9BBd0 ONLINE 0 0 0 2 TB TOSHIBA DT01ACA2 S:0 H:0 T:0
c4t5000CCA369E576E3d0 ONLINE 0 0 0 2 TB Hitachi HDS72302 S:0 H:0 T:0

errors: No known data errors

Any advice would be greatly appreciated! Thanks

_Gea · Aug 17, 2015

I would expect a weak or semi dead disk.

When ZFS reads a disk it will wait quite a long time (up to 180s).
In this time the disk can retry to read a sector. Some problems may even block the HBA.

In such cases, I would remove all disks and boot up. Then check the system and fault logs to find the affected disk. Other option is to hot-insert disk by disk then and check if its detected properly and if a problem is logged. if you find a fault, import the pool without that disk..

ps
llumos is improving timeout behaviour especially with the mpt sas driver or to avoid a blocked HBA with Sata disks. Maybe a more current OS than OI like OmniOS behaves better.

_Gea · Aug 17, 2015

about an NFS Problem with ESXi + OmniOS latest

I have just updated a storage VM OmniOS 151014 on one of my testmachines from April->July edition. After that, I saw the all path down and was not able to remount the NFS datastore.

A reboot of ESXi fixed at least my problem.
A switch or update of the storage VM with NFS enabled may require an ESXi reboot.

schleicher · Aug 17, 2015

Thanks for the hint, i never reboot my esxi host, mabye that is the problem

I will give that a try...

jmk396 · Aug 17, 2015

Sorry to ask again about authentication/authorization, but I'm still a little confused.

I've installed an Active Directory server (Windows 2012 R2) on my home network and have successfully bound napp-it (Oracle Solaris 11.2) to it. I've also connected my Windows 10 machine as a client and several Debian machines using realmd/sssd. Kerberos authentication is working great and I login to any of my servers using any of my domain (LDAP) accounts.

Now I want to setup several applications (Plex, Nzbget, Sonarr, Couch Potato) on a client machine (Debian or similar), but I want them each to connect to the file server under different credentials. My goal is that when I look on my file server, I can see which application(s) created or modified any of the files. I also want to see these users as the correct display names (LDAP) when browsing using my Windows machine.

I could use either NFS or SMB protocols, but I'm still confused on the Kerberos authentication or what I should be looking into.

Can anybody give me any advice or point me in the right direction?

I'm not sure if this involves using Kerberos and/or idmap, or....?

svavaroe · Aug 27, 2015

Hello list.
I know this has been discussed many times, but this error is still somehow unresolved.

I had a working setup with OmniOS version r151014 and Napp-It 0.9f1.
I recently upgraded my OmniOS installation to the newest commit, omnios-d08e0e5.
Rebooted the machine and now I have this error :

Code:

Software error:

Can't load '/var/web-gui/data/napp-it/CGI/auto/IO/Tty/Tty.so' for module IO::Tty: ld.so.1: perl: fatal: /var/web-gui/data/napp-it/CGI/auto/IO/Tty/Tty.so: wrong ELF class: ELFCLASS32 at /usr/perl5/5.16.1/lib/i86pc-solaris-thread-multi-64/DynaLoader.pm line 190.
 at /var/web-gui/data/napp-it/CGI/IO/Tty.pm line 30.
Compilation failed in require at /var/web-gui/data/napp-it/CGI/IO/Pty.pm line 7.
BEGIN failed--compilation aborted at /var/web-gui/data/napp-it/CGI/IO/Pty.pm line 7.
Compilation failed in require at /var/web-gui/data/napp-it/CGI/Expect.pm line 22.
BEGIN failed--compilation aborted at /var/web-gui/data/napp-it/CGI/Expect.pm line 22.
Compilation failed in require at /var/web-gui/data/napp-it/zfsos/_lib/illumos/zfslib.pl line 2718.
BEGIN failed--compilation aborted at /var/web-gui/data/napp-it/zfsos/_lib/illumos/zfslib.pl line 2718.

For help, please send mail to this site's webmaster, giving this error message and the time and date of the error.
Software error:

[Thu Aug 27 13:46:51 2015] admin.pl: Can't load '/var/web-gui/data/napp-it/CGI/auto/IO/Tty/Tty.so' for module IO::Tty: ld.so.1: perl: fatal: /var/web-gui/data/napp-it/CGI/auto/IO/Tty/Tty.so: wrong ELF class: ELFCLASS32 at /usr/perl5/5.16.1/lib/i86pc-solaris-thread-multi-64/DynaLoader.pm line 190.
[Thu Aug 27 13:46:51 2015] admin.pl:  at /var/web-gui/data/napp-it/CGI/IO/Tty.pm line 30.
[Thu Aug 27 13:46:51 2015] admin.pl: Compilation failed in require at /var/web-gui/data/napp-it/CGI/IO/Pty.pm line 7.
[Thu Aug 27 13:46:51 2015] admin.pl: BEGIN failed--compilation aborted at /var/web-gui/data/napp-it/CGI/IO/Pty.pm line 7.
[Thu Aug 27 13:46:51 2015] admin.pl: Compilation failed in require at /var/web-gui/data/napp-it/CGI/Expect.pm line 22.
[Thu Aug 27 13:46:51 2015] admin.pl: BEGIN failed--compilation aborted at /var/web-gui/data/napp-it/CGI/Expect.pm line 22.
[Thu Aug 27 13:46:51 2015] admin.pl: Compilation failed in require at /var/web-gui/data/napp-it/zfsos/_lib/illumos/zfslib.pl line 2718.
[Thu Aug 27 13:46:51 2015] admin.pl: BEGIN failed--compilation aborted at /var/web-gui/data/napp-it/zfsos/_lib/illumos/zfslib.pl line 2718.
Compilation failed in require at admin.pl line 885.

For help, please send mail to this site's webmaster, giving this error message and the time and date of the error.
[Thu Aug 27 13:46:51 2015] admin.pl: [Thu Aug 27 13:46:51 2015] admin.pl: Can't load '/var/web-gui/data/napp-it/CGI/auto/IO/Tty/Tty.so' for module IO::Tty: ld.so.1: perl: fatal: /var/web-gui/data/napp-it/CGI/auto/IO/Tty/Tty.so: wrong ELF class: ELFCLASS32 at /usr/perl5/5.16.1/lib/i86pc-solaris-thread-multi-64/DynaLoader.pm line 190. [Thu Aug 27 13:46:51 2015] admin.pl: [Thu Aug 27 13:46:51 2015] admin.pl: at /var/web-gui/data/napp-it/CGI/IO/Tty.pm line 30. [Thu Aug 27 13:46:51 2015] admin.pl: [Thu Aug 27 13:46:51 2015] admin.pl: Compilation failed in require at /var/web-gui/data/napp-it/CGI/IO/Pty.pm line 7. [Thu Aug 27 13:46:51 2015] admin.pl: [Thu Aug 27 13:46:51 2015] admin.pl: BEGIN failed--compilation aborted at /var/web-gui/data/napp-it/CGI/IO/Pty.pm line 7. [Thu Aug 27 13:46:51 2015] admin.pl: [Thu Aug 27 13:46:51 2015] admin.pl: Compilation failed in require at /var/web-gui/data/napp-it/CGI/Expect.pm line 22. [Thu Aug 27 13:46:51 2015] admin.pl: [Thu Aug 27 13:46:51 2015] admin.pl: BEGIN failed--compilation aborted at /var/web-gui/data/napp-it/CGI/Expect.pm line 22. [Thu Aug 27 13:46:51 2015] admin.pl: [Thu Aug 27 13:46:51 2015] admin.pl: Compilation failed in require at /var/web-gui/data/napp-it/zfsos/_lib/illumos/zfslib.pl line 2718. [Thu Aug 27 13:46:51 2015] admin.pl: [Thu Aug 27 13:46:51 2015] admin.pl: BEGIN failed--compilation aborted at /var/web-gui/data/napp-it/zfsos/_lib/illumos/zfslib.pl line 2718. [Thu Aug 27 13:46:51 2015] admin.pl: Compilation failed in require at admin.pl line 885.

I also tried to upgrade the Napp-It installation from 0.9f1 to f6 and a reboot in between. Same story.
So I reverted to a previous BE before the f6 upgrade and the same story. So Napp-it doesn't like the newest OmniOS upgrade ?

I have tried every fix there is about this, copying from var/web-gui/data/tools/omni_stable and or omni_bloody contents
/var/web-gui/data/napp-it/CGI

Not working.

Any solution that actually works? Should I never upgrade my OmniOS installation or my Napp-it breaks down?

my PATH variable is very short and default look alike : /usr/gnu/bin:/usr/bin:/usr/sbin:/sbin

Best regards and in much hope for solution.

Svavar O

_Gea · Aug 27, 2015

This problem is related to the Perl Module Expect.pm that is sensitive to the OmniOS Perl release. Napp-it includes to versions of the module for OmniOS 151006 to 151014 that is needed for interactive commands like passwd or join a domain. After login napp-it evaluates which of the two should be used. So a napp-it logout/login should fix this error.

If you have installed software that modifies Perl, this may not be enough. In such a case you can check for software from SmartOS (pkgin) that does not modify the core Perl. Another Option is to install Expect.pm from CPAN manually.

Last option is to comment the "use Expect" in zfslib.pl. After this some functions are not working like change password within napp-it or join a domain.

svavaroe · Aug 27, 2015

Thankx.
I installed the Expect through CPAN, and still get the errors.
I also tried to delete the IO and auto folder, as it got re-created after login.
Same problem.
/etc/init.d/napp-it stop and start. So logout/login dosn't work, as reboot.

I havn't used pkgin in some time, as every package from pkgin goes to /opt/local and not /usr/
so I would think the system-based perl is safe.

Should I comment out the "use Expect" in all the zfslib.pl files ?
Or maybe just remove napp-it completly and try again....

_Gea · Aug 27, 2015

If you install Expect manually, you may need to delete or replace the files in
/var/web-gui/data/napp-it/CGI/IO/ and /var/web-gui/data/napp-it/CGI/auto/IO/
- or rename the IO folders.

Removing/ reinstalling napp-it will not solve the problem.
You may comment the use Expect (see error, /var/web-gui/data/napp-it/zfsos/_lib/illumos/zfslib.pl line 2718)

Other option is a clean reinstall of OmniOS

dualathlon · Aug 28, 2015

A little test result for you guys:

Esxi 5.1U3 is a very picky NFS client.

Storage is 8x 512GB samsung 850 pro.
Raid-scheme under test is raid-10.
I did test on direct-connect backplane, and expander-backplane.

Work fine for years:
Omni 151010
Omni 151012

-----------

Work so-so, APD if svmotion ~200GB VM, IO speed reach ~400MBytes/s.

Ubuntu 14.04.3 LTS. I tried btrfs, ZoL.
Tried setting Esxi - NFS.MaxQueueDepth = 64 <= no different

-----

APD immediately:

Omni 151014 (release 7648372)

-----

Next test is solaris 11.3.

brianmat · Aug 28, 2015

Hey _gea we are following your SSD build and I am wondering if you have to do additional over provisioning on the Sandisk Extreme Pros or if they are already good to go out of the box.

Also, is there still an issue with over 128GB of RAM or is that still a soft limitation we should follow?

brianmat · Aug 28, 2015

dualathlon said:
A little test result for you guys:

Esxi 5.1U3 is a very picky NFS client.

Storage is 8x 512GB samsung 850 pro.
Raid-scheme under test is raid-10.
I did test on direct-connect backplane, and expander-backplane.

Work fine for years:
Omni 151010
Omni 151012

-----------

Work so-so, APD if svmotion ~200GB VM, IO speed reach ~400MBytes/s.

Ubuntu 14.04.3 LTS. I tried btrfs, ZoL.
Tried setting Esxi - NFS.MaxQueueDepth = 64 <= no different

-----

APD immediately:

Omni 151014 (release 7648372)

-----

Next test is solaris 11.3.

I believe 5.5U2 fixes most of the APD issues. We are running into this with 5.1 on one of our servers with short APD issues but our 5.5U2 server has 0 APD messages in the logs.

_Gea · Aug 28, 2015

brianmat said:
Hey _gea we are following your SSD build and I am wondering if you have to do additional over provisioning on the Sandisk Extreme Pros or if they are already good to go out of the box.

Also, is there still an issue with over 128GB of RAM or is that still a soft limitation we should follow?

The Sandisk Extreme Pro come with a build in overprovisioning. (The 960GB is a 1024 GB from Flash). I use them without addinional overprovisioning.

about the 128 GB problem.
The problemreports about more RAM and some race conditions are two years and older. While I have heard about some fixes at Nexenta, I have not heard about a upstream to Illumos.

But there are more and more people using more (I do not as I do not need myself) without newer problem reports published, there is a good chance that this is no longer a problem.

CopyRunStart · Sep 2, 2015

Just curious if you guys think the following could saturate 10Gb Ethernet.

35x 4TB 7200 RPM drives all in mirrors, direct wired to LSI HBA's on a Supermicro X9SRL Mobo with an E5-2620, 128GB of RAM, and an Intel X540-T2. Solaris 11.2

Assuming there were 8 clients connected to the same switch as the NAS, could they saturate the 10Gb? That would be about 1Gb per client.

Could this be done with CIFS? Or would they have to go NFS?

Skud · Sep 2, 2015

CopyRunStart said:
Just curious if you guys think the following could saturate 10Gb Ethernet.

35x 4TB 7200 RPM drives all in mirrors, direct wired to LSI HBA's on a Supermicro X9SRL Mobo with an E5-2620, 128GB of RAM, and an Intel X540-T2. Solaris 11.2

Assuming there were 8 clients connected to the same switch as the NAS, could they saturate the 10Gb? That would be about 1Gb per client.

Could this be done with CIFS? Or would they have to go NFS?

I have:

Supermicro X9SRH-7TF (onboard 10Gb)
Xeon E5-2609
32GB DDR-1333
12 x Mixture of Hitachi 7K3000 and 5K3000 in mirrored pairs across two LSI 9201s (I think)
Sun F20 with 4 x FMODs for SLOG
2 x Mushkin 256GB SSDs for cache

Using ATTO I can pull ~923MB/s read and ~700MB/s writes over iSCSI and sync=always. So, with your setup, I would say you could saturate 10Gb, but it will depend on the workload. Myself, I only see the above numbers while benchmarking.

Riley

_Gea · Sep 3, 2015

Without JumboFrames I would expect SMB to be lower than 400 MB/s, NFS is faster and the fastest is iSCSI. Jumboframes will increase performance as well as Solaris 11.3 with SMB 2.1

CopyRunStart · Sep 3, 2015

Yea I guess I would proceed with iSCSI then.

Can an iSCSI Lun in Solaris be accessed by SMB simultaneously? Or would there be file locking issues?

Also a semi related question: I just did some iSCSI testing on my current pool from a windows machine connected by 1Gb and got write speeds of (173Mb/s). How is that possible?

_Gea · Sep 3, 2015

An iSCSI target is a blockdevice used on the client side like a local disk.
Without a cluster software only one client can connect a LUN.
SMB shares can oinly rely on a ZFS filesystem.

173 Mbit/s is quite slow.
Slow write performance is mostly due a secure write setting.
On ZFS you can switch between slow and secure write or fast unsecure write.
Unsecure means that up to 5s of last writes may be lost on a powerloss.

Fast:
Sync = disabled (filesystem) or write-back cache enabled (zvol/blockdevice)

Secure:
Sync=enabled or writeback cache disabled.
You can increase performance of secure writes with a very fast ZIL/Slog device

CopyRunStart · Sep 3, 2015

It was actually 173 MBytes/s, which is why I was asking how is it possible on 1Gb ethernet. 1Gb ethernet is obviously mathematically limited to 125 MBytes/s.

I wrote a 3GB file in 18 seconds. Would pool-wide compression play a part in that? The lun was formatted as NTFS.

danswartz · Sep 3, 2015

where was the I/O rate being metered? on the client? on the zfs appliance?

CopyRunStart · Sep 3, 2015

Client. Windows 7 copy file dialog box. At first I thought Windows was just incorrectly reporting it, so I used a stop watch and timed it myself and got the same results.

davewolfs · Sep 3, 2015

Thoughts on using NVMe for a slog or is that just getting silly?

danswartz · Sep 3, 2015

Hmmm, windows file copy to an iscsi mounted disk? I'm wondering if there is some kind of asynchronous behavior here?

CopyRunStart · Sep 3, 2015

Yes Windows file copy to an iSCSI mounted disk. I believe the entire pool has async enabled.

danswartz · Sep 3, 2015

I didn't mean on the receiving end - the sender still can't send faster than the network link. I meant what if the last however much of data on the client side is not being counted because it is going asynchronously.

CopyRunStart · Sep 3, 2015

Do you mean is it not waiting for the TCP ack?

danswartz · Sep 3, 2015

Not that level, at the file I/O level or whatever. If it starts the file write and when the app says 'I am done', the explorer or whatever reports a speed based on that, despite that N MB of data is still streaming over the wire...

HammerSandwich · Sep 3, 2015

I agree that you should repeat the test while watching network traffic on both ends.

ST3F · Sep 5, 2015

ZFSGuru 10.03

Asus x79 Extreme 11
Xeon E5 2687w
32 Go ECC 4x 8 Go
2x m1015 flashed in LSI iT p19
14x 2 TB WD SE (7200rpm)
Intel 10GbE XFP+

... in a mirrored configuration (7x miror of 2x 2 TB.)

Client is a Windows 8.1 Pro with the same Intel 10GbE XFP+

Both are connected to switch Mikrotik CRS226-24G-2S+RM with SFP+ 10G SR Finistar

Test with SMB share and the Blackmagic Speedtest software
Write : 450 MB/s
Read : 135 MB/s

.... performance are similar to a RaidZ 5x 1 TB Green (5400 Trs), benchmarked in 2012 !!!

Could the earlier version of OmniOS or OpenIndinana be better in SMB ?
What is SBM version of both system ?

St3f

_Gea · Sep 6, 2015

ST3F said:
Could the earlier version of OmniOS or OpenIndinana be better in SMB ?
What is SBM version of both system ?

St3f

All free Illumos distributions like OI or OmniOS are SMB 1 (kernel based SMB server)
Gordon Ross from Nexenta is working on an upstream of SMB 2.1

Currently you can only use the commercial options for SMB 2.1
- NexentaStor CE (free up to 18 TB raw, noncommercial use only)
- Oracle Solaris 11.3 (free for demo and developer use, currently the most feature rich ZFS server)

or must use SAMBA

OpenSolaris derived ZFS NAS/ SAN (OmniOS, OpenIndiana, Solaris and napp-it)

n00b

Gawd

Supreme [H]ardness

Weaksauce

Supreme [H]ardness

Weaksauce

n00b

Weaksauce

Weaksauce

n00b

n00b

Supreme [H]ardness

Supreme [H]ardness

n00b

Gawd

n00b

Supreme [H]ardness

n00b

Supreme [H]ardness

n00b

Weaksauce

Weaksauce

Supreme [H]ardness

Limp Gawd

Gawd

Supreme [H]ardness

Limp Gawd

Supreme [H]ardness

Limp Gawd

2[H]4U

Limp Gawd

Limp Gawd

2[H]4U

Limp Gawd

2[H]4U

Limp Gawd

2[H]4U

[H]ard|Gawd

Limp Gawd

Supreme [H]ardness