OpenSolaris-derived ZFS NAS/SAN (OmniOS, OpenIndiana, Solaris and napp-it)

Discussion in 'SSDs & Data Storage' started by _Gea, Dec 30, 2010.

  1. IdiotInCharge

    IdiotInCharge [H]ardForum Junkie

    Messages:
    13,056
    Joined:
    Jun 13, 2003
    Thanks for the detailed response!
     
  2. WishYou

    WishYou n00b

    Messages:
    7
    Joined:
    Oct 19, 2016
    Hi _Gea !

    I've found and fixed a bug in your zpool capacity calculations.
    There is a rounding error that may or may not hit hard, depending on the layout and usage of the pools.
    In some locales 'zfs list' outputs numbers with a decimal _comma_, but Perl requires a decimal _point_ to handle the calculations correctly.

    I've added a quick fix to zfslib_val2kb:
    Code:
    ###############
     sub zfslib_val2kb  {  #hide:
    ###############
    
          my $w1=$_[0];
          $w1=~s/,/./;                              # added fix: replace the locale decimal comma with a point
          if ($w1=~/K/) { $w1=~s/K//;  }
          if ($w1=~/M/) { $w1=~s/M//; $w1=$w1*1000; }
          if ($w1=~/G/) { $w1=~s/G//; $w1=$w1*1000000; }
          if ($w1=~/T/) { $w1=~s/T//; $w1=$w1*1000000000; }
          if ($w1=~/P/) { $w1=~s/P//; $w1=$w1*1000000000000; }
          return ($w1);
     }
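
    To illustrate why the comma bites (a minimal sketch, the value is just an example): Perl's string-to-number conversion stops at the first non-numeric character, so "17,3" is silently treated as 17 before the multiplication.
    Code:
    my $w1 = "17,3";
    print $w1*1000000, "\n";          # prints 17000000 ("17,3" numifies to 17)
    $w1 =~ s/,/./;                    # the fix: comma -> decimal point
    print $w1*1000000, "\n";          # prints 17300000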
    

    Before:
    Code:
    NAME    USED    AVAIL   MOUNTPOINT      %
    rpool   23,2G   15,3G   /rpool  39%
    storage 5,62T   5,61T   /storage        50%
    tank    17,3T   3,95T   /tank   15%!
    vmstore 304G    107G    /vmstore        26%
    
    After, where I've also added one decimal place because it looks nicer... :)
    Code:
    NAME    USED    AVAIL   MOUNTPOINT      %
    rpool   23,2G   15,3G   /rpool  39.7%
    storage 5,62T   5,61T   /storage        50.0%
    tank    17,3T   3,95T   /tank   18.6%
    vmstore 304G    107G    /vmstore        26.0%
    

    Regards,
    Wish
     
    mikeo and _Gea like this.
  3. _Gea

    _Gea 2[H]4U

    Messages:
    3,895
    Joined:
    Dec 5, 2010
    Thanks a lot !
     
  4. dedobot

    dedobot [H]Lite

    Messages:
    96
    Joined:
    Jun 19, 2012
    SMB file sharing quick tip: if you can afford it, disable SMB signing at the Windows client too, not only at the SMB server. Do it via domain policy or local GPO, depending on the situation. Leave the SMB1 restrictions untouched. Same for macOS.
     
  5. CopyRunStart

    CopyRunStart Limp Gawd

    Messages:
    153
    Joined:
    Apr 3, 2014
    Hey Gea, not sure if it is just an issue with my system but resetting ACL permissions doesn't seem to work in the Napp-it GUI on 18.12 with Solaris 11.4.

    I'm doing it manually via:

    chmod A- Folder/Name
    chmod -fvR A=everyone@:modify_set:file_inherit/dir_inherit:allow folder/name/path/

    Does this look right? Do I also need to give root full_set? How do I do both in one command?
     
  6. _Gea

    _Gea 2[H]4U

    Messages:
    3,895
    Joined:
    Dec 5, 2010
    I will check that with Solaris 11.4

    You can set ACLs via A= (see https://docs.oracle.com/cd/E18752_01/html/819-5461/gbace.html), but another way is to use Windows while SMB-connected as root. You should only avoid deny rules from Windows, as Windows processes deny rules first whereas Solarish respects the order of the rules (napp-it can set the order of rules).

    If you want to have two commands on one line, use cmd1; cmd2 or cmd1 && cmd2

    update
    I have tried the ACL reset in napp-it 18.12 and 19.06 and it worked in both cases.
     
    Last edited: May 31, 2019
  7. CopyRunStart

    CopyRunStart Limp Gawd

    Messages:
    153
    Joined:
    Apr 3, 2014
    Thanks Gea. It must be something wrong with my system. I can't remember exactly what it said, but it was stuck at something about the guest user.

    I have set it with
    Code:
    chmod -fvR A=everyone@:modify_set:file_inherit/dir_inherit:allow folder/name/path/
    but I'm confused about how I can also add another ACL for root through the Solaris command line. When I tried
    Code:
    chmod -fvR A=user:root:full_set:file_inherit/dir_inherit:allow folder/name/path/
    it deleted the ACL for everyone@.
     
    Last edited: May 31, 2019
  8. _Gea

    _Gea 2[H]4U

    Messages:
    3,895
    Joined:
    Dec 5, 2010
    A= sets an ACL and deletes all former settings.
    If you want to add an ACL additionally, use A+
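
    So for your case that would be something like (just a sketch, the path is a placeholder):
    chmod -fvR A=everyone@:modify_set:file_inherit/dir_inherit:allow /pool/folder && chmod -fvR A+user:root:full_set:file_inherit/dir_inherit:allow /pool/folder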
     
    IdiotInCharge likes this.
  9. CopyRunStart

    CopyRunStart Limp Gawd

    Messages:
    153
    Joined:
    Apr 3, 2014
  10. sjalloq

    sjalloq n00b

    Messages:
    54
    Joined:
    Jun 20, 2011
    Hi there,

    looking for some system advice. I'm looking at setting up some shared storage for a mini cluster and wondered if I can do that with Napp-IT. We're going to have 3-4 compute nodes running a mixture of EDA tools and numerical simulators and I'm trying to understand how to implement shared storage across these. Is NFS going to be fast enough or would we have to look at some other technology? Or perhaps we just run NAIO type servers and have local work areas on each machine.

    Anyone got any experience with this type of setup?

    Thanks.
     
  11. _Gea

    _Gea 2[H]4U

    Messages:
    3,895
    Joined:
    Dec 5, 2010
    It depends on what you expect from "fast enough".

    NFS (or SMB) is the usual way to share storage between clients. The fastest option is a dedicated storage server, where you need to care about a fast enough network, latency, performance improvements from a RAM cache, and the question of whether you need secure write behaviour (can you accept the loss of writes that are still in the RAM cache on a crash?).

    If you virtualise storage and nodes, you must check whether the server can satisfy the combined needs of the storage and the compute nodes. The method of shared access is mostly also NFS/SMB. You may use local ZFS storage, ex via LX zones on OmniOS, but the usual way for shared storage access, especially with full virtualisation, is also NFS/SMB.

    For very high performance needs, a cluster filesystem or dedicated storage via FC/iSCSI is an option, optionally with a local fast cache device that syncs or moves data asynchronously to a common storage. But as you hint at a napp-in-one setup, I suppose a dedicated or virtualised NFS filer is ok, with virtualised or dedicated nodes connected via a fast network (up to 10Gb/s is achievable with a quite common setup).
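
    As an aside, the sync write behaviour can be set per ZFS filesystem (a sketch, filesystem names are placeholders):
    zfs set sync=always pool/share       # secure: every write is committed to the ZIL/Slog before it is acknowledged
    zfs set sync=disabled pool/scratch   # fast but unsafe: a few seconds of writes can be lost on a crash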
     
  12. sjalloq

    sjalloq n00b

    Messages:
    54
    Joined:
    Jun 20, 2011
    Thanks Gea,

    can anyone point me at some reading on how to benchmark storage and/or applications? I guess a good starting point would be to benchmark our current machine and workflows. We're running on CentOS if that makes a difference.
     
  13. _Gea

    _Gea 2[H]4U

    Messages:
    3,895
    Joined:
    Dec 5, 2010
  14. TCM2

    TCM2 Gawd

    Messages:
    572
    Joined:
    Oct 17, 2013
    I see that nothing much has changed in the amateurish way that this software is developed.

    zfs list explicitly has -p and -H to output parsable data, but no, let's go and parse _localized_(!) output for humans instead.
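
    A minimal sketch of what using that would look like (the column selection is just an example):
    Code:
    # -p prints exact byte values (no locale formatting), -H drops the header and tab-separates columns
    for (`zfs list -pH -o name,used,avail`) {
        chomp;
        my ($name, $used, $avail) = split /\t/;
        printf "%s\t%.1f%%\n", $name, 100 * $avail / ($used + $avail);
    }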
     
    Last edited: Jun 17, 2019
  15. _Gea

    _Gea 2[H]4U

    Messages:
    3,895
    Joined:
    Dec 5, 2010
  16. IdiotInCharge

    IdiotInCharge [H]ardForum Junkie

    Messages:
    13,056
    Joined:
    Jun 13, 2003
    _Gea, perhaps you can answer this: are Aquantia NICs supported in Illumos yet?

    I'd like to try it on metal sometime, and that's my current roadblock.
     
  17. _Gea

    _Gea 2[H]4U

    Messages:
    3,895
    Joined:
    Dec 5, 2010
    IdiotInCharge likes this.
  18. _Gea

    _Gea 2[H]4U

    Messages:
    3,895
    Joined:
    Dec 5, 2010
    Current OpenIndiana 2019.05 supports native ZFS encryption after a
    pkg upgrade

    Then update your pool to support encryption
    zpool upgrade pool

    Then create a file with the key (ex 31 x 1; echo appends a newline, which gives the 32 bytes that keyformat=raw expects)
    echo 1111111111111111111111111111111 > /key.txt

    Then create an encrypted filesystem ex enc on your "pool" based on that key
    zfs create -o encryption=on -o keyformat=raw -o keylocation=file:///key.txt pool/enc
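
    To lock and unlock it later (a minimal sketch, assuming the same key file):
    zfs unmount pool/enc; zfs unload-key pool/enc
    zfs load-key pool/enc; zfs mount pool/enc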

    Limitations:
    Do not encrypt rpool (bootloader does not support this at the moment)
    Key management options are still limited

    Documentation on Open-ZFS encryption is still quite limited
    (Oracle Solaris is better documented, but their implementation is a different, more feature-rich one).
    What I found is https://blog.heckel.xyz/2017/01/08/zfs-encryption-openzfs-zfs-on-linux/


    update
    current napp-it 19.dev supports ZFS encryption on Illumos based on a password prompt.
     
    Last edited: Jun 27, 2019
    IdiotInCharge likes this.
  19. shanester

    shanester [H]Lite

    Messages:
    68
    Joined:
    Mar 1, 2011
    Question regarding the addition of a vdev to my config:
    I currently have a Norco 4220 / SM x9scm-f and two m1015 HBAs. I have two raidz2 vdevs each with six 2TB drives and have two 2TB spares.
    I was thinking of adding a new raidz2 vdev with six 3TB (512B) drives, replacing the current 2TB spares with a single 3TB spare, and adding one more m1015 HBA to support the additional drives.

    Is this addition technically sound?
    2019-07-02_10-11-12.png
     
  20. _Gea

    _Gea 2[H]4U

    Messages:
    3,895
    Joined:
    Dec 5, 2010
    You can of course remove the hotspares and create a new vdev but

    All modern disks are 4k disks. If you force 512B/ashift=9 you will see a performance degradation
    and you are not able to replace a faulted disk with a 4k one.

    If you want to create such a pool, I would create a new pool from a vdev of the 3TB disks (ashift=12).
    Then copy the data over and add the 2TB vdevs, where you also force ashift=12 (ashift is a vdev property).
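
    Afterwards you can verify the ashift that is actually in use per vdev (a quick check, pool name is a placeholder):
    zdb -C pool | grep ashift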

    Your pool will have 28TB usable from 18 disks and > 100W power need.
    If you intend to buy the disks new, I would probably avoid that many small disks and create
    a whole new pool from larger disks (6TB, or probably 8TB) and a single raid-Z2.
     
  21. shanester

    shanester [H]Lite

    Messages:
    68
    Joined:
    Mar 1, 2011
    gea, thanks for your input. I 'wanted' to build an entirely new storage device, but funds are tight right now. My thought was to purchase 3TB drives (HUA723030ALA640) that are 512B to avoid the performance degradation. This would provide me with approximately an additional 12TB until I am able to build a new system.
     
  22. _Gea

    _Gea 2[H]4U

    Messages:
    3,895
    Joined:
    Dec 5, 2010
    A performance degradation occurs with 4k disks (512e are also 4k) and ashift=9.
    There is no problem with real 512B disks and ashift=12, so try to create a new pool with all vdevs at ashift=12.

    Nowadays I would not buy real 512B disks but 512e (or 4k) ones, and I would not buy small 3TB disks.
     
  23. mikeo

    mikeo Gawd

    Messages:
    623
    Joined:
    May 17, 2006
  24. shanester

    shanester [H]Lite

    Messages:
    68
    Joined:
    Mar 1, 2011
    Due to the $$ at this time, and needing additional storage capacity with limited chassis space, I pulled the trigger on (7) 3TB 512B drives, maxing out the slots in my 4220.

    At this point what would be the benefit of creating a new pool with ashift=12, other than being able to replace the drives (one by one) at a later date?
    Wouldn't it be better to just have 3 vdevs in the same pool and build a new unit at a later time?
     
  25. IdiotInCharge

    IdiotInCharge [H]ardForum Junkie

    Messages:
    13,056
    Joined:
    Jun 13, 2003
    I haven't, but I do want to say that upon considering it, the question is hard to answer. With spinners, the best advice was to do pools of mirrors, especially as drive capacities increase while drive speeds largely don't.

    A 4TB SSD flips the script a bit. Relative to spinners that's still small, rebuild times are going to approach drive size divided by drive speed due to lower access times, and a second failure during rebuild becomes a much smaller worry, in theory.

    The problem with the theory is that SSDs are far less predictable than spinners. The Micron enterprise drive above is a notable exception, as you should be able to take their numbers to the bank; so applying the theory, RAIDZ1 might even be overkill in the sense that the drives simply may not fail before they are operationally obsolete, which means that RAIDZ1 might be the right kind of overkill.
     
    mikeo likes this.
  26. _Gea

    _Gea 2[H]4U

    Messages:
    3,895
    Joined:
    Dec 5, 2010
    OmniOS bloody 151031 now supports native ZFS encryption

    napp-it 19.dev from today (jul 04) supports encryption
    in menu ZFS filesystem (create, lock, unlock)
     
    IdiotInCharge likes this.
  27. shanester

    shanester [H]Lite

    Messages:
    68
    Joined:
    Mar 1, 2011
    I did a fresh installation of OmniOS r151030 after a successful upgrade to ESXi 6.7u2.
    After installing napp-it, I am unable to import my existing ZFS pool. What am I doing wrong?

    When I shut down the new VM and go back to my old VM running r151018, I can import the existing zfs pool.

    upload_2019-7-8_13-36-46.png
     
    Last edited: Jul 8, 2019
  28. _Gea

    _Gea 2[H]4U

    Messages:
    3,895
    Joined:
    Dec 5, 2010
    Are the disks detected?
    Have you enabled pass-through in 018 and not in 030?
     
    IdiotInCharge likes this.
  29. shanester

    shanester [H]Lite

    Messages:
    68
    Joined:
    Mar 1, 2011
    OMG I feel like a total noob...it has been so long, that is exactly what I forgot to do!!
     
  30. TigerLord

    TigerLord [H]ard|Gawd

    Messages:
    1,085
    Joined:
    Mar 11, 2007
    I'm running Napp-it 18.12u on openindiana 5.11 Hipster

    I'm running an SMB share that I can connect to fine from a Windows environment.

    I am trying to mount the same SMB share on my Android TV box, specifically an Nvidia Shield.

    It's not working. The Shield reports no error, it just brings me back to the network menu and nothing is mounted or added.

    It could be a Shield problem, and I'm investigating that, but I was wondering if there is any known Samba issue between napp-it and Android that would prevent me from successfully mounting a share on my Shield?

    Thanks in advance.
     
  31. _Gea

    _Gea 2[H]4U

    Messages:
    3,895
    Joined:
    Dec 5, 2010
    Are you on the newest OI?
    If not, do a pkg upgrade (you can always go back to the last BE).
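
    Going back is just (a sketch; the BE name is whatever beadm list shows):
    beadm list
    beadm activate <previous-BE>
    reboot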

    I know that there are some Android clients that do not work and some that do, but I do not use any.

    Alternative if possible: NFS3
     
  32. TigerLord

    TigerLord [H]ard|Gawd

    Messages:
    1,085
    Joined:
    Mar 11, 2007
    After further investigation it seems nVidia is aware of the issues surrounding Samba. They've been releasing fixes for the past year, but they still need to release more. As of April 2019 they were still working on an SMBv3 implementation, as the Shield is stuck on SMBv1 for now.

    NFS is not natively supported by the Shield's OS. Kodi can mount NFS shares fine, but it defeats the purpose of what I'm trying to accomplish, which is the Shield running Plex server.

    Funnily, the Shield runs on a Linux kernel but doesn't support a Linux protocol like NFS, only Windows'. Lol, makes total sense.
     
    IdiotInCharge likes this.
  33. IdiotInCharge

    IdiotInCharge [H]ardForum Junkie

    Messages:
    13,056
    Joined:
    Jun 13, 2003
    For the use case, it makes sense...

    But it's still funny, and I'm still frustrated because even Microsoft has abandoned SMBv1. I've yet to get my Shield hooked up to my CentOS 7 server (running ZFS), but of course, Plex works great. I just use the CentOS box as the Plex server.

    Do they not have a spin that would work on OpenSolaris, or could you not, say, toss up an Ubuntu etc. VM for that purpose?
     
  34. shanester

    shanester [H]Lite

    Messages:
    68
    Joined:
    Mar 1, 2011
    I would like some direction on the next steps to fix my pool. Prior to the screenshot below, I did not have any pool errors. I had a disk (A785, see below) that was showing hard errors, so I initiated a replace with the spare. The swap did not complete and the pool went into a degraded status. A zpool clear cleared the errors, and I tried the replacement again with the same result; the pool remains in a degraded status even after a scrub. I don't know what to do next.
    What I want to accomplish is:
    1. Fix the health of the pool.
    2. Replace the disk with the spare.

    I also noticed that the zpool version is showing a value of " - "
    upload_2019-7-17_10-0-32.png

    upload_2019-7-17_9-58-18.png
     
  35. _Gea

    _Gea 2[H]4U

    Messages:
    3,895
    Joined:
    Dec 5, 2010
    So basically you want to remove the hot spare from the pool and use it to replace the bad disk.

    Use Pool > Clear to clear the "too many errors"

    To remove the hot spare from the pool for good, use menu Disk > Remove.
    Then replace: use menu Disk > Replace.

    Are the problem disks on the same power, backplane or HBA?
    Maybe there is a reason

    After you have replaced the disk with the hard errors, do a low-level/intensive test on it, ex with WD Data Lifeguard.

    pool version ="-" is ok.
    This is a synonym to Open-ZFS pool v5000 with feature flags where features determine differences , no longer a pool version. Internal v5000 is mainly to be sure that Oracle will never come close to this with their genuine ZFS.
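
    You can check this yourself (pool name is a placeholder):
    zpool get version tank               # shows "-" on a feature flag pool
    zpool get all tank | grep feature@   # lists the individual feature flags and their state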
     
  36. shanester

    shanester [H]Lite

    Messages:
    68
    Joined:
    Mar 1, 2011
    "bad" disk = c0t5000CCA36AC2A785d0
    spare = c0t5000CCA36AD05EA2d0


    I want to remove the bad disk and replace it with the spare. As I mentioned earlier, I ran zpool replace c0t5000CCA36AC2A785d0 c0t5000CCA36AD05EA2d0 which 'failed'.
    So if I understand what you're saying, I should remove the spare (c0t5000CCA36AD05EA2d0) from the pool first and then replace c0t5000CCA36AC2A785d0 with c0t5000CCA36AD05EA2d0?

    I have 3 m1015 HBAs for the backplane and there does not appear to be a correlation between the failing vdev and a specific port on the backplane or HBA. However, I will power this box down and check the connections.
     
  37. _Gea

    _Gea 2[H]4U

    Messages:
    3,895
    Joined:
    Dec 5, 2010
    What was the problem with a replace faulted <-> spare?

    Usually a spare can replace a faulted disk but remains a spare.
    The idea behind this is that you then replace the faulted disk (replace spare > new disk) and the spare remains a spare.

    If you want to remove the spare property, first remove the spare, then replace.
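
    On the command line that would be something like (pool name is a placeholder):
    zpool remove tank c0t5000CCA36AD05EA2d0                           # remove the disk from its spare role
    zpool replace tank c0t5000CCA36AC2A785d0 c0t5000CCA36AD05EA2d0    # then replace the bad disk with it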
     
  38. shanester

    shanester [H]Lite

    Messages:
    68
    Joined:
    Mar 1, 2011
    The drive wasn't faulted, it just had hard errors (S:0 H:136 T:0), so I wanted to replace it with the spare. The swap never completed and the vdev went into a degraded state ("too many errors"). I will attempt to clear the errors and try again.
     
  39. shanester

    shanester [H]Lite

    Messages:
    68
    Joined:
    Mar 1, 2011
    The zpool clear completed but did not clear the errors. I am stuck and open to further suggestions.
    upload_2019-7-18_9-8-58.png
     
  40. _Gea

    _Gea 2[H]4U

    Messages:
    3,895
    Joined:
    Dec 5, 2010
    If the problem had been caused by a single incident in the past and the hardware were now ok, a scrub and a pool clear would be enough.

    If you cannot identify the problem with the help of the system and fault logs, and if there is no common part like the same HBA or power cabling for the 6 disks showing problems, I would look at the disk with the hard errors.

    Maybe this disk affects the others negatively, ex by blocking something. I would offline or even physically remove this disk and retry a disk replace, or a scrub and clear.
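
    Something like (pool name is a placeholder):
    zpool offline tank c0t5000CCA36AC2A785d0     # take the suspect disk out of service
    zpool scrub tank
    zpool clear tank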