OpenSolaris-derived ZFS NAS/SAN (OmniOS, OpenIndiana, Solaris and napp-it)

I currently have an NFS server running ZoL 0.7.11. It works okay, but I'd prefer an Illumos-based distro. My ESXi host is connected to the storage server via a point-to-point QSFP cable and two Mellanox ConnectX-3 EN cards. That also works okay, but I'd like to dabble with RDMA. The most visible products are Mellanox ConnectX-4 LX cards, but Mellanox seems to have given up any interest in Solaris-based products long ago. Chelsio makes a couple of interesting cards (T5 and T6 ASICs), but Chelsio seems to be riding the iWARP bandwagon, which has broken down by the side of the road (even Intel seems to have ditched it). So now Chelsio is pushing things like iSCSI offload, although I guess you might be able to do RDMA NFS from ESXi to a Solarish server. But VMware has stated that from 6.7 forward they will only support RoCE (implying no support for iWARP). If there are drivers for both ESXi and the Solarish side, I guess I'd be fine with that, but it's hard to tell from reading the Illumos HCL what is in fact supported. The Chelsio driver claims to support everything up to the T6 ASIC, but that doesn't mean it supports all the fancy offload stuff. Sorry for the TL;DR :) If anyone has any concrete recommendations, I'd much appreciate them. If I can't find anything useful, I might need to stick with Linux and go with the recent Mellanox cards.
 
Thanks, but I've moved on. I RMA'ed the Chelsio cards and ordered two ConnectX-5 EN cards, 50GbE single port. Unfortunately, Solarish variants aren't supported - that's life I guess...
 
Isn't basic vdev removal available on OmniOS 151025+? I have 151026 installed and accidentally added two disks as basic (single-disk) vdevs to an existing RAID-Z2 pool. I was hoping to remove them without having to do the usual copy/destroy/recreate.

I ran zpool remove [pool] [disk] but still get the following message:

"invalid config; all top-level vdevs must have the same sector size and not be raidz"
 
Vdev removal is in Illumos/OpenZFS, but currently without support for pools that contain a RAID-Z vdev. Vdev removal for all kinds of vdevs is currently only available in Oracle Solaris 11.4.
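For reference, a removal looks like the following on a pool without RAID-Z top-level vdevs (a sketch; "tank" and the device name are placeholders for your own pool and disk):

```shell
# Remove an accidentally added top-level vdev. This only succeeds on
# current Illumos/OpenZFS if no top-level vdev in the pool is raidz:
zpool remove tank c3t2d0

# Watch the progress: data from the removed vdev is copied onto the
# remaining vdevs, shown in the "remove:" section of the status output.
zpool status tank
```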
 
napp-it Z-Raid vCluster v.2

I am about to finish my Z-Raid Cluster-in-a-Box solution.
In the past, a Cluster in a Box consisted of two servers with a common pool of MPIO SAS disks. One of the servers builds a pool from the disks and offers services like NFS or SMB. On a failure, or for maintenance, you can switch over to the second server within a few seconds. Management is done, for example, with RSF-1 from high-availability.com. SuperMicro offers special cases that can hold two mainboards.

Such solutions are expensive and management is quite complex, but they offer high performance and high availability. To reduce costs, my solution is based on ESXi (any license) to virtualize the two nodes and a control instance. It uses the shared-controller/shared-raw-disk options of ESXi, so it does not need multipath SAS and can work with any disks.

For the setup, see http://www.napp-it.org/doc/downloads/z-raid.pdf

If you want to try the preview state that allows a manual failover within 20s, you can update napp-it to the free 18.02 preview (or 18.12dev).


cluster-in-a-box.png
 
vCluster Beta2 is available (napp-it 18.12g dev)

Current state:
Manual failover between nodes is working for NFS and SMB.
For SMB, it supports failover for local users and for AD users connected to the share during the failover.

Todo:
auto-failover (currently in testing)

Expect vCluster in the next Pro release (Jan 2019)
 
I'm trying to configure TLS mail and I'm getting the following error when trying to install IO::Socket::SSL in the CPAN shell.

Code:
cpan[1]> notest install IO::Socket::SSL
Reading '/root/.cpan/Metadata'
  Database was generated on Mon, 19 Nov 2018 03:29:03 GMT
Running install for module 'IO::Socket::SSL'
Checksum for /root/.cpan/sources/authors/id/S/SU/SULLR/IO-Socket-SSL-2.060.tar.gz ok
Scanning cache /root/.cpan/build for sizes
............................................................................DONE
'YAML' not installed, will not store persistent state
Configuring S/SU/SULLR/IO-Socket-SSL-2.060.tar.gz with Makefile.PL
ld.so.1: perl: fatal: relocation error: file /usr/perl5/site_perl/5.28/i86pc-solaris-thread-multi-64int/auto/Net/SSLeay/SSLeay.so: symbol CRYPTO_get_locking_callback: referenced symbol not found
Warning: No success on command[/usr/perl5/5.28/bin/i386/perl Makefile.PL]
  SULLR/IO-Socket-SSL-2.060.tar.gz
  /usr/perl5/5.28/bin/i386/perl Makefile.PL -- NOT OK
Failed during this command:
 SULLR/IO-Socket-SSL-2.060.tar.gz             : writemakefile NO '/usr/perl5/5.28/bin/i386/perl Makefile.PL' returned status 9

I've recently upgraded from omniosce-r151026 to omniosce-r151028.
 
I was successful with Tls (https://napp-it.org/downloads/tls_en.html):

To enable TLS emails on OmniOS 151018 and up, use the following setup

(use PuTTY, log in as root and copy/paste the commands with a mouse right-click; confirm questions with the defaults - thanks to Rick)

Code:
perl -MCPAN -e shell
notest install Net::SSLeay
notest install IO::Socket::SSL
notest install Net::SMTP::TLS
exit;

Just confirm any questions with Enter (this skips the tests)
 
ok, cool, thanks Gea! Unfortunately, Oracle Solaris' lack of VMXNET 3 drivers kills it for me :/
 
I was successful with Tls (https://napp-it.org/downloads/tls_en.html): [...]

That's the document I followed when I was trying to get it going. I'm having issues on this particular server when running the following command.

notest install IO::Socket::SSL
 
On a clean install of OmniOS 151028 it installs when you just confirm the defaults.
But I have seen several problems after updates (depending on the first version in the update chain), as 151028 finalized the move from Sun SSH to OpenSSH.

What you can try:
- check the SSH settings, see https://omniosce.org/info/sunssh
- uninstall and reinstall SSH

Maybe the fastest way is to install 151028 from scratch and import the datapool.
You must save/restore /var/web-gui/_log and recreate users with the same uid to keep napp-it and its basic settings.
(napp-it Pro: run a backup job that saves this to the datapool, then restore via Users > Restore)
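The manual save/restore step can be sketched like this (the pool name "tank" and the backup path are placeholders; the uid/gid values are examples, use the ones from your old install):

```shell
# On the old install: save the napp-it settings to the datapool
cp -r /var/web-gui/_log /tank/backup_napp-it_log

# After the clean 151028 install: import the pool and put the settings back
zpool import tank
cp -r /tank/backup_napp-it_log/* /var/web-gui/_log/

# Recreate users with the same uid/gid as before, so file ownership
# on the pool still matches, e.g.:
useradd -u 101 -g 10 someuser
passwd someuser
```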
 
Hi,

FYI to those upgrading from R151026 to R151028, I've encountered the following errors:

a tty and PAM access error, solved by upgrading napp-it from 18.06 pro to 18.12 dev

SSH was down due to "/etc/ssh/sshd_config: line 103: Bad configuration option: MaxAuthTriesLog".
Solved by replacing /etc/ssh/sshd_config with /etc/ssh/sshd_config.new
 
OmniOS 151028 comes with a newer Perl and with OpenSSH instead of the former Sun SSH.
Prior to updating OmniOS you must update to the newest napp-it free/pro/dev release that supports the new OmniOS.

For SSH, you will find an option to switch the defaults to OpenSSH in menu Services.

A remaining problem is TLS alert mail, which stops working after an update.
This is unsolved; the workaround is a fresh setup of 151028 instead of an update.
 
I was already on 18.06 pro, which was the latest pro. I still encountered all the errors until I moved to the latest dev.

The TLS issue is no big deal.
I am using the restricted Gmail SMTP server to send the messages; that seems to be working well.

By the way, I've been meaning to ask: what is the napp-it Pro agent acceleration?
 
Without acceleration, napp-it requests OS information like snaps or disks whenever you call the corresponding menu, e.g. Disks. With a few disks there is only a short delay. With 20 disks you may wait 10s until all disks are detected with their state. On a Petabyte system with 60 or 90 disks this may take a minute - on every action that manipulates an item under Disks or Pools.

Acceleration allows napp-it to work in an async way. Disk values are continuously requested in the background by a software agent (a service or daemon), allowing a disk or pool menu to be processed immediately.
 
I have two arrays that I have had for quite some time, one of which has been on both FreeNAS and OmniOS. I am moving away from a ZFS All-In-One to bare metal, but cannot import either of my pools due to the attached error. I should be able to get OmniOS back to whatever software level supports this feature, but have not been able to figure out how to update to that level as of yet. Obviously I'm missing something simple, but can't figure out exactly what.

EDIT: Figured it out - I had to use the napp-it OVF and deploy that way, and then my pools were recognized.
 

Attachments

  • ZFSError.jpg
If a ZFS feature is enabled, you can only import that pool on an operating system that supports this feature (it does not matter whether it is the same OS or cross-platform) or in read-only mode. If you need compatibility with older OS releases, you should avoid newer features. When you create a pool, e.g. in napp-it, you can disable all features for compatibility reasons.

In your case you should try OmniOS 151028 (the newest OmniOS).
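At the command line, the compatibility options look roughly like this (a sketch; "tank" and the device names are placeholders):

```shell
# Create a pool with all feature flags disabled (-d), so it stays
# importable on older ZFS implementations:
zpool create -d tank mirror c3t0d0 c3t1d0

# If a pool has features the running OS does not support, it can
# often still be imported read-only:
zpool import -o readonly=on tank
```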
 
Hello _Gea,

I'm about to update an OmniOS/Napp-IT server and I'm curious as to your current thoughts on bonding 10GbE ports for throughput and resiliency.

Perhaps unwisely, I have multiple ixgbe ports, but only one of them really gets used.

I currently have 6 (XenServer) virtualization hosts, soon to be 8, using NFS for VHD storage.

Would it be a good idea to bond a spare 10GbE (Intel X540-T2) port, or leave it fallow?

On an older OmniOS/Napp-IT server that was used for backup, I successfully bonded 4x (Intel) 1GbE ports.

In the past, you advised against bonding, so I didn't. Has your advice changed over the last few years?

Thanks in advance,

G
 
Performance may improve but you add complexity that may affect stability....


btw
Thanks and a Happy New Year!
 
Trust me to do things the wrong way around: I updated OmniOS CE from r151026 to r151028 and, yes, you guessed it, napp-it is giving errors and I cannot log on. I have seen that I either need to delete a line somewhere and create a new password, reinstall OmniOS r151028 with napp-it, or go back to the previous OmniOS BE and update napp-it.

Does anyone have a step-by-step for the easiest route that avoids creating a new password?
 
Sudo/permission problems in napp-it:
If you first update napp-it and then OmniOS, napp-it takes care of OmniOS 151028.

If you missed updating napp-it first:
- go back to the former BE, update napp-it and then OmniOS, or
- delete the napp-it line in /etc/user_attr and set a napp-it password via passwd napp-it

https://napp-it.org/downloads/omnios-update.html
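The two recovery paths can be sketched as follows (the BE name is a placeholder; check the output of beadm list for the real one on your system):

```shell
# Option 1: boot back into the pre-update boot environment,
# update napp-it there, and only then update OmniOS.
beadm list                      # show the available boot environments
beadm activate omnios-r151026   # activate the former BE
reboot

# Option 2: stay on 151028 and fix the napp-it login manually.
# Edit /etc/user_attr and remove the napp-it line, then set a password:
passwd napp-it
```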
 
Thank you _Gea. I have been looking around and there is not much on how to go back to a former BE. I would guess this involves booting the server and hitting the number corresponding to that option in the OmniOS boot menu. I don't want to just go ahead and try it, because I have twice lost data when updating to a new OS or during other updates (yes, I am a ZFS/OmniOS newb, although I have been using ZFS since Nexenta had a supported and updated free version).

Thanks for confirmation.
 
Hi!

After a disk crash, a replacement and then a bit of fiddling, I see a weird cache problem with my disks. The capacity and type of one of my disks show the values of a disk that was removed from the neighboring slot.
Is there any way to fix this without a reboot? See below; the SMART disk table shows the correct values. The disk is a 3TB WD Red.

Disk view:
upload_2019-1-14_18-56-21.png


Smart view
upload_2019-1-14_18-58-18.png


Pool:
upload_2019-1-14_18-59-35.png


Regards
Wish
 
If you use disk detection based on controller number + controller port (c3t0), you do not have hotplug at all, or at least not per default (you can enable disk hotplug on OmniOS, but it does not work on every board). This means you then need a reboot to re-read all disks correctly.

Another possible problem is disk caching in napp-it.
You can clear the cache in menu Disks > Delete Disk buffer.

A third possibility is a minor bug in the last napp-it release where you must reload menu Disks after a modification to show the new values; see https://napp-it.org/downloads/changelog_en.html
 
Hi _Gea, I have deleted the napp-it line in /etc/user_attr. How do I set a napp-it password via passwd napp-it?

Thanks
 
How do I fix the error below in napp-it, or is this on the OmniOS side (Sun SSH vs OpenSSH)?

Tty.c: loadable library and perl binaries are mismatched (got handshake key 9e40080, needed 9a80080), after updating from OmniOS r151026 to r151028.
 
There are a lot of new things in 151028 that napp-it must take care of (like this error, which is related to the newer Perl release).
So you must first update napp-it to the newest release (18.12), then OmniOS.

If napp-it is running, update now (About > Update);
otherwise boot the last BE, update napp-it, then OmniOS.

btw
If you need TLS alert mail, you must do a clean install of 151028.
TLS is not working after an update and is not installable due to incompatibilities between the former and new SSH.

To keep settings:
After setup, restore /var/web-gui/_log/* for the napp-it settings, recreate users with the same uid/gid, and import the pool.
http://www.napp-it.org/doc/downloads/setup_napp-it_os.pdf
 
Hi!

I've got a problem with jobs on a fresh install... email jobs are not run; I've created several, but no emails are being sent.
The test email works okay. As you can see, auto is enabled, but the jobs are still 'new' after a whole day.
What am I missing?

upload_2019-2-10_18-41-11.png


Regards,
Wish
 
Are you using standard or TLS email?

If you use Jobs > Report, you can set the email method per job (standard port 25 or TLS).
For the basic alert and status emails, you must set the email method globally (menu Jobs > TLS email > enable/disable TLS), where the current default is TLS (e.g. Gmail).
 