OpenSolaris derived ZFS NAS/SAN (OmniOS, OpenIndiana, Solaris and napp-it)

shanester

Weaksauce
Joined
Mar 1, 2011
Messages
70
If the problem was caused by a single incident in the past and the hardware is now OK, a scrub and a pool clear would be enough.

If you cannot identify the problem with the help of the system and fault logs, and there is no common part (such as the same HBA or power cabling) for the 6 disks showing problems, I would look at the disk with the hard errors.

Maybe this disk affects the others negatively, e.g. by blocking something. I would offline or even physically remove this disk and retry a disk replace, or a scrub and clear.
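A minimal shell sketch of that sequence (offline the suspect disk, clear, scrub). The pool name `tank` and disk `c2t5d0` are hypothetical placeholders; take the real names from `zpool status`. The `run` helper is an assumption of this sketch: it only prints each command unless `DRY_RUN=0` is set, so the steps can be reviewed before they touch the pool.

```sh
#!/bin/sh
# Print each command; execute it only when DRY_RUN=0 (default: print only).
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi
}

# Take the suspect disk out of service, reset the error counters and
# re-verify every block in the pool.
isolate_and_scrub() {
  pool=$1   # hypothetical, e.g. tank
  disk=$2   # hypothetical, e.g. c2t5d0
  run zpool offline "$pool" "$disk"
  run zpool clear "$pool"
  run zpool scrub "$pool"
  run zpool status -v "$pool"   # watch scrub progress and remaining errors
}
```

For example, after sourcing the file: `DRY_RUN=0 isolate_and_scrub tank c2t5d0`.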

That was my thought. I pulled the drive; let's see what happens.
 

shanester

Weaksauce
Joined
Mar 1, 2011
Messages
70
I powered off the box and validated that all of the connections were seated properly. Upon powering on, it started to resilver again.
I have 3 M1015 flashed to P20 on a X9SCM-F MB in a Norco 4220. M1015 #1 is in PCI slot 7 (x8) connected to BP 1/2, M1015 #2 is in PCI slot 6 (x8) connected to BP 3/4 and M1015 #3 is in PCI slot 5 (x4) connected to BP 5.
The drives that are showing 'degraded' are connected to M1015 #1/BP1/2.
If there was an issue with the HBA or backplane(s), I would assume that all of the drives connected to those devices would be affected, which is not the case.
I have looked through the logs and can't determine what exactly is failing.
I can run another zpool clear, memtest, downgrade firmware on the M1015, replace the Seasonic X-Series 650 PS (although not sure there is a power issue), replace degraded disk by disk, but I feel I am just chasing my tail at this time.

 

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
3,958
I would power off and connect the "bad" disks to another HBA (optionally export/import/clear the errors).
If the problem stays with the original HBA, the HBA is the cause; otherwise the backplane is the more likely suspect.
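Sketched from the shell, assuming a pool named `tank` (hypothetical) and the same print-unless-`DRY_RUN=0` helper as an illustration device. `iostat -En` lists soft, hard and transport errors per device, which makes it easier to see whether the errors follow the disks to the other HBA.

```sh
#!/bin/sh
# Print each command; execute it only when DRY_RUN=0 (default: print only).
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi
}

# Before powering off: release the pool cleanly.
before_recabling() {
  run zpool export "$1"
}

# After moving the "bad" disks to the other HBA: re-import, reset the
# error counters, then check whether errors reappear on the same disks.
after_recabling() {
  run zpool import "$1"
  run zpool clear "$1"
  run zpool status -v "$1"
  run iostat -En   # per-device soft/hard/transport error totals
}
```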
 

shanester

Weaksauce
Joined
Mar 1, 2011
Messages
70

As always, thank you for your input. I have determined that I had a bad memory DIMM. After removing the DIMM and running a couple of clears/scrubs (only a few corrupt files), I am happy to say that the devices are all online again.
 

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
3,958
RAM problems can always affect stability. It is a little curious that only one vdev was affected, but if the problem disappears, this was clearly the reason.
 

sorhol

n00b
Joined
Jul 29, 2014
Messages
16
I am trying to update from OmniOS 151020 to 151028.

My update to 151022 went OK.
When I tried to update to 151024, I got an invalid certificate error. So I ran:
wget -P /etc/ssl/pkg https://downloads.omniosce.org/ssl/omniosce-ca.cert.pem
pkg update web/ca-bundle

Now I can't update to 151024; I get the message:


Reject: pkg://omnios/entire@11-0.151024
Reason: No version matching 'incorporate' dependency incorporation/jeos/omnios-userland@11-0.151024 can be installed
----------------------------------------
Reject: pkg://omnios/incorporation/jeos/omnios-userland@11-0.151024
Reason: No version matching 'incorporate' dependency library/libxml2@2.9-0.151024 can be installed

If anyone can point me in the right direction, it will be appreciated.

I was also thinking it might be easier to just install 151028 from scratch (ESXi) and import the data pools. But I became unsure when I read about backup/restore of users. The /var/web-gui/_log part I did understand.

Thanks in advance
 

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
3,958
There are some problems with such an update, like the certificate problem, but also the switch from SunSSH to OpenSSH. If you have installed something that is no longer available, you must remove it before the update.
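To find out what blocks an update before changing anything, a dry run helps. This is a sketch only: the package name passed to `update_after_cleanup` is a hypothetical placeholder for whatever the dry run reports as no longer available, and the boot environment name is up to you.

```sh
#!/bin/sh
# Print each command; execute it only when DRY_RUN=0 (default: print only).
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi
}

update_after_cleanup() {
  blocker=$1   # hypothetical: a package the dry run complains about
  be=$2        # name for the new boot environment, e.g. r151024
  run pkg update -nv                    # dry run (-n), verbose: nothing changes
  run pkg uninstall "$blocker"          # drop what the new release no longer has
  run pkg update -f -r --be-name="$be"  # real update into a new BE
}
```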

For ESXi
I would install the OmniOS 151028 ova, restore /var/web-gui/_log/* and re-create all users/groups with the same uid/gid. Then import the pool.

If you use ESXi 6.7u2, you cannot import the ova due to a problem with the ESXi GUI and the ova deploy menu. The only solutions are an import via ovftool, a downgrade of the ESXi GUI, or waiting for 6.7u3.

With ESXi 6.7u2 I would
- install OmniOS 151030 on a 30GB vdisk with an e1000 and a vmxnet3 vnic (dhcp) from OmniOS iso
- add open-vm-tools: pkg install open-vm-tools
- add napp-it via the wget command
- optionally add tls email, run a default tuning and update napp-it

This is how I create the ova.
Use e1000 for management and the faster vmxnet3 for filer use.
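The shell part of those steps, as a rough sketch run as root inside the new VM. `dladm show-link` is only used here to confirm that both vnics (e1000 and vmxnet3) are visible; the napp-it line is the online installer as published on napp-it.org, so check the current command there before relying on it.

```sh
#!/bin/sh
# Print each command; execute it only when DRY_RUN=0 (default: print only).
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi
}

setup_guest() {
  run dladm show-link                 # both vnics should be listed
  run pkg install open-vm-tools
  # napp-it online installer: downloads a Perl script and runs it
  run sh -c 'wget -O - www.napp-it.org/nappit | perl'
}
```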
 

sorhol

n00b
Joined
Jul 29, 2014
Messages
16


Thank you so much _Gea.
I will go with the 151028 option since I have ESXi 6.7u1.
Just 2 final questions:
I will have to add users manually (I mean, I can't restore the users), right?
This is the correct OVA file, zfs_vsan_omni028_esxi67v5.ova, right?

I see this workflow:
Remove VMs from inventory
Remove Napp-it from inventory and delete files
Import new Napp-it OVA
Passthrough of HBAs
Restore /var/web-gui/_log/*
Import Pools
Add users manually in Napp-it
Add datastore from Napp-it
Add VMs from Napp-it datastore
 

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
3,958
The ova and the workflow are correct.

It is possible to copy over users when you copy/restore
/etc/passwd
/etc/shadow
/etc/group
/etc/user_attr

/var/smb/*

I do this with the ZFS cluster functionality and current napp-it pro with jobs > backup and user > restore
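A minimal sketch of copying those files by hand, assuming plain `tar` and that the archive is kept somewhere off the boot disk (the default path below is hypothetical and must be absolute). Note that `restore_users` overwrites the target's files wholesale, so keep a copy of the new install's originals.

```sh
#!/bin/sh
# Files that hold local users, groups, RBAC attributes and SMB state.
USER_FILES="etc/passwd etc/shadow etc/group etc/user_attr var/smb"

backup_users() {
  root=${1:-/}
  dest=${2:-/root/users-backup.tar}   # hypothetical; must be an absolute path
  # Relative paths keep the archive restorable under any root.
  # USER_FILES is intentionally unquoted so it splits into separate paths.
  (cd "$root" && tar cf "$dest" $USER_FILES)
}

restore_users() {
  src=$1
  root=${2:-/}
  # p: keep the original permissions (shadow must stay non-readable).
  (cd "$root" && tar xpf "$src")
}
```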
 

sorhol

n00b
Joined
Jul 29, 2014
Messages
16

Thank you very much _Gea!
 

sorhol

n00b
Joined
Jul 29, 2014
Messages
16
I have now successfully upgraded to 151028 by importing new OVA.
I am a little unsure whether to run zpool upgrade or not. In case I don't need the new features, are there any other benefits from upgrading?
Once again, thank you so much Gea for your always kind help.
 

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
3,958
If you upgrade, you may not be able to import the pool again on older OS versions, so
if the current feature set is OK, you do not need to upgrade.

A new feature that you probably want is native ZFS encryption.
This is currently in OmniOS 151031. Possibly it will be backported to 151030 lts.
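The read-only checks and the one-way step, sketched with the same print-unless-`DRY_RUN=0` helper; `tank` is a hypothetical pool name.

```sh
#!/bin/sh
# Print each command; execute it only when DRY_RUN=0 (default: print only).
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi
}

# Read-only: what the running OS supports and what the pool has enabled.
show_feature_state() {
  run zpool upgrade -v        # features supported by this OS release
  run zpool upgrade           # pools that do not have all of them enabled
  run zpool get all "$1"      # feature@... rows: disabled/enabled/active
}

# One-way: afterwards, older OmniOS releases may refuse to import the pool.
upgrade_pool() {
  run zpool upgrade "$1"
}
```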
 

sorhol

n00b
Joined
Jul 29, 2014
Messages
16
I have one pool that is causing problems. Typically one disk shows iostat errors and eventually the pool becomes unavailable.
This is a striped pool of 4 consumer-grade SSDs connected to an LSI 2307 HBA. The pool is just for temporary, unimportant data.

When it began about a year ago, I changed the cable and eventually replaced the disks. After I took out the original disks, I ran a SMART check on them in a different computer with no errors.
I am having the same problem with the new disks: 1 disk shows iostat errors and eventually the pool becomes unavailable.
I changed the cable, with no effect.
I unplugged the disk showing the errors and created a new pool using the remaining 3 disks. Now I get the same errors, just on one of the other disks.
I was thinking about replacing the HBA.
I tried to run zpool status -v but just got a "list of errors unavailable (insufficient privileges)" reply.

Does anyone have any ideas for me how to proceed?

Thanks in advance.
 

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
3,958
These errors indicate a hardware problem.

First check whether the LSI 2307 firmware is one of the releases P20.00.00 up to P20.00.04.
These are buggy with SSDs; if so, update to P20.00.07.

I would then check the RAM with memtest86.

Then connect the disks to onboard SATA to rule out the HBA.

Have you called zpool status -v as root?
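The checks above as a shell sketch, to be run as root (as an ordinary user, `zpool status -v` reports the error list as unavailable). It assumes LSI's `sas2flash` utility has been installed separately, as I don't believe it ships with OmniOS; `fmdump -e` shows the fault manager's error log, and `tank` is a hypothetical pool name.

```sh
#!/bin/sh
# Print each command; execute it only when DRY_RUN=0 (default: print only).
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi
}

collect_diagnostics() {
  run sas2flash -listall    # HBA firmware version (P20.00.07 wanted)
  run iostat -En            # soft/hard/transport errors per device
  run fmdump -e             # fault-management error log
  run zpool status -v "$1"  # needs root for the list of damaged files
}
```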
 

sorhol

n00b
Joined
Jul 29, 2014
Messages
16

Hi Gea,

I was already on 20.007
I learned that my memory sticks were not installed in the correct order, so I changed that.
I ran memtest86 for 16+ hours without any errors.
So I recreated the pool and tested and immediately I got the errors again :mad:

Fortunately I had a spare HBA, so I replaced the HBA and recreated the pool.

After testing with heavy copying of files everything is now running without errors :)

Yes, I did run the zpool status -v as root

Thank you so much for helping out.
 

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
3,958
State of SMB3 development on Illumos(NexentaStor, OmniOS, OpenIndiana)
newest feature: SMB3 for ZFS Cluster/ HA failover (Yes, I really want that)

Implement SMB3 persistent handles
(part of making SMB "cluster aware")

Steps to Reproduce:
Connect an SMB3 client (Win2012 or later)
Restart the SMB service, or force a fail-over
Take a network capture (from the client is easiest)

Expected Results:
SMB3 client should reclaim its CA handles after the restart or fail-over.

Actual Results:
SMB3 client has to re-establish its open handles (as seen in the network capture)


 

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
3,958
SMB3 (kernel-based Solarish SMB server) is announced for OmniOS 151032 in November 2019,
see omniosorg/Lobby

If you want to try it now, use OpenIndiana (always newest Illumos) or OmniOS bloody 151031
 

Captainquark

Weaksauce
Joined
Dec 14, 2011
Messages
99
Hi _Gea
I am running OmniOS 151022 with napp-it 19.01c1. I believe OmniOS should support SMB 2.1.
Ever since I upgraded my Windows client to Windows 10 v1903, I cannot play any video file from my NAS with VLC Player anymore. I have an open thread in their forum where everything is described in detail, but so far I have only received hints and could not solve the issue. Maybe you have an idea whether this has to do with an SMB version mismatch? I tried enabling SMB1, but it didn't help either. Any hint would be much appreciated.
Thanks!
 

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
3,958
OmniOS stable supports SMB 2.1. Current Illumos/OI/OmniOS bloody 151031 supports SMB 3.0.

Is this a VLC player problem only and you can access OmniOS via Windows file explorer?

As I do not assume that you have restricted OmniOS to SMB 1,
it is possible that your Windows client demands SMB3. If so, allow SMB 2.1 on the client.

https://support.microsoft.com/en-us...bv1-smbv2-smbv3-in-windows-and-windows-server

Another option:
On your PC, run "gpedit.msc" and under "Computer Configuration > Administrative Templates > Network > Lanman Workstation", enable "Enable insecure guest logons".
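On the OmniOS side, the allowed SMB protocol range is a property of the kernel SMB server that `sharectl` shows and sets. A sketch under the assumption that the property is named `max_protocol` on this release (check `sharectl get smb` first); the restart of the SMB service is a precaution, not a documented requirement.

```sh
#!/bin/sh
# Print each command; execute it only when DRY_RUN=0 (default: print only).
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi
}

# Show the highest SMB dialect the server currently offers.
show_smb_protocol() {
  run sharectl get -p max_protocol smb
}

# Set the ceiling (e.g. 2.1 on 151022) and restart the SMB server.
set_smb_max() {
  run sharectl set -p max_protocol="$1" smb
  run svcadm restart network/smb/server
}
```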
 

Captainquark

Weaksauce
Joined
Dec 14, 2011
Messages
99
Thanks for your quick reply.
Yes it's only VLC Player. I can do anything on these shares using Windows Explorer. I can play the video files using Windows Media Player. Only VLC is not playing them.
I have restricted Windows according to the article (powershell and "activate features" method), but it did not help. I also tried allowing guest sessions not secure, to no avail.
When I run Get-SmbConnection, it says the dialect used is SMB 2.1. Does OmniOS 151022 support this? I am not on 151028 yet; should I upgrade?
Thanks!
 

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
3,958
OmniOS supports SMB 2.1 since v151018

If you want to upgrade
- create a BE to be able to go back
- update napp-it to newest free or Pro
- update to 151030 (current stable/LTS)

optionally afterwards
- create a BE
- update to 151031 bloody (SMB3)

You can update 151030 -> bloody -> next stable 151032 (November 2019).
http://www.napp-it.org/doc/downloads/setup_napp-it_os.pdf
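The command-line side of that path, sketched for one release step. The publisher URL pattern is the one OmniOSce used for its release repositories at the time; treat it as an assumption and check the upgrade notes at https://omniosce.org/upgrade for the exact commands of each step (additional publishers, such as extra, also need switching).

```sh
#!/bin/sh
# Print each command; execute it only when DRY_RUN=0 (default: print only).
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi
}

upgrade_to() {
  rel=$1   # target release, e.g. r151030
  run beadm create "pre-$rel"        # fallback BE to boot if anything fails
  run pkg set-publisher -r -O "https://pkg.omnios.org/$rel/core" omnios
  run pkg update -f -r --be-name="$rel"
  run beadm list                     # the new BE should be flagged R
}
```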
 

Nemesis_001

Weaksauce
Joined
Apr 3, 2011
Messages
69
Hi,

Currently running an All-in-One system with ESXi 6.7u1 and OmniOS r151028 with one raidz2 array. The data is shared via SMB; another Win10 VM handles all the tasks I need and is stored on an iSCSI LUN which is passed as an RDM.

I wanted to update to r151030. My current system is a little long in the tooth: I initially installed r151014 and have upgraded up to this point.

I have seen the recommendations to do a clean install when moving to r151030. If I export the pool and re-import it on a fresh install, will it retain all the settings, permissions and iSCSI LUN mapping, so I can just import it and be up and running in no time?

I've also noticed there is no ova for r151030 at the moment. Is it suggested to do a clean install, use the current ova and upgrade it, or just wait for r151032, which might bring an updated ova?

I am not in any hurry to perform the upgrade.

Thanks
 

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
3,958
I would first try an update. Most problems are between 026 and 028 due to the move from SunSSH to OpenSSH.
http://www.napp-it.org/doc/downloads/setup_napp-it_os.pdf

New setup
Import the pool.
To keep napp-it settings, you must save/restore /var/web-gui/_log/*
Permissions are file based so they are there after an import. You only need to re-create all users (and groups, smb groups) with same uid.

Logical units based on zvols must be re-imported. If you have created them in menu ZFS filesystems, you can simply re-activate them, as the GUID is part of the zvol name. You must re-create targets, target groups and target portal groups. In the newest napp-it pro/dev you find an option for a full backup/restore of all Comstar settings.

If you create and run a backup job, you can restore all settings via menu User > Restore (requires the newest napp-it pro/dev, with the backup job also started from the newest napp-it pro/dev).
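For logical units, a rough sketch of the manual route after the pool import, assuming the COMSTAR packages are installed. `tank/vmlun` and the GUID are hypothetical placeholders, and a single `add-view` without host or target groups makes the LU visible to every initiator, so it is only a starting point.

```sh
#!/bin/sh
# Print each command; execute it only when DRY_RUN=0 (default: print only).
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi
}

# Register the existing zvol as a logical unit again.
reimport_lu() {
  run svcadm enable stmf
  run sbdadm import-lu "/dev/zvol/rdsk/$1"   # e.g. tank/vmlun
  run stmfadm list-lu -v                     # shows the GUID
}

# Minimal exposure: one iSCSI target, LU visible to all hosts.
expose_lu() {
  run svcadm enable -r svc:/network/iscsi/target:default
  run itadm create-target
  run stmfadm add-view "$1"                  # GUID from list-lu
}
```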
 

Nemesis_001

Weaksauce
Joined
Apr 3, 2011
Messages
69
Hi Gea,

Thanks.
I'll try an in-place upgrade first and fall back on a full re-install if something doesn't work well.
I have the pro version, so I will try the built in mechanisms if re-install is needed.

I think I created the LU using the COMSTAR menu. How can I determine?

Thanks,
 

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
3,958

If you created the LUN in menu ZFS filesystems as a sharing option for a filesystem, it is shown there under iSCSI. If not, you only see it in menu Comstar.
 

Nemesis_001

Weaksauce
Joined
Apr 3, 2011
Messages
69
Ok, looks like it's shared from the filesystem, so I'll need to import it.
I just need to record the path, and re-import it after the pool as follows?
sbdadm import-lu /dev/zvol/rdsk/[Volumename]/[sharename]

I noticed the backup job backs up users and Comstar settings, so since I have the pro version, if I need to re-install, I just import the pool, import the zvol, run a restore and refresh storage on ESXi, and then I am good to go as if nothing happened?
If so, that is really pretty painless.

Thanks again,
 

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
3,958
If you have updated to newest pro/dev you can do a full Comstar backup in menu Comstar. Then start a backup job in menu Jobs. Some backup/restore options require the newest pro/dev.

You can then restore user and napp-it settings in menu Users or do a full Comstar restore (zvols, logical units, targets, target groups, target portal groups, views) in menu Comstar.

If you have enabled iSCSI sharing in menu ZFS filesystems without special Comstar settings and you want to just re-enable the LUN again after an import, you can simply enable it again in menu ZFS filesystems.
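Without the napp-it menus, one way to at least record the COMSTAR objects before a reinstall is to list them. This sketch only prints the configuration and changes nothing; the GUIDs passed to `record_views` are hypothetical and come from `stmfadm list-lu`.

```sh
#!/bin/sh
# Print each command; execute it only when DRY_RUN=0 (default: print only).
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi
}

# Logical units, target groups, host groups, targets, portal groups.
record_comstar() {
  run stmfadm list-lu -v
  run stmfadm list-tg -v
  run stmfadm list-hg -v
  run itadm list-target -v
  run itadm list-tpg -v
}

# Views are listed per logical unit.
record_views() {
  for guid in "$@"; do
    run stmfadm list-view -l "$guid"
  done
}
```

With `DRY_RUN=0`, redirect the output to a file that is kept off the boot disk.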
 

Nemesis_001

Weaksauce
Joined
Apr 3, 2011
Messages
69

I am positive that I have created a target and target group in the comstar menu after creating the zvol.

By the way, is there any way nowadays to have ESXi wait for OmniOS to load, then rescan the HBAs and then launch the guest OS?

I really love my setup; it has been as stable as enterprise systems. The only annoying thing is that after a reboot I have to manually rescan the HBAs, refresh storage and then boot the OS.

If there is a long power outage and I am not home, I get an angry call.
 

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
3,958
In an All in One setup, only the storage VM is on local datastore, all others are on ZFS and not available on ESXi power up.

If you use NFS from OmniOS to store other VMs, then ESXi auto reconnects the NFS share when it becomes available
after OmniOS is up. You only need to configure autostart order and delays for an auto up of OmniOS and all other VMs.

If you use iSCSi to store the VMs, you must re-connect the LUNs manually after a power up of ESXi and OmniOS.
As NFS is much easier to handle and has nearly the same performance as iSCSI (with the same sync setting),
I would always use NFS over iSCSI as long as there is no special reason for iSCSI.
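A sketch of both sides, assuming a filesystem `tank/vmstore` and a storage VM reachable at `192.168.1.10` (both hypothetical). `sharenfs=on` is the simplest form and leaves access open; the `esxcli` lines are run on the ESXi host, and the iSCSI rescan is the manual step mentioned in the question.

```sh
#!/bin/sh
# Print each command; execute it only when DRY_RUN=0 (default: print only).
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi
}

# On OmniOS: share the VM filesystem over NFS.
share_vmstore() {
  run zfs set sharenfs=on "$1"
}

# On ESXi, once: mount it as a datastore; ESXi reconnects it by itself
# when OmniOS comes back after a power cycle.
mount_on_esxi() {
  run esxcli storage nfs add --host="$1" --share="$2" --volume-name="$3"
}

# iSCSI alternative: the manual rescan needed after every power cycle.
rescan_iscsi() {
  run esxcli storage core adapter rescan --all
}
```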
 

Nemesis_001

Weaksauce
Joined
Apr 3, 2011
Messages
69
When I built the system several years ago, I initially tried NFS with sync and async settings.
Sync was terribly slow, and async iSCSI still fared way better on my system than async NFS.

Furthermore, if I am not mistaken, when using iSCSI you can place the VM metadata on a sync NFS datastore while the OS itself is on an async iSCSI LUN, so this way I at least get some safety for the VM metadata.
Did I get something wrong?
 

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
3,958
The main performance difference between NFS and iSCSI is the default sync behaviour (ZFS setting sync=standard).

With that default, ESXi + NFS requests sync writes while ESXi + iSCSI does not.
On iSCSI, sync behaviour is set with the write-back cache setting of the logical unit, or it can be forced on the underlying zvol with sync=always or sync=disabled.

This often leads to the misunderstanding that iSCSI is faster than NFS, while the main difference is the different default sync behaviour.
If you force the same sync setting (always or disabled) on the NFS filesystem and on the zvol for the logical unit, performance should be quite similar.

Last:
Keep it simple. Do not mix NFS and iSCSI for a VM.
If you only want performance, just disable sync. If you want security for VMs, force sync on any protocol.
If you want performance and security, add an Slog (Intel Optane 800P and up, WD DC SS530 etc.).
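Sketched as commands, assuming a filesystem `tank/nfs`, a zvol `tank/lun`, an Slog device `c3t0d0` and an LU GUID (all hypothetical). Disabling the LU write-back cache (`wcd=true`) is, as far as I know, the logical-unit counterpart of forcing sync writes.

```sh
#!/bin/sh
# Print each command; execute it only when DRY_RUN=0 (default: print only).
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi
}

# Force one sync policy on every dataset given, so NFS and iSCSI are
# compared like for like. mode: always | standard | disabled
set_sync() {
  mode=$1
  shift
  for ds in "$@"; do
    run zfs set sync="$mode" "$ds"
  done
  run zfs get sync "$@"
}

# Write-back cache disabled on the LU: writes are committed synchronously.
lu_write_cache_off() {
  run stmfadm modify-lu -p wcd=true "$1"
}

# Dedicated log device for fast sync writes (removable with zpool remove).
add_slog() {
  run zpool add "$1" log "$2"
}
```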
 

Nemesis_001

Weaksauce
Joined
Apr 3, 2011
Messages
69
I still had noticeable performance variation even when NFS had sync=disabled.
One of the jobs of the guest VM is to download some multi part files, repair them and move them to a SMB share on the storage VM (which is the same pool as the guest VM).
Doing this on NFS with sync disabled took almost 3 times longer for identical files, and the guest VM became quite unresponsive. That's why I moved to iSCSI.

I can give it a go again to see if something changed after upgrading everything.
I have a 32GB Optane cache drive, will it not do as a SLOG device?
 

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
3,958
The 16/32 GB Optane is more like a good SATA SSD, far below the performance of the Optane 800P and up. For a slow disk pool it can still help a lot.
 

Captainquark

Weaksauce
Joined
Dec 14, 2011
Messages
99
Hi _Gea
Tried to upgrade my box, however, I get certificate errors which I cannot solve, such as this one:
root@NAS1:~# pkg update

pkg update: The certificate which issued this certificate: /C=GB/ST=England/L=South Yorkshire/O=Citrus IT Limited/OU=Engineering/CN=Andy Fiddaman/emailAddress=omnios@citrus-it.co.uk could not be found. The issuer is: /C=CH/L=Olten/O=OmniOS Community Edition/CN=OmniOSce Key Master/emailAddress=ca@omniosce.org
The package involved is pkg://omnios/locale/gu@0.5.11,5.11-0.151026:20180420T093144Z
root@NAS1:~#
Can you please help me getting past this?
I tried to upgrade from OmniOS r151022 to r151026 (for the message above); I also tried going to r151024 or r151028, all resulting in the same type of message, but for different certificates. I tried following your instructions on napp-it.org, but to no avail.
I'm updating this same installation since r151014, so I know it's probably best to start over with a fresh install, however, I am even more unsure how to do this without losing all my data and configuration.
Any help is much appreciated!
 

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
3,958
An update from OmniOS 151022 (OmniTi) to 151022 OmniOSce requires that you first install the certificate from OmniOSce
see page 7, http://www.napp-it.org/doc/downloads/setup_napp-it_os.pdf

If you want to do a fresh install of OmniOS 151030 lts:
- backup /var/web-gui/_log/* (napp-it settings)
- write down users with their uid and gid
- write down SMB groups and memberships

- install OmniOS 151030 lts (if you use a new disk, you can go back)
- install napp-it per wget
- restore /var/web-gui/_log/*
- re-create users with old uid and SMB groups

Backup/Restore With newest napp-it Pro:
- Run a backup job
- Restore settings and users with menu User > Restore

For Comstar (Pro):
Run a Comstar > Full Backup (this saves the settings to /var/web-gui/_log/, so run it before backing up this folder)
Restore via Comstar > Full Restore
this allows a hot backup/restore of all settings like zvol, lu, targets, target groups, target portal groups and views


Import data-pool
For this, only the disks must be available (no need to back up RAID settings).
All your data and permissions remain intact.

If you are OK with the newest OmniOS, upgrade the pool (menu pool, click on version) to enable the newest features.
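A sketch of the "write down" steps, assuming ordinary accounts use uids and gids from 100 to 59999 (system accounts and nobody/noaccess fall outside this range on OmniOS, but check your own files). It archives napp-it's settings and turns /etc/passwd and /etc/group into `groupadd`/`useradd` commands with the same ids; SMB groups and their members still have to be written down separately.

```sh
#!/bin/sh
# Archive napp-it's settings. DEST must be an absolute path (hypothetical
# example: /root/napp-it-log.tar); ROOT defaults to /.
backup_nappit_settings() {
  dest=$1
  root=${2:-/}
  (cd "$root" && tar cf "$dest" var/web-gui/_log)
}

# Emit useradd commands that keep each ordinary user's uid and primary gid.
# Assumption: ordinary accounts have 100 <= uid < 60000.
users_to_commands() {
  awk -F: '$3 >= 100 && $3 < 60000 {
    printf "useradd -u %s -g %s %s\n", $3, $4, $1
  }' "${1:-/etc/passwd}"
}

# Same idea for local UNIX groups (gid in the same assumed range).
groups_to_commands() {
  awk -F: '$3 >= 100 && $3 < 60000 {
    printf "groupadd -g %s %s\n", $3, $1
  }' "${1:-/etc/group}"
}
```

Run the `groupadd` lines before the `useradd` lines on the new install, then set each user's password again with `passwd`.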
 

Captainquark

Weaksauce
Joined
Dec 14, 2011
Messages
99
Thanks for the hint; I was able to install the certificate according to your manual. Then I followed the same document to upgrade to r151024, then r151026. Both went through without errors, but I was surprised that when my system rebooted, it apparently always came up with r151022, although I updated with, e.g.:
pkg update -f -r --be-name=r151026
then rebooted with "reboot now", which it did properly, as far as I can tell.
When it came back up, I logged in and it showed immediately that I was working with:
Last login: Tue Sep 3 21:15:27 2019 from 192.168.1.122
OmniOS 5.11 omnios-r151022-83bf12a06a May 2019

New LTS release 'r151030' available.
See https://omniosce.org/upgrade for details on how to upgrade.
Whereas in beadm list, it shows:
r151024 - - 1.38G static 2019-09-03 20:17
b4upgr151022-backup-1 - - 72.0K static 2019-09-03 21:06
b4upgr151022-backup-2 - - 72.0K static 2019-09-03 21:06
b4upgr151022-1 - - 600K static 2019-09-03 21:09
r151026 NR / 7.78G static 2019-09-03 21:21
So active and set on reboot is r151026...?
cat /etc/release says I am working with "OmniOS v11 r151022dj". napp-it, by the way, says the same:
running on : SunOS NAS1 5.11 omnios-r151022-83bf12a06a i86pc i386 i86pc
OmniOS v11 r151022dj
Can you tell me what I am doing wrong?
 