OpenSolaris-derived ZFS NAS/SAN (OmniOS, OpenIndiana, Solaris and napp-it)

May I request that the S99napp-it stopcmd lock/offline any "encrypted pool on files" (pool oef) that napp-it is managing? I've had pools corrupted unexpectedly during NUT power events that execute SHUTDOWNCMD, and also when running System -> Restart napp-it after updating /var/web-gui/_log/mini_httpd.pem with new SSL certificates.

Luckily, I had snaps to rescue the pool files, but I would greatly prefer that they weren't at such risk of corruption.
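To make the request concrete, a stop sequence along these lines is what I have in mind (the pool and backing-file names are only placeholders, not my actual setup):

Code:
# cleanly close the encrypted file-backed pool before power-off
zpool export epool
# detach (re-lock) each lofi device that backs the pool
lofiadm -d /tank/crypt/epool-file1.img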

This is new to me; it needs some investigation.
 
Hello. I have the following notes from my experience with napp-it's SSL certificate:

* Restart Web-UI displays "...https server at port 82 or 443 with your certificate file at /var/web-gui/_my/mini_httpd.pem", but S99napp-it actually looks for the PEM in /var/web-gui/_log
* At first I placed the certificate at /var/web-gui/_log/mini_httpd.pem and secured it, but agent-bootinit.pl chmods the whole of /var/web-gui back to world-readable 0755
* Symlinking /var/web-gui/_log/mini_httpd.pem to a root-secured file in another location works and stays secure across a napp-it restart
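For reference, the commands I used look roughly like this (the /root/ssl location and server.pem name are just my own choices):

Code:
mkdir -p /root/ssl
cp server.pem /root/ssl/mini_httpd.pem
chmod 600 /root/ssl/mini_httpd.pem          # readable by root only
rm -f /var/web-gui/_log/mini_httpd.pem
ln -s /root/ssl/mini_httpd.pem /var/web-gui/_log/mini_httpd.pem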
 

Thanks, will take care of this in the next release.
 
Realtimemonitor and Shorttermmonitor do not work for me on the https napp-it. Not sure if I've mis-configured things. Any idea?
 
I see a way to create a job to email me if disks fail. Is there anything like that for pools being degraded or whatever? Thanks!
 
Realtimemonitor and Shorttermmonitor do not work for me on the https napp-it. Not sure if I've mis-configured things. Any idea?

https (mini_httpd) + realtime graphs via websockets (Mojolicious) is currently not possible.
 
Hey Gea, I'm on 16.03 Pro with the Monitor extension. Is there a way to view long-term L2ARC usage stats? I want to see if my L2ARC is being used at all. I've been reading a lot that L2ARC is often a waste. When Should I Buy Readzilla/L2ARC? (Oracle Storage Ops)

Currently, only short-term logs (last minute) are implemented.
You may create an "other job" that calls arcstat.pl every 5 minutes and appends the result to a text file
(or appends the napp-it values from /tmp to a file).

illumos-gate/arcstat.pl at master · illumos/illumos-gate · GitHub
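A minimal sketch of such an "other job" script, assuming arcstat.pl (from the link above) is installed and executable somewhere in the PATH; the log file path is only an example:

Code:
#!/bin/sh
LOG=/var/log/arcstat-longterm.log
# append a timestamp and one arcstat sample per run (schedule every 5 minutes)
date '+%Y-%m-%d %H:%M' >> $LOG
arcstat.pl 1 1 >> $LOG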
 
I see a way to create a job to email me if disks fail. Is there anything like that for pools being degraded or whatever? Thanks!

A "disk fail" is reported from a zpool status so napp-it alerts are basically initiated by a pool error.
 
A "disk fail" is reported from a zpool status so napp-it alerts are basically initiated by a pool error.

Okay, thanks. It wasn't clear to me if 'disk error' meant literally a disk failure or SMART errors or whatever...
 
I've had it with this CIFS issue to the point that I'm nearly prepared to switch. This is annoying. For whatever reason, CIFS seems to start ignoring, or stops enforcing, the TCP buffer values after a while.
 
I have seen this discussion on the OmniOS-discuss mailing list.
I hope that OmniTI (or any Illumos developer, as this is then an Illumos issue) can add some comments there.
 
I'm the one who has been posting to the list :) I've confirmed the behavior using DTrace. Would I be better off posting to illumos? Should I just file a bug under illumos?
 
May I request that the Realtimemonitor -> Zilstat report have an option to report statistics per pool?

The current monitor shows me combined statistics for rpool and for my storage pool, but I would be interested in separate statistics, particularly for the latter.
 
A ZIL is only used for sync write.
What application are you using on rpool that requests sync write?
Usually there is none, so even with sync=default there is no sync write.
Only an rpool setting like sync=always would force sync writes on rpool.

Usually you use an Slog as ZIL, and it is assigned to a single pool,
so your zilstat is effectively for your datapool only.

If you have several datapools, you can set sync=default or sync=always on the pool
that you want to monitor and sync=disabled on the others.
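For example (pool names are placeholders; napp-it's "default" corresponds to the ZFS property value sync=standard):

Code:
zfs set sync=always   tank      # ZIL/Slog is used for every write on tank
zfs set sync=disabled backup    # no sync writes on backup, so zilstat reflects tank only
zfs set sync=standard rpool     # back to the default behaviour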
 
A ZIL is only used for sync write.
What application are you using on rpool that requests sync write?

It's not I/O I'm doing explicitly, and it's not a large amount, but it was confusingly non-zero, which is why I asked for a separate zilstat.

Using the SYNC options defined in illumos-gate/fcntl.h at master · illumos/illumos-gate · GitHub, I ran the following DTrace command to show programs opening files with a SYNC option:

Code:
sudo dtrace -qn 'syscall::open*:entry /
arg1 & 0x10 != 0 || arg1 & 0x40 != 0 || arg1 & 0x8000 != 0 /
{ printf("%s: %s %d %s %d\n", probefunc, copyinstr(arg0), arg1, execname, pid);
trace(curpsinfo->pr_psargs); }'

I see the following output:

open64: /tmp/zilstat.log 769 perl 1255
perl /var/web-gui/data/napp-it/zfsos/_lib/illumos/agent_stat.pl zilstat
open64: /tmp/prstat.log 769 perl 1254
perl /var/web-gui/data/napp-it/zfsos/_lib/illumos/agent_stat.pl prstat
open64: /tmp/prstat-raw.log 769 perl 1254
perl /var/web-gui/data/napp-it/zfsos/_lib/illumos/agent_stat.pl prstat
open64: /tmp/nicstat.log 769 perl 1252
perl /var/web-gui/data/napp-it/zfsos/_lib/illumos/agent_stat.pl nicstat
open64: /tmp/nicstat-raw.log 769 perl 1252
perl /var/web-gui/data/napp-it/zfsos/_lib/illumos/agent_stat.pl nicstat
open64: /tmp/arcstat.log 769 perl 1249
perl /var/web-gui/data/napp-it/zfsos/_lib/illumos/agent_stat.pl arcstat
open64: /dev/null 769 sleep 27073
sleep 2
open64: /tmp/fsstat.log 769 perl 1251
perl /var/web-gui/data/napp-it/zfsos/_lib/illumos/agent_stat.pl fsstat
open64: /tmp/fsstat-raw.log 769 perl 1251
perl /var/web-gui/data/napp-it/zfsos/_lib/illumos/agent_stat.pl fsstat
open64: /tmp/nicstat.log 769 perl 1252
perl /var/web-gui/data/napp-it/zfsos/_lib/illumos/agent_stat.pl nicstat
open64: /tmp/nicstat-raw.log 769 perl 1252
perl /var/web-gui/data/napp-it/zfsos/_lib/illumos/agent_stat.pl nicstat
open64: /tmp/iostat.log 769 perl 1250
perl /var/web-gui/data/napp-it/zfsos/_lib/illumos/agent_stat.pl iostat
open64: /tmp/iostat_60s.log 769 perl 1250
perl /var/web-gui/data/napp-it/zfsos/_lib/illumos/agent_stat.pl iostat
open64: /tmp/iostat.log 769 perl 1250
perl /var/web-gui/data/napp-it/zfsos/_lib/illumos/agent_stat.pl iostat
open64: /tmp/iostat_60s.log 769 perl 1250
perl /var/web-gui/data/napp-it/zfsos/_lib/illumos/agent_stat.pl iostat
open64: /tmp/iostat.log 769 perl 1250
perl /var/web-gui/data/napp-it/zfsos/_lib/illumos/agent_stat.pl iostat
open64: /tmp/iostat_60s.log 769 perl 1250
perl /var/web-gui/data/napp-it/zfsos/_lib/illumos/agent_stat.pl iostat
open64: /tmp/iostat.log 769 perl 1250
perl /var/web-gui/data/napp-it/zfsos/_lib/illumos/agent_stat.pl iostat
open64: /tmp/iostat_60s.log 769 perl 1250
perl /var/web-gui/data/napp-it/zfsos/_lib/illumos/agent_stat.pl iostat
open64: /tmp/iostat.log 769 perl 1250
perl /var/web-gui/data/napp-it/zfsos/_lib/illumos/agent_stat.pl iostat
open64: /tmp/iostat_60s.log 769 perl 1250
perl /var/web-gui/data/napp-it/zfsos/_lib/illumos/agent_stat.pl iostat
open64: /dev/null 769 sleep 27075
 
These are the monitoring scripts; they do file locking for concurrent access when writing their logs.
They write to the tmp filesystem (RAM), not to rpool or a datapool, so they are not the source of sync writes on your pools.
 
Info:

Today I received a SuperMicro X11SSH-CTF
Supermicro | Products | Motherboards | Xeon® Boards | X11SSH-CTF

This mainboard is very new and I am probably one of the first in Germany to have it.
As it offers IPMI, up to 64 GB ECC RAM, 8 x SATA and, optionally, 2 x 10G Ethernet plus 8 x SATA/12G SAS, it may become the default suggestion for an entry-level to midrange ZFS and All-In-One server.

I ordered the top version with the 10G nics and 12G SAS, with 16 GB ECC RAM and the G4400 CPU.
The G4400 is a low-cost dual-core CPU with ECC and vt-d support at about 50 Euro.
The mainboard with 16 GB ECC and the G4400 came to about 700 Euro.

First impression
- Installation of ESXi 6.0u2 failed (onboard nic not supported, as expected)
After inserting an Intel X540, everything seems ok

- Installation of OmniOS/Solaris from a USB stick failed

Installation of OmniOS is possible when you
- install from a SATA CD drive (a USB CD drive failed)
but you must use the following BIOS settings:

Advanced > Support Windows 7 USB setup: enable
Boot > Legacy mode

This is important, as these new boards no longer offer a PS/2 keyboard option.
The X550 10G nic is also not supported with OmniOS/Solaris at the moment.
 
Hi,
I'm currently setting up my own All-In-One server and was wondering if you guys could help me.
I already have my server hardware together:
An old 16-bay Supermicro case
An X9 Supermicro motherboard
2x Intel Xeon E5-2670
128 GB ECC buffered RAM
And an LSI 9211-8i HBA flashed to IT mode

I only have 5x WD Red 4TB drives and 3x 80GB Intel DC SSDs. (I know I probably shouldn't use WD Reds.)

About 1200 EUR.
Anything against this config?

But my actual question is: which OS should I use for my ZFS pools?
I'm running ESXi 6 (or maybe 5.5) as the hypervisor.
I was thinking about running Oracle Solaris 11 as the (free) education/testing version, but what about the update policy on the test version?
My second thought was to use OmniOS, which always looked better than OpenIndiana, at least on paper. Any thoughts/experiences?
Both OSs should run napp-it.

And my last question would be whether it's OK to run the five Red drives as RAIDZ-2, served via NFS to ESXi for general mass storage and one VM to create a NAS.
I would buy another SSD and run the SSDs as a soft RAID-10 via iSCSI for the VMs? (iSCSI for low latency?)

Hope someone can help me.
 
wow!

If this is a home or lab setup, you have enormous computing power with 16 cores and a huge amount of RAM, probably at the price of > 200 W at idle. So the first question would be, what do you want to do with that server?

For a home or lab setup, I would probably remove or sell one CPU and half of the RAM and
- add another WD Red to build a raid-Z2 from 6 disks and use it as a general-use SMB filer and for backups
- create a second, SSD-only pool for VMs. You do not need a Raid-10; a mirror or Raid-Z is ok, as iops are not a problem with SSDs.
Use NFS to provide a datastore for ESXi (as fast as iSCSI but simpler, with SMB access for copy/clone/backup)

You can use Solaris if you need encryption. The other improvement of Solaris is faster resilvering.
Beside that, OmniOS is the better option as it offers regular updates. With the free Solaris, you must wait for 11.4 for any bug fix.

btw
you need a small SATA disk as a local datastore to put the storage VM (and optionally ESXi) onto, e.g. a small SSD.
As you pass through your LSI, you have only 8 ports for storage. With 6 disks you have 2 ports left for an SSD mirror, or
you need a second controller.

Home option with ESXi 6:
Put ESXi onto a USB stick and create a datastore on it.
You can then try to pass through onboard SATA as well - as said, only for a home setup; a second LSI is the better solution.
 
Wow, thank you for the reply.

So the first question would be, what do you want to do with that server?
The server should do a bit more than manage storage: some general service tasks like firewall, git, mail and printer services. Also, I was planning on running a remote build server and maybe some game servers.

Use NFS to provide a datastore for ESXi (as fast as iSCSI but simpler, with SMB access for copy/clone/backup)
So I use NFS instead of iSCSI to give ESXi access to (all) the data, and SMB would run, for example, on OmniOS to provide file access to all my devices?

You can use Solaris if you need encryption. The other improvement of Solaris is faster resilvering.
Beside that, OmniOS is the better option as it offers regular updates
So I go with OmniOS and encrypt via napp-it and lofi? Would that be enough security for "home use"?

you need a small SATA disk as a local datastore to put the storage VM (and optionally ESXi) onto, e.g. a small SSD.
As you pass through your LSI, you have only 8 ports for storage.
I was hoping I could pass through the two SAS ports of the onboard C602 chipset, but I'm guessing I'll go with a USB stick and later maybe a second HBA.
 
NFS vs iSCSI
"Keep it simple" is the first rule.
Unless you need a special iSCSI feature, this means NFS.
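A sketch of sharing a filesystem as an NFS datastore for ESXi (pool name and network addresses are examples only; ESXi mounts NFS as root, so it needs root access on the share):

Code:
zfs create tank/vmstore
zfs set sharenfs=rw=@192.168.1.0/24,root=@192.168.1.0/24 tank/vmstore
# then add 192.168.1.x:/tank/vmstore as an NFS datastore in ESXi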


Encryption
Solaris is the only option with ZFS encryption.
This means encryption is a ZFS property: you can decide per filesystem whether to encrypt, and you can use a different key on each.

All other options (BSD, Illumos, Linux) use encrypted disks, volumes or files below ZFS, where ZFS is not aware of or involved in the encryption. The lofi option has its own advantages, as it allows you to back up the encrypted files, with full raid-Z redundancy and security, to insecure places or filesystems such as a cloud or external disks/FAT USB sticks; its disadvantages are that it is slower than the other options and not suited for large arrays.
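For reference, a sketch of the Solaris 11 per-filesystem approach (pool and filesystem names are examples only):

Code:
# Oracle Solaris 11 only: encryption is a normal ZFS property, set at creation time
zfs create -o encryption=on tank/secure            # prompts for a passphrase by default
zfs create -o encryption=on -o keysource=passphrase,prompt tank/secure2
zfs get encryption,keysource tank/secure tank/secure2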


Passthrough
You have three options to provide disks to a storage VM:
- pass-through: you pass a whole PCI-e device to the VM = barebone-alike, the best method of all
- physical raw disk mapping: you give a VM exclusive access to single disks over the ESXi disk driver; complicated, but might be ok
- ESXi virtual disks: an ESXi vmdk on an ESXi vmfs filesystem that you offer to the VM = ZFS on vmfs, not suggested for a filer
 
Hey there!

I was wondering what the currently recommended version of the LSI SAS2 firmware is? Some say P15, others P18, and some P19. What do you use? Has anyone updated to the updated (??) P20? :)

Is anyone having problems with iSCSI being randomly disconnected (I notice this mostly with Windows clients)? Every now and then Windows reports that it could not send an iSCSI PDU (event IDs 7, 39, 34) and forces a reconnection with the target. Sometimes the drive drops and a disconnect/connect or a client reboot is necessary. I notice this on Windows 10 and Win Server 2k12R2 (didn't test others). This happens on clients that are on the local network and on those connected over WAN.
 
Hey there!

I was wondering what the currently recommended version of the LSI SAS2 firmware is? Some say P15, others P18, and some P19. What do you use? Has anyone updated to the updated (??) P20? :)
I used the P20 version but I didn't test it that much. It seems to work, though. And looking at the date of the firmware upload, they haven't updated it in some time, so there shouldn't be many problems with this version.
 
I guess I will have to do some tests on my own. I want to fully test it before I deploy it to a production SAN. I don't want any trouble with 400 TB of data; it takes ages to restore :)

Matej
 
I used the P20 version but I didn't test it that much. It seems to work, though. And looking at the date of the firmware upload, they haven't updated it in some time, so there shouldn't be many problems with this version.

Does anybody else have information to share about the latest incarnation of the P20 firmware?

The current released version is 20.00.07.

The P20 problems were initially reported back in December 2014, so I wonder whether LSI has resolved them in the meantime?

I'm currently running an All-In-One on ESXi 5.1-U1 with OmniOS 151006 as the storage server, using 3 x M1015 with P14 IT firmware in passthrough mode for ZFS.
As you can see, the versions of all components are pretty old, but the system runs so stably that I haven't wanted to change anything for a long time.
Now I'm planning to upgrade to ESXi 6.0-U2 and OmniOS 151014 LTS, and I wonder if it would also make sense to upgrade the LSI firmware on the HBAs at the same time?

Does anybody know about major benefits of the latest P20 or P19 firmware over the old P14 when using an IBM M1015 flashed to IT mode?
It's hard to gather all the change logs from P14 to P20 and make sense of some of the technical terms.
In essence, I would like to understand whether anything important was introduced or fixed in one of the firmware versions after P14 that makes the M1015 more stable or more performant, or adds support for an important feature.
 
I was just reading an article about a new type of Ransomware called Samsa

- after infection, it scans your whole IT infrastructure and your backup procedures
and distributes itself over the network without any noticeable action

- after some time, when it knows your files, disks, storage and backup procedures,
it can happen that it first encrypts all backups (it waits until all backups are encrypted!)
without any other actions

- it then starts encryption simultaneously on all systems
- it deletes shadow copies (when using Windows)

So it can happen that
- all files on many systems are encrypted one morning
- Windows snaps/shadow copies are deleted
- the backup is encrypted, even on multiple rotating backup media


The question now is whether a remote ZFS storage is safe against such attacks
- current data: can be encrypted as long as the client has write permissions: not safe
- ZFS snaps are safe, as they are read-only and cannot be deleted by a client admin user,
since that needs local root access on the storage system
 
The question now is whether a remote ZFS storage is safe against such attacks
- current data: can be encrypted as long as the client has write permissions: not safe
- ZFS snaps are safe, as they are read-only and cannot be deleted by a client admin user,
since that needs local root access on the storage system

So you can't delete Volume Shadow Copies and, with them, the snapshots?
Only a local root user can delete snapshots?

Matej
 
Shadow Copies are Windows snaps.
If an attacker can gain Windows admin permissions, or exploit a Windows or application bug (e.g. Flash), he can destroy these Shadow Copies, e.g. on a Windows server.
This is part of a Ransomware attack.

ZFS snaps on Unix are a different thing.
There is no way for a Windows admin to destroy these snaps over SMB or another Windows remote mechanism.
Only a local root account on Unix is able to destroy them, either via SSH or other Unix admin options.
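A sketch with example names only (just the principle, not a napp-it job):

Code:
# taken regularly on the filer itself, e.g. from a snapshot job
zfs snapshot data/share@daily-2016-04-01
# an SMB client, even with Windows admin rights, cannot destroy or modify this snap;
# after an attack, root on the filer rolls the filesystem back:
zfs rollback -r data/share@daily-2016-04-01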
 
Or, if you want to use the mountpoint only as a parent property to be inherited, set canmount=off.
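A small example of that pattern (names are hypothetical):

Code:
# the parent only carries the mountpoint to be inherited; it holds no data itself
zfs create -o mountpoint=/export/projects -o canmount=off tank/projects
zfs create tank/projects/alpha     # mounts at /export/projects/alpha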
 
I'm starting to rebuild my ZFS filer and am again faced with the decision on ZFS versions. I've been having a hard time finding any real discussion of ZFS versions and their real implications. Does version 28 really matter anymore? Especially now with OpenZFS, it seems like pretty much all of the active projects try to support the same feature flags. The only advantage I see now with version 28 is the ability to go back to Oracle Solaris.

Just wanted to know if I'm missing something here, since I'm leaning towards just going with 5000 and feature flags.

Thanks in advance
 
The only advantage of pool v28/ZFS v5 is compatibility of OpenZFS with Oracle Solaris.

The main disadvantage is that you cannot use newer Oracle ZFS features like encryption and faster resilvering (Solaris > v28),
or newer OpenZFS features like LZ4 from pool v5000, see Features - OpenZFS

As the newer features of either Oracle Solaris or OpenZFS are highly wanted, you should use the newer versions.
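For illustration (pool and device names are examples only):

Code:
zpool create tank mirror c1t0d0 c1t1d0                   # default: v5000 with feature flags
zpool create -o version=28 oldtank mirror c2t0d0 c2t1d0  # v28, still movable to Oracle Solaris
zpool upgrade tank                                       # later enables newly supported features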
 