OpenSolaris-derived ZFS NAS/SAN (OmniOS, OpenIndiana, Solaris and napp-it)

- block size must be 512, but I have not tested whether blocksize=512 on the volume or LU
is enough, or whether the physical block size of the underlying pool/vdev must be ashift=9.
It's all in the iSCSI layer. A pool with ashift=12 worked fine when I tested it.
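For reference, a minimal sketch of what I mean (pool/volume names are made up, and I have not verified that blk=512 on the LU alone is sufficient):

Code:
# create a zvol with a 512-byte volblocksize
zfs create -V 100G -o volblocksize=512 Tank/iscsi_vol1
# create a COMSTAR LU with a 512-byte logical block size
stmfadm create-lu -p blk=512 /dev/zvol/rdsk/Tank/iscsi_vol1
# publish the LU (the GUID is printed by create-lu)
stmfadm add-view <lu-guid>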
 
I have an intermittent problem with napp-it since the security re-vamp (from early last summer?).

About once a week I see the repeated error "wrong ip, please login again." It happens several times in a row, forcing me to log in again and again. My server has a stable IPv4 address and an addrconf IPv6 address.

I would appreciate any advice.

Thank you.
 
Question time! I have a series of questions regarding my home-brew OmniOS Hobo-SAN that I'd like help with. The most helpful set of answers, or really in-depth answers, get an Amazon gift card.

Background info: My current setup is an OmniOS install on a whitebox Supermicro storage chassis with an LSI SAS2 HBA (IT firmware), with napp-it acting as the management interface.
My two pools are a 10-disk RAIDZ2 made up of 3TB WD Reds, and a 10-disk RAIDZ2 made up of 4TB WD Reds.

These two large pools, Tank and Bus, are used with COMSTAR to store iSCSI data. I provisioned a volume that uses about 87-90% of the space in each pool. These volumes are used as LUs and shared out via iSCSI. The use case is BitLocker'd Windows SMB 3.0 shares that serve multimedia data to 3-5 users and act as a general repository for all my digital pack-rat needs. The iSCSI LUNs are backed up to tape roughly quarterly.

Question 1) After about a year, the iSCSI volumes mounted on my Windows machine are starting to get a bit full, and performance is dropping. Is there a way to defragment the LU or otherwise restore performance without blowing out the array and rebuilding everything from backup?

Question 2) If I do blow out the whole thing and restore from tape, would I be better served keeping them as separate pools with a single vdev each, or would it be best to combine them into one larger pool with two vdevs? I know that would double the speed and IOPS of the pool, but is there a downside to mixing vdevs of different ages and sizes, so long as the RAIDZ level stays the same?

Question 3) I noticed that OmniOS and napp-it now have sweet, sweet NVMe support, so I picked up an Intel 750 400GB card to act as a combined L2ARC and ZIL for my pools. What is the best way to provision the card to act as ZIL and L2ARC for both pools? Is this a bad idea?
 
I have an intermittent problem with napp-it since the security re-vamp (from early last summer?).

About once a week I see the repeated error "wrong ip, please login again." It happens several times in a row, forcing me to log in again and again. My server has a stable IPv4 address and an addrconf IPv6 address.

I would appreciate any advice.

Thank you.

Until last summer, a session ID was valid until you logged out.
If someone knew it, they could take over the session.

Now a session ID changes with every menu and is bound to the IP interface.
There is also a timeout. This is a very restrictive and secure setting.
I have been thinking about making this configurable (comfort vs. security).

If you are always on the same computer, you can store the password
in your browser.
 
Question time! I have a series of questions regarding my home-brew OmniOS Hobo-SAN that I'd like help with. The most helpful set of answers, or really in-depth answers, get an Amazon gift card.

Background info: My current setup is an OmniOS install on a whitebox Supermicro storage chassis with an LSI SAS2 HBA (IT firmware), with napp-it acting as the management interface.
My two pools are a 10-disk RAIDZ2 made up of 3TB WD Reds, and a 10-disk RAIDZ2 made up of 4TB WD Reds.

These two large pools, Tank and Bus, are used with COMSTAR to store iSCSI data. I provisioned a volume that uses about 87-90% of the space in each pool. These volumes are used as LUs and shared out via iSCSI. The use case is BitLocker'd Windows SMB 3.0 shares that serve multimedia data to 3-5 users and act as a general repository for all my digital pack-rat needs. The iSCSI LUNs are backed up to tape roughly quarterly.

Question 1) After about a year, the iSCSI volumes mounted on my Windows machine are starting to get a bit full, and performance is dropping. Is there a way to defragment the LU or otherwise restore performance without blowing out the array and rebuilding everything from backup?

Question 2) If I do blow out the whole thing and restore from tape, would I be better served keeping them as separate pools with a single vdev each, or would it be best to combine them into one larger pool with two vdevs? I know that would double the speed and IOPS of the pool, but is there a downside to mixing vdevs of different ages and sizes, so long as the RAIDZ level stays the same?

Question 3) I noticed that OmniOS and napp-it now have sweet, sweet NVMe support, so I picked up an Intel 750 400GB card to act as a combined L2ARC and ZIL for my pools. What is the best way to provision the card to act as ZIL and L2ARC for both pools? Is this a bad idea?

q1
Performance of a copy-on-write filesystem like ZFS drops as the fill rate rises.
You should avoid going above, say, 80% when you need performance.

q2
If your pool is unbalanced, some data may sit on only one vdev and may be slower.

q3
A ZIL (Slog) is used when you have disabled the writeback cache for your LU.
A ZIL does not need to be larger than about 10 GB.

If you partition a (new) 400 GB SSD, you can use
2 x 10 GB for ZIL (one partition per pool)
200-300 GB for L2ARC

I would keep the rest unused (overprovisioning) to keep write performance high.

btw
L2ARC caching is by default only enabled for random reads, not for
sequential data like media files. You can enable it for sequential data with set zfs:l2arc_noprefetch=0 (e.g. in /etc/system).
https://storagetuning.wordpress.com/2011/12/01/zfs-tuning-for-ssds/
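A rough sketch of how this could look from the CLI, assuming you have already created four slices on the Intel 750 (device and slice names below are made up; the small slices are ~10 GB each):

Code:
# one small slice per pool as slog
zpool add Tank log c3t0d0s0
zpool add Bus log c3t0d0s1
# a cache device belongs to exactly one pool, so one L2ARC slice per pool
zpool add Tank cache c3t0d0s3
zpool add Bus cache c3t0d0s4
# disable the writeback cache on an LU so sync writes go through the slog (GUID from stmfadm list-lu)
stmfadm modify-lu -p wcd=true <lu-guid>
# let L2ARC also cache sequential reads like media files (takes effect after reboot)
echo "set zfs:l2arc_noprefetch=0" >> /etc/system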
 
This is a real beginner question but I don't know what search terms I should look for:

How can I skip the syslog messages that are written to the console and get back to a clean prompt?
At the moment I'm stuck after the first message. I don't know which keyboard shortcut clears the screen to give me back the prompt to enter commands. Right now I have to restart the machine via ssh or the web interface to be able to enter commands over IPMI.

THX

I've found the error. It's a bug in the IPMI keyboard, which gets stuck after the first syslog message.
By plugging in a physical keyboard (video still via the IPMI console) I was able to type the command clear, and everything worked.
A workaround is to execute Options -> Keyboard Mouse Hotplug in the IPMI console. After that the IPMI keyboard works again.

System: Supermicro A1SRi-2758F, latest IPMI firmware 1.76, omnios r151014
 
Thanks for the suggestion to use NFS, _Gea.
I have not used NFS in my travels thus far, and it provides a good opportunity to learn something new.

I created a new napp-it installation and did the following:
pools --> create pool
Used RaidZ
Then I created a ZFS filesystem and turned SMB off,
left all other options at their defaults.
Went to the ZFS filesystem and clicked on NFS to enable it. Verified on the Services --> NFS page:
NFS Server: Online
NFS Client: disabled
ESXi mounts the NFS share with no hassles, and I have created a VM on that share.

However, it is painfully slow!
Windows will begin to install, but the percentage meter takes a long time to increment.
Here is what seems to be going on in the system menu. The values all jump around here and there, but the disks seem to be saturated?
What seems to be my bottleneck?
 
What seems to be my bottleneck?

NFS by default uses sync writes; there should be an option to disable them to test whether that's the case. If you still want sync writes, get a flash-based disk and use it as the pool's ZIL.
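For example, from the CLI (pool/filesystem name assumed):

Code:
# disable sync writes on the NFS filesystem to test (risk of data loss on power failure)
zfs set sync=disabled tank/nfs_vm
# revert to the default behaviour afterwards
zfs set sync=standard tank/nfs_vm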


q2
If your pool is unbalanced, some data may sit on only one vdev and may be slower.

But if I destroyed both pools and recreated them as one big pool with 2 vdevs, I would avoid the unbalanced pool issue, correct?
 
NFS by default uses sync writes; there should be an option to disable them to test whether that's the case. If you still want sync writes, get a flash-based disk and use it as the pool's ZIL.

Perfect!
Disabled sync writes and noticed the performance increase.

Also upped the RAM to 16GB.
Here is what the system now looks like in near idle conditions:

Something I noticed while installing Windows: the HDD light on the front of the computer was quite busy, but the SAN sort of just flashed every now and again.
I wonder if it stores a lot in RAM?
 
But if I destroyed both pools and recreated them as one big pool with 2 vdevs, I would avoid the unbalanced pool issue, correct?

A pool is unbalanced when data is not spread equally over all vdevs. This happens when you add a new vdev to an existing pool; you then need a copy action to rebalance.

In your case, when the pool is empty, data can be spread equally, but with a higher fill rate the smaller vdev gets full and some data ends up mainly on one vdev only.

While this may give lower performance, quite often the performance of one vdev is enough. On my backup systems I do the same, as I regularly update a vdev with newer and larger disks to increase overall capacity.
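If you want to see how data is spread over the vdevs, something like this (pool name assumed) shows per-vdev allocation and load:

Code:
# capacity and allocation per vdev
zpool list -v Tank
# watch per-vdev I/O every 5 seconds to spot an overloaded vdev
zpool iostat -v Tank 5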
 
I created a replication job a couple of weeks ago like this:



So each week I create a replication snapshot from one napp-it appliance to a second appliance.
Everything was working beautifully up until last night:


I checked System > Log > log-0 from the napp-it menu: nothing helpful.
Also nothing in /var/log.

Any ideas..?
 
There are four logs with different depths of information in the Jobs menu:
- the above list of the last 50 jobs
- the job history of a specific job (click on replicate under joblog)
- the details of the last run (click on the date under Last)
- the sender log (click on the job remote on the sender side)
and filter for send or a jobid

If you do not find a log that helps, a replication problem
is mostly due to a network problem, a full pool or a hanging
pool due to a disk problem.

You can now
retry the replication,
optionally deleting the newest snap on the target side
to retry the replication with the former snap pair.
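If you need to fall back to the former snap pair, it is roughly the following on the target side (filesystem and snap names assumed):

Code:
# list snaps on the target filesystem, oldest first
zfs list -t snapshot -o name,creation -s creation -r backup/Tank
# delete the newest (incomplete) target snap so the former snap pair is used again
zfs destroy backup/Tank@<newest-repli-snap>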
 
This is the last log:


With the sender log:


I then copied 2 files to my source just to make sure that there was new data to replicate, and ran the job:


Now it ran perfectly. Maybe a glitch...? (What if nothing has changed since the last snap; can that trigger a problem?)

Network, pools, disks all fine!
 
The relevant error message is "failed to read from stream" (from zfs receive).
This happens when the network connection is interrupted or on problems with the zfs send process.
The result is that the replication fails and a new base snap is not created.

This can happen from time to time for whatever reason.
Mostly you only need to rerun the replication based on the last snap.

Sometimes a process modifies snaps. In such a case,
you must delete the newest target snap manually to use the former pair.

If a replication fails completely, e.g. because all needed snaps have been deleted,
you can rename the target filesystem to redo the initial replication without
the risk of data loss.

And:
avoid recursive replications and prefer one job per filesystem.
Otherwise the replication fails completely if you create a new filesystem.
 
I am having a problem using NFS and to a lesser degree SMB/CIFS.

NFS:
I have NFS enabled via napp-it using <[email protected]/24 torrent/complete>. I am not able to connect from Windows or Linux machines.

CIFS/SMB:
I have SMB enabled via napp-it as well. I am able to connect from Windows and Linux. However, when I connect via the CLI using the mount command in Ubuntu, I can only write files using sudo. But when I connect via the GUI using "Connect to Server", I can write. How do I get the same ability via the CLI?

All connections are specified using IP addresses and not hostnames. I also want to prevent guest/anonymous access, which I think I have set up correctly.
 
NFS:
Does it work when you simply set NFS to on (with an open file/folder permission setting like everyone@=modify)?

If you need to control traffic, you can use the firewall, which is more effective than any fakeable NFS setting.
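For testing, the difference would be something like this (the subnet is only an example, adjust to yours):

Code:
# fully open NFS share for testing
zfs set sharenfs=on torrent/complete
# later, restrict access to a subnet again
zfs set sharenfs=rw=@192.168.1.0/24 torrent/complete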
 
I resolved my CIFS/SMB issue, but NFS is still my preferred method.

When I set NFS to on and attempt to set "everyone@=modify" on the ACL on folders page, I get the warning message "chown: WARNING: can't access torrent/complete".

Permissions at that mountpoint are 777+. 777 is not required though, correct?
 
If you set Unix permissions, you mess up the ACL inheritance settings,
so mixed SMB/NFS3 use is only a good idea if you can allow a fully open setting.

Maybe start by resetting the ACLs recursively to root=full, everyone@=modify,
e.g. in menu ZFS filesystems > Folder-ACL > reset ACL (below the ACL listing).
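From the CLI, the equivalent is roughly the following (mountpoint assumed, and I use owner@ rather than the user root; adjust as needed):

Code:
# reset ACLs recursively: owner full access, everyone modify, inherited to new files and dirs
/usr/bin/chmod -R A=owner@:full_set:fd:allow,everyone@:modify_set:fd:allow /torrent/complete
# verify the resulting ACL
/usr/bin/ls -V /torrent/complete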
 
For security, I have MAC filtering on the main router/firewall, combined with further filtering on the SAN/NAS VM. That should be good enough for now.

I reset the ACL to modify, with recursion. I still cannot mount the NFS share on Linux or Windows.
 
Do you have a Mac around where you can simply enter in Finder:
Go > Connect to Server > nfs://ip of your server/pool/filesystem, e.g.
nfs://192.168.1.1/torrent/complete
 
Re-evaluating the RAM-to-HDD ratio of 1000:1.

I'm in the process of updating my ZFS box. I have been running ZFS with 16GB RAM for my 10x2TB and it has been great for my needs. I will be adding 10x5TB drives and moving the NAS to a 24-bay Supermicro chassis. With 10x5TB, using the 1000:1 rule, I should have 50GB of RAM, which is more than my motherboard can support right now (4 x 8 = 32GB). Do I need to look into a dual-socket 1366 motherboard (which is quite affordable)? And in the future, if I want to populate 24 bays with 5TB drives (120TB), would I need 120GB of RAM to hit that 1000:1? 8GB sticks of RAM are still affordable, but 16GB sticks are out of my price range. At current RAM prices, the most I can see myself buying is 96GB with an old 1366 motherboard. :confused:
 
Stop perpetuating myths. There is no such thing as a 1000:1 "rule" (for normal, non-dedup usage).

You can run that array with 8GB RAM just fine; 16GB is plenty. More RAM means more data stays cached, but there is no requirement like that.
 
Re-evaluating the RAM-to-HDD ratio of 1000:1.

I'm in the process of updating my ZFS box. I have been running ZFS with 16GB RAM for my 10x2TB and it has been great for my needs. I will be adding 10x5TB drives and moving the NAS to a 24-bay Supermicro chassis. With 10x5TB, using the 1000:1 rule, I should have 50GB of RAM, which is more than my motherboard can support right now (4 x 8 = 32GB). Do I need to look into a dual-socket 1366 motherboard (which is quite affordable)? And in the future, if I want to populate 24 bays with 5TB drives (120TB), would I need 120GB of RAM to hit that 1000:1? 8GB sticks of RAM are still affordable, but 16GB sticks are out of my price range. At current RAM prices, the most I can see myself buying is 96GB with an old 1366 motherboard. :confused:

It's more about what the active set of data is and how many active users you have at any time. You can have an enormous array with little RAM if you rarely access anything.
 
The RAM requirement of Solaris is 2 GB, regardless of pool size or dedup.
If you only have 2 GB, you are limited to pure disk performance, or, if you enable
dedup, a simple snap destroy can take a week. So dedup on low RAM is a no-go,
but apart from that, with 16 GB you are fine even with 50 TB of data.

More RAM gives you a read cache that accelerates disk access.

http://www.oracle.com/technetwork/s...ocumentation/solaris11-2-sys-reqs-2191085.pdf
 
What is magical about ZFS on Solaris, given that ZFS often needs much more RAM for dedup on other systems like FreeBSD and Linux?
 
All OpenZFS implementations share the same dedup bits.
There is no difference between BSD, Linux or Illumos (the free Solaris fork) regarding the RAM needs of dedup.

Oracle Solaris is a different ZFS development path.
Maybe they will be the first to reduce RAM needs as they work on dedup; it is unclear how this will affect dedup rates.
 
Just received an email that OmniOS bloody
with SMB 2 is available.

This fits perfectly with some tests that I made, mainly on OSX
with my Mac Pros and 10 GbE, to get answers to questions like:

Is shared ZFS storage fast enough for video editing, and at what quality?
SMB1 vs SMB2
MTU 1500 vs MTU 9000 (jumbo frames)
NVMe vs SSD vs disks
Raid 0/mirror vs RaidZ
Single user vs multiple users
Single pool vs multiple pools

Some quick tests of the basic concepts (SMB 2.1 is essential):
http://napp-it.org/doc/downloads/performance_smb2.pdf
 
All OpenZFS implementations share the same dedup bits.
There is no difference between BSD, Linux or Illumos (the free Solaris fork) regarding the RAM needs of dedup.

Oracle Solaris is a different ZFS development path.
Maybe they will be the first to reduce RAM needs as they work on dedup; it is unclear how this will affect dedup rates.

Thanks for the clarification. It will be nice to see deduplication take less RAM in the future.
 
There are only very few use cases where current online dedup makes sense.
In my own setups I avoid dedup, stay with LZ4 compression, and add some disks when needed.

If you intend to use dedup, you should make sure that the dedup tables can be held
in RAM in any case, which means that you should not use it on large pools but
on dedicated smaller pools, e.g. SSD pools, for such a use case.
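If you want an idea of whether dedup would pay off before enabling it, you can simulate it (pool name assumed; this can take a while and needs some RAM itself):

Code:
# simulate dedup on an existing pool and print the expected dedup ratio
zdb -S Tank
# on a pool that already uses dedup, show dedup table (DDT) statistics
zdb -DD Tank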
 
Dumb question

Are there any recommended steps for setting folder ACLs at the share level? I'm using the free version of napp-it and keep getting access denied in the Windows MMC, even after resetting the folder ACLs through napp-it.
 
There are only very few use cases where current online dedup makes sense.
Care to share those cases?

In my own setups I avoid dedup, stay with LZ4 compression, and add some disks when needed.

If you intend to use dedup, you should make sure that the dedup tables can be held
in RAM in any case, which means that you should not use it on large pools but
on dedicated smaller pools, e.g. SSD pools, for such a use case.

RAM isn't a concern, and as I mentioned this would be for SSD/flash storage.

If things go well this afternoon I MIGHT get time to test this, but more likely a week out. I was curious if you'd done any transfers with dedup and compared.
 
So it looks like the Hipster branch of OI finally fixes the net-snmp issue that causes >2TB ZFS filesystems to get reported as 0 bytes!

Gea, I know you do not support napp-it on OI Hipster, but I was wondering if I can remove/replace the incompatible packages on my Hipster install to convert it to something napp-it supports? Is the reason you can't run napp-it on Hipster that it purges all of the opensolaris packages? If I re-enabled the opensolaris.org publisher, would napp-it work? Or is it more complicated than that?

Thanks!
 
Dumb question

Are there any recommended steps for setting folder ACLs at the share level? I'm using the free version of napp-it and keep getting access denied in the Windows MMC, even after resetting the folder ACLs through napp-it.

ACLs on files and folders and ACLs on shares are two completely different things on Solarish. The latter gives an option to additionally restrict permissions without touching files.

You can set ACLs on shares via the CLI, via the napp-it ACL extension, or from Windows Computer Management after connecting via SMB as a user that is a member of the SMB group admins.
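On the CLI, the share-level ACL lives on a control file below .zfs/shares of the shared filesystem; roughly (paths and share name assumed):

Code:
# show the current share-level ACL of an SMB share named 'data'
/usr/bin/ls -V /tank/fs/.zfs/shares/data
# allow everyone full access at the share level (file/folder ACLs still apply on top)
/usr/bin/chmod A=everyone@:full_set:allow /tank/fs/.zfs/shares/data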
 
RAM isn't a concern, and as I mentioned this would be for SSD/flash storage.

I misunderstood your question, as I thought you wanted to use the SSD for the dedup tables.
But the remaining problem is: if RAM is not endless, you can use it either for dedup tables or for the ARC read cache. Dedup is intended to save capacity, the ARC is for performance.

If you have endless RAM, you can have both. In the real world, only one or the other.

btw
A use case for dedup is when you can achieve high dedup rates, say >5; otherwise SSDs are too cheap nowadays.
 
So it looks like the Hipster branch of OI finally fixes the net-snmp issue that causes >2TB ZFS filesystems to get reported as 0 bytes!

Gea, I know you do not support napp-it on OI Hipster, but I was wondering if I can remove/replace the incompatible packages on my Hipster install to convert it to something napp-it supports? Is the reason you can't run napp-it on Hipster that it purges all of the opensolaris packages? If I re-enabled the opensolaris.org publisher, would napp-it work? Or is it more complicated than that?

Thanks!

The napp-it installer comes with nearly everything that is needed, so Hipster should not be a huge problem. As there is now an ISO for Hipster, I will check whether it runs when I find some time.

For production use, I would use OmniOS.
 
Hello,

I started seeing the "snap-age d1error days" errors in the snap logs again. (I wonder if it broke after the latest OmniOS r151016 update, omnios-073d8c0.) I've traced the problem to job-snap.pl not finding zfs in the PATH when it runs the following:

Code:
my $days=`zfs list -H -o creation $s`;  # ex Sun Sep 15  0:05 2013

Most other parts of napp-it either use the &exe subroutine or include the full path (e.g., $ZPOOL = "/usr/sbin/zpool"), but I found a few lines (including the one above) which use backticks to call ZFS commands without a path:

Code:
$ find /var/web-gui/data -iname '*.p[lm]' ! -path '*/linux*' -print0 \
  | xargs -0 perl -nle '!/^\s*#/ && /`z/ && do { print $ARGV, ":", $_ };'
/var/web-gui/data/napp-it/zfsos/_lib/scripts/job-scrub.pl:   my $error=`zpool scrub -s \'$pool\'`;
/var/web-gui/data/napp-it/zfsos/_lib/scripts/agent-bootinit.pl:     $r=`zpool list -H -o name`;
/var/web-gui/data/napp-it/zfsos/_lib/scripts/agent-bootinit.pl:       $r=`zpool destroy -f $t`;
/var/web-gui/data/napp-it/zfsos/_lib/scripts/job-snap.pl:      my $days=`zfs list -H -o creation $s`;  # ex Sun Sep 15  0:05 2013
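The obvious fix would be to use the full path in those calls as well, matching the $ZPOOL convention above, e.g.:

Code:
my $days=`/usr/sbin/zfs list -H -o creation $s`;  # ex Sun Sep 15  0:05 2013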
 
_Gea,

Does the SMB2 upgrade on bloody affect performance with Windows by any chance?
 
Most other parts of napp-it either use the &exe subroutine or include the full path (e.g., $ZPOOL = "/usr/sbin/zpool"), but I found a few lines (including the one above) which use backticks to call ZFS commands without a path:

Thanks a lot,
fixed in 0.9f7+
 