OpenSolaris-derived ZFS NAS/SAN (OmniOS, OpenIndiana, Solaris and napp-it)

Critical ZFS bug 6214

There is a critical bug in the L2ARC cache module of current OpenZFS
that can lead to data corruption.

Affected distributions: all current OpenZFS-based systems

Suggested action (a sketch of the commands follows below):
- remove the L2ARC cache device
- check the pool (scrub, zdb)
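A minimal sketch of these steps, assuming a pool named tank with an L2ARC device c2t1d0 (both names are placeholders):

# remove the L2ARC (cache) device from the pool
zpool remove tank c2t1d0
# verify the cache device is gone, then check the pool
zpool status tank
zpool scrub tank
zpool status -v tank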


http://echelog.com/logs/browse/illumos/
http://blog.gmane.org/gmane.os.omnios.general/day=20150912
https://www.illumos.org/issues/6214


Update
A fix is available for OmniOS 151014 at
http://omnios.omniti.com/wiki.php/ReleaseNotes/r151014
Please update (requires a reboot).
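For reference, a minimal sketch of the usual IPS update procedure on OmniOS (run as root; system updates normally land in a new boot environment):

# refresh the repositories and install the updated packages
pkg update -v
# reboot into the new boot environment
init 6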

More on how to handle the problem:
http://blog.gmane.org/gmane.os.omnios.general/day=20150914
(read the comments; post to OmniOS-discuss if you have additional insights)
 
If I use the iR2420-2S-S2 enclosure for a RAID-1 datastore under ESXi, how will I know if a drive has failed?
Does anyone know?

Also, if I do find a failed drive and replace it, how will I know it was properly rebuilt?

I'm trying to find a way to set up a RAID-1 datastore that ESXi can see natively.
 
I use an older model of these RAID-1 enclosures.
Initially you insert the master disk (depending on the model you may need to set a jumper), then you insert the second (slave) disk.

The enclosures have two LEDs. If they blink together on access, everything is OK. If a disk fails, the corresponding LED turns red and, on some models, a beeper sounds. During a rebuild the LED of the disk being rebuilt blinks every second; when the rebuild has finished, both LEDs blink together on access again.

Be aware: if one disk has bad sectors, the enclosure cannot detect which half of the mirror is bad and which is OK (this is not high-end RAID with cache and BBU, and far from something like ZFS), which may lead to a corrupted filesystem or a crash. I have seen this happen in the past. I usually use these enclosures to hot-create a mirror, then remove the second disk and keep it as a bootable backup disk in case of problems. After system changes I plug it back in to update the disk and remove it again afterwards.
 
Any ideas why napp-it/OmniOS will allow connections to NFS from one interface but not from another?

I have two networks and two interfaces. One is DHCP and the other is static. For some reason, no matter what I try, anything connecting via the DHCP interface gets a permission denied on the client side. Any thoughts?

Edit:

It seems that if I just set the sharenfs property to "on", or allow the entire subnet, things work. Is there a way I can track or trace through logs why the connection is being rejected?
 
I use an older model of these RAID-1 enclosures.
Initially you insert the master disk (depending on the model you may need to set a jumper), then you insert the second (slave) disk.

The enclosures have two LEDs. If they blink together on access, everything is OK. If a disk fails, the corresponding LED turns red and, on some models, a beeper sounds. During a rebuild the LED of the disk being rebuilt blinks every second; when the rebuild has finished, both LEDs blink together on access again.

Be aware: if one disk has bad sectors, the enclosure cannot detect which half of the mirror is bad and which is OK (this is not high-end RAID with cache and BBU, and far from something like ZFS), which may lead to a corrupted filesystem or a crash. I have seen this happen in the past. I usually use these enclosures to hot-create a mirror, then remove the second disk and keep it as a bootable backup disk in case of problems. After system changes I plug it back in to update the disk and remove it again afterwards.

Interesting points.
If I plug my main SSD that already has VMs into tray #1 and then plug a slave into tray #2 and enable RAID-1, will it destroy SSD #1?
 
NFS and SMB listen on all interfaces. If you have not made any firewall settings to block an interface, I would expect an IP or routing problem.

You do not have to, but should, use a different subnet for each interface. You must take care of the default route
and gateway settings. I would try a manual IP setting on both NICs.
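A minimal sketch of such a manual setup on OmniOS (interface names and addresses are only examples, adjust to your own):

# replace the DHCP address with a static one on the first NIC
ipadm delete-addr vmxnet3s0/dhcp
ipadm create-addr -T static -a 10.255.0.174/24 vmxnet3s0/v4
# static address on the second (storage) NIC
ipadm create-addr -T static -a 10.250.1.2/16 vmxnet3s1/v4
# persistent default route via the first network only
route -p add default 10.255.0.1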
 
Interesting points.
If I plug my main SSD that already has VMs into tray #1 and then plug a slave into tray #2 and enable RAID-1, will it destroy SSD #1?

No, data on first disk is normally kept but you should read the manual how to set raid-1 not raid-0 and how to initialize the mirror (mirror second disk with first). Try it first with an uncritical master disk.
 
NFS and SMB listen on all interfaces. If you have not made any firewall settings to block an interface, I would expect an IP or routing problem.

You do not have to, but should, use a different subnet for each interface. You must take care of the default route
and gateway settings. I would try a manual IP setting on both NICs.

Each interface is on its own subnet: 10.255.0.0/24 and 10.250.0.0/16; the latter is configured strictly for storage. The machine that I am attempting to connect from is on 10.255.0.0/24 and doesn't have access to 10.250.0.0/16.

Default gateway is 10.255.0.1.

Routing Table: IPv4
Destination    Gateway        Flags  Ref   Use       Interface
-------------- -------------- -----  ----  --------  ---------
default        10.255.0.1     UG     1     0         vmxnet3s0
10.250.0.0     10.250.1.2     U      4     7060102   vmxnet3s1
10.255.0.0     10.255.0.174   U      8     440672    vmxnet3s0
127.0.0.1      127.0.0.1      UH     3     124       lo0

ADDROBJ         TYPE    STATE  ADDR
lo0/v4          static  ok     127.0.0.1/8
vmxnet3s0/dhcp  dhcp    ok     10.255.0.174/24
vmxnet3s1/v4    static  ok     10.250.1.2/16
lo0/v6          static  ok     ::1/128

The only way I can get NFS to accept the connection is by using ro/[email protected]/24; if I use a specific IP it does not work. If I use a specific IP for 10.250.x.x it will work.
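A quick way to see what the server is actually exporting, and to which networks (a diagnostic sketch, not from the original post):

# on the OmniOS/Solaris box: list active shares and their options
share
# from a Linux client: list the exports as seen from that client
showmount -e 10.255.0.174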
 
This could be something related to how I was configuring the sharenfs property. I wasn't prefixing single IPs with @, yet somehow they worked for the static ones. I suppose I should be prefixing all addresses (even if they are not CIDR subnets) with @?
 
Even when configuring single IP addresses I always use CIDR notation. I've had weird behavior when not doing it this way.

e.g. ro/[email protected]/32 for a single IP address.
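For reference, a sketch of how such a sharenfs setting looks on the ZFS side (filesystem name and addresses are examples only):

# allow read/write from one host and one subnet, prefixing each entry with @
zfs set sharenfs="rw=@10.255.0.20/32:@10.250.0.0/16" tank/nfs
# check the result
zfs get sharenfs tank/nfs
share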
 
Speaking of NFS user permissions: what is the best practice if I am using specific UIDs/GIDs on one of my NFS shares for my client?

E.g. I have user sabnzbd and group media (a media group so Plex can read files left by sabnzbd).

Is it fine to create sabnzbd and media on OmniOS with the same UID/GID as on my Linux box, or is there a better way of handling this?
 
There is no authorisation or authentication with NFS3.
Setting an IP or UID isn't a problem even with minimal IT knowledge.
Not knowing the IP or UID is the only defense, and a small script that loops over both is easy to write.

This leaves you with four options:
- do not care
- use NFS3 only in secure environments where you can accept a fully open NFS
- use a firewall setting to restrict the interface (see the sketch below)
- use something with authentication, e.g. SMB/CIFS
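A minimal sketch of such a firewall restriction with the bundled IP Filter on OmniOS, assuming NFS should only be reachable via the storage NIC (interface name is a placeholder; rules go into /etc/ipf/ipf.conf):

# block NFS and the portmapper on the public interface, allow everything else
block in quick on vmxnet3s0 proto tcp/udp from any to any port = 2049
block in quick on vmxnet3s0 proto tcp/udp from any to any port = 111
pass in all

# then enable the service and load the rules
svcadm enable network/ipfilter
ipf -Fa -f /etc/ipf/ipf.conf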
 
About an NFS problem with ESXi + latest OmniOS

I have just updated a storage VM (OmniOS 151014) on one of my test machines from the April to the July edition. After that, I saw "all paths down" and was not able to remount the NFS datastore.

A reboot of ESXi fixed at least my problem.
A switch or update of the storage VM with NFS enabled may require an ESXi reboot.
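If a full ESXi reboot is not possible right away, removing and re-adding the NFS datastore from the ESXi shell may be worth trying first; a sketch with placeholder datastore/host/share names:

# list the mounted NFS datastores
esxcli storage nfs list
# remove the stale datastore and add it again
esxcli storage nfs remove -v nfs-datastore1
esxcli storage nfs add -H 10.250.1.2 -s /tank/nfs -v nfs-datastore1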

I've rebooted both of my ESX servers and this is still happening over NFS. iSCSI is working just fine.

I am thinking of re-installing with an older OmniOS version tomorrow to see if that clears it up. We've been running three OmniOS/napp-it installations and we only have a problem with an ESX 5.1 server, which sees a two-second intermittent APD. The server we built today just doesn't seem to want to work.
 
What are the steps to update OmniOS + napp-it?

Currently running on ESXi. Should I be using the new VM image, or are there some commands I can run to update?
 
I'm curious what folks are using for off-site replication. This is a home setup, so replicating to an off-site ZFS server isn't really an option (unless there is a solution that is reasonably priced).

There seem to be so many choices:

Crashplan
Tarsnap
Rsync.net
Google Nearline (rsync)
Cloudberry/Arq reading off an NFS/CIFS share

What works and is also reasonably priced?
 
Rsync.net supports "zfs send" now.

Nearline is really only for archival, not backup.
 
How so? Your data is there, it's just downloadable at a throttled speed. I'm OK with that if the house burns down.

Rsync.net seems expensive, but it is cool that they support zfs send. Would the napp-it Pro features have anything to do with zfs send, or is this free out of the box?
 
Yep, I was just saying because this is a ZFS thread. I think Rsync.net has a promotional $0.06/GB for zfs send customers right now.

Nearline is definitely cheaper, but you obviously have to pay retrieval costs to download your data if you ever need to recover it.
 
I believe they charge you 12 cents per GB for data transfers, so basically $120/TB. If your NAS blows up once a year you are still saving about $470/yr with Google Nearline.

Assuming 1 TB:

Rsync.net: $60 * 12 = $720/yr
Nearline: $10 * 12 = $120/yr, plus $130 if the data must be pulled (12 cents egress, 1 cent data retrieval), i.e. $720 - ($120 + $130) = $470 saved

Not sure how well duplicity works with Google or even with their own rsync command.
 
EDIT: Is anybody using Solaris 11.3 beta and SMB 2.1?

After running many tests: when I enable SMB protocol 2.1, my read speeds are incredibly slow on Unix client devices. However, if I switch to protocol 1.0 it's much, much faster. Could this be a bug? I'd love to test Nexenta for SMB 2.1 support, but I have too much storage for their free/community edition. :(
 
Rates?
 
I'm sorry, but disregard that. I've re-run some tests and it seems correct now. However, SMB 2.1 is not that much faster than 1.0. I'll post some screenshots/benchmarks later.

I do have another question though...

What's the best way to define UIDs and GIDs to be shared with other systems?

For example, I'm sharing using both SMB and NFS. When I share using NFS, the users have both UIDs and GIDs that I am manually creating on each system to make sure they stay in sync. But should I be using User Private Groups (UPGs) like Debian does, or should I create a generic "staff" group like Solaris does?
 
SMB1 is at the end of its evolution. There are reports that it is even faster than SMB2/3 in some environments. There is also OSX with its weak SMB1 implementation, where SMB2 is much faster. SMB2/3 has the potential for better performance, especially with 10Gb+ networks in mind, so it is the future.

Regarding NFS(3): there is no authentication or authorisation; everything is based on good will and a fakeable UID and IP address. If you need access control, use SMB (any version).
 
Thanks Gea.

I've tested OSX and SMB1 is definitely much slower (about 50% slower). However, on Debian-based systems SMB1 and SMB2 are very similar like you mention.

Regarding NFS3, I understand about the authentication and authorization, but I'm wondering about "best practices" when I'm creating UIDs and GIDs to make sure they are the same on the Solaris box and my Debian boxes (given the mixed environment including my Windows boxes).

For example, let's say I have a user called "sofakng". If I create the user on Solaris, it will have a unique UID (i.e. 1001) and be assigned a primary GID of "staff" (1000?). This means the user is "sofakng" and group is "staff".

Now I want to create that same user on a Debian server. By default, it uses User Private Groups so the UID can be the same (i.e. 1001) but a unique group will be created (i.e. GID 1001). This means the user is "sofakng" and group is "sofakng".

We now have a mismatch of GIDs between servers. Is this a big deal or should it be ignored? I'm trying to make my user/groups the same across my Solaris, Debian, and Windows machines.
 
Forget that. NFS3 behaves differently on every platform and is basically incompatible, at least with Solaris CIFS, as that is based not on UID/GID but on Windows SIDs.

You can only try to keep users/UIDs consistent between Linux and Solaris,
or use Microsoft Active Directory with Unix extensions to try to keep UIDs and SIDs in sync.
 
Could I use NFSv4 instead? Would that help with the UID/GID synchronization issue?

I'm probably going to avoid Active Directory/LDAP since it's a bit of overkill.
 
Linux vs. Unix (BSD, Solaris) NFS is not the problem; it's the Windows world.
If you want to be compatible with all clients, go SMB only, as SMB is available and works the same on all platforms.
 
Hey Gea, I've seen you say that SMB on Solaris tops out around 300 MB/s without jumbo frames. Is that per client, or the total across all clients?
 
Could I use NFSv4 instead? Would that help with the UID/GID synchronization issue?

I'm probably going to avoid Active Directory/LDAP since it's a bit of overkill.

If you need the same UID and GID for Windows, you can set it in the registry. I am doing this for one of my shares and it works just fine. The unfortunate limitation is that this only works for one user.

https://support.dbssolutions.com/su...-for-nfs-and-user-name-mapping-without-ad-sua
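For reference, the registry values usually involved are the Client for NFS anonymous UID/GID mapping; a sketch with example IDs (adjust to your own UID/GID, then restart the NFS client):

reg add "HKLM\SOFTWARE\Microsoft\ClientForNFS\CurrentVersion\Default" /v AnonymousUid /t REG_DWORD /d 1001
reg add "HKLM\SOFTWARE\Microsoft\ClientForNFS\CurrentVersion\Default" /v AnonymousGid /t REG_DWORD /d 1000
nfsadmin client stop
nfsadmin client start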

As for OmniOS/Linux: I just added the user and group on OmniOS and used the same UID and GID as on my Linux machine (a sketch below). Seems to work. NFSv4 will only benefit you by allowing the same user names with different UIDs/GIDs. If you have a limited number of users then you might not even need this.
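A minimal sketch of creating such a matching user/group on OmniOS, assuming the Linux box uses GID 1000 for media and UID 1001 for sabnzbd (check with "id sabnzbd" there first; the numbers are examples):

# create the group and user with the same numeric IDs as on the Linux box
groupadd -g 1000 media
useradd -u 1001 -g media sabnzbd
# verify the IDs match on both systems
id sabnzbd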

If Gea is referring to using NFS + SMB on the same ZFS share, I've tried this and things did get wonky with respect to users, so I avoided it. Silly that one needs Windows Enterprise in order to get NFS! If you have lots of users to manage, consider LDAP or AD. For me, the few manual ones I needed to create were easy enough.
 
I'm sorry, but disregard that. I've re-run some tests and it seems correct now. However, SMB 2.1 is not that much faster than 1.0. I'll post some screenshots/benchmarks later.

I do have another question though...

What's the best way to define UIDs and GIDs to be shared with other systems?

For example, I'm sharing using both SMB and NFS. When I share using NFS, the users have both UIDs and GIDs that I am manually creating on each system to make sure they stay in sync. But should I be using User Private Groups (UPGs) like Debian does, or should I create a generic "staff" group like Solaris does?

I'm curious, why Solaris? Why not OmniOS?
 