OpenSolaris-derived ZFS NAS/SAN (OmniOS, OpenIndiana, Solaris and napp-it)

How do you get OmniOS to shut down (from VMware Tools)?
Mine always ends up at the GRUB screen, counts down, and then reboots.
Edit:
I kind of suspect my kernel panics* are related to VMware Tools on ESXi 5.1... How is OmniOS on 5.5?
I'm on ESXi 5.1 Update 1. I can't see anything about Solaris kernel panics in the two patches newer than that... Does anyone know if they might help?

*I'm not sure about the random ones, but the shutdown ones for sure.
 
I have the same problem on ESXi 5.5, so unfortunately, I don't think an upgrade will solve it.
 
Are you guys using Gea's image?

I just installed the latest OmniOS bloody (as of yesterday) from scratch (a bit painful, because you need to start with an E1000 NIC, install a package, and then move on to VMXNET3), but I am not having any issue with OmniOS not powering off.

This is with ESXi 5.5 and the latest VMware Tools installed.
 
So I've been tinkering with my ZFS pool quite a bit yesterday (moved from a mirror of 2x 4TB to a RAID-Z of 3x 4TB). I decided to move to OmniOS as well, since the SSD for my AIO is small.

One thing that always bothered me is that, after spending thousands of dollars on a home build, I was never getting close to 1Gbps.

A few things I found out:
Use a good NIC on both sides with the latest drivers.
If you use a VMware Distributed vSwitch: for me, disabling Network I/O Control improved performance.

Finally, I was left with fast SMB writes (constant 111MB/s) but disappointing reads (starting at 110MB/s, dropping to 70-80MB/s and staying there).

I spent hours tweaking the ESXi buffer size, CPU reservation, RAM, vCPUs etc. Nothing helped much.

In the end the tweak that made the most difference was to add to the end of /etc/system this line:

set zfs:zfs_vdev_max_pending=2


I am wondering if anyone else sees an improvement on large (10GB) reads and writes, or if it's just luck.

I am not sure why I am getting ~20 MB/s better performance with this value.
 
Hi,

I'm trying to configure napp-it mails, but so far the SMTP-test shows for mailfish

"oops...could not connect smtp-server mx.mailfish.de"

or, when using strato instead of mailfish,

"message was sent"

but the test mails don't arrive. It even shows "message was sent" when I enter a wrong password :confused: Strato doesn't require TLS, as I was able to send a mail from Outlook over port 25 without TLS. As you can see, I also tried using a forwarder, mailfish.

We have two servers, both on OmniOS 151006p and napp-it 0.9b3 nightly (Sep.14.2013). Once, but only once, I received the test mail via mailfish and gmail. It had "***SPAM*** ein Test" as subject, so there might be a spam filter, but I looked into all spam folders and found nothing. Also, this doesn't explain the "message was sent" issue with Strato.

Any help would be appreciated,
thanks in advance!

kniven
 
Are you guys using Gea's image?

I just installed the latest OmniOS bloody (as of yesterday) from scratch (a bit painful, because you need to start with an E1000 NIC, install a package, and then move on to VMXNET3), but I am not having any issue with OmniOS not powering off.

This is with ESXi 5.5 and the latest VMware Tools installed.

No, I installed Stable from scratch.
I might try a bloody VM and see what happens...
At least the random panics stopped after I changed out the RAM.
 
Might be worth doing a full RAM check with memtest86+ (free).
 
Gea, or anyone.

I imported my pool from OI into OmniOS, and I have now joined a domain successfully.

Then I followed the instructions and created the following two mappings:

idmap add 'wingroup:[email protected]' 'unixuser:root'
idmap add 'wingroup:[email protected]' 'unixgroup:root'

If I understand correctly, this maps my domain admin to the Unix root account.

The problem I am running into is that in many of my shares I am unable to make ACL changes from Windows. It lacks some permission on some files.
If I right-click on the file in Windows and go to Security, I see: "The requested security information is either unavailable or can't be displayed."

Unix permissions are 777; ACLs are user:root full, everyone@ modify_set. I have tried everyone@ full, which doesn't help. I have tried to reset the ACLs from napp-it multiple times.

If I look at the file permissions in napp-it for the files where it stops, I get the following (note that the files with user 106 are the ones failing; the others with just root and everyone work fine).

Code:
-rwxr-xr-x+  1 root     root       38997 May  3 06:38 /Tank/NFShare/PXEboot/xbmc12_2/casper/filesystem.manifest
     0:user:root:read_data/write_data/append_data/read_xattr/write_xattr/execute/delete_child/read_attributes/write_attributes/delete/read_acl/write_acl/write_owner/synchronize:inherited:allow
     1:everyone@:read_data/read_xattr/execute/read_attributes/read_acl/synchronize:inherited:allow
-rwxrwxrwx+  1 root     root          61 May  3 06:38 /Tank/NFShare/PXEboot/xbmc12_2/casper/filesystem.manifest-remove
     0:user:root:read_data/write_data/append_data/read_xattr/write_xattr/execute/delete_child/read_attributes/write_attributes/delete/read_acl/write_acl/write_owner/synchronize:inherited:allow
     1:user:106:read_data/write_data/append_data/read_xattr/write_xattr/execute/delete_child/read_attributes/write_attributes/delete/read_acl/write_acl/write_owner/synchronize:inherited:allow
     2:owner@:read_data/write_data/append_data/read_xattr/write_xattr/execute/read_attributes/write_attributes/read_acl/write_acl/write_owner/synchronize:allow
     3:group@:read_data/write_data/append_data/read_xattr/execute/read_attributes/read_acl/synchronize:allow
     4:everyone@:read_data/write_data/append_data/read_xattr/execute/read_attributes/read_acl/synchronize:allow
-rwxrwxrwx+  1 root     root          11 May  3 06:38 /Tank/NFShare/PXEboot/xbmc12_2/casper/filesystem.size
     0:user:root:read_data/write_data/append_data/read_xattr/write_xattr/execute/delete_child/read_attributes/write_attributes/delete/read_acl/write_acl/write_owner/synchronize:inherited:allow
     1:user:106:read_data/write_data/append_data/read_xattr/write_xattr/execute/delete_child/read_attributes/write_attributes/delete/read_acl/write_acl/write_owner/synchronize:inherited:allow
     2:owner@:read_data/write_data/append_data/read_xattr/write_xattr/execute/read_attributes/write_attributes/read_acl/write_acl/write_owner/synchronize:allow
     3:group@:read_data/write_data/append_data/read_xattr/execute/read_attributes/read_acl/synchronize:allow
     4:everyone@:read_data/write_data/append_data/read_xattr/execute/read_attributes/read_acl/synchronize:allow

I must be missing a little thing....

I just tried to change the ACL at the command line with
/usr/bin/chmod -R A=user:root:full_set:fd:allow /Tank/Mediacenter/Playlists/
/usr/bin/chmod -R A+everyone@:full_set:fd:allow /Tank/Mediacenter/Playlists/

Now I am getting the following error message on windows: "No mapping between account names and security IDs was done."...

Edit:

So for anyone stuck in the same mess as I was, here is what I did to start clean.

If you get "No mapping between account names and security IDs was done.", it's likely because there is a user ID on Solaris that Windows can't resolve and that Solaris itself doesn't know about (re-imported the pool?).
Reset all file permissions and owners:
chmod -R 777 {YourFilesystem}
chown -R root:root {YourFilesystem}

To resolve "The requested security information is either unavailable or can't be displayed", you need to reset all ACLs. You can do:
/usr/bin/chmod -R A=user:root:full_set:fd:allow {YourFilesystem}
/usr/bin/chmod -R A+everyone@:full_set:fd:allow {YourFilesystem}

Finally, make sure that the SMB ACL allows both everyone@ and user:root (I don't know how to do that on the command line; I did it in napp-it).


Hope it helps someone else.
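
As a rough aid for finding the files that still carry an ACL entry for an unresolved numeric ID (like user:106 in the listing above), the output of ls -v can be filtered. This is only a sketch: the function name numeric_acl_files is made up for illustration, and it assumes that paths contain no spaces.

```shell
# Print the path of every file whose ACL has an entry for a purely numeric
# user or group ID (i.e. an ID that could not be resolved to a name).
# Input: `ls -v` style output on stdin. Assumption: paths contain no spaces.
numeric_acl_files() {
  awk '
    # File header lines start in column 1, e.g. "-rwxrwxrwx+  1 root root ..."
    /^[^ \t]/ { path = $NF; reported = 0; next }
    # ACL entry lines are indented, e.g. "     1:user:106:read_data/...:allow"
    /^[ \t]*[0-9]+:(user|group):[0-9]+:/ {
      if (path != "" && !reported) { print path; reported = 1 }
    }
  '
}

# Example call on the server (directory taken from the listing above):
#   ls -v /Tank/NFShare/PXEboot/xbmc12_2/casper/* | numeric_acl_files
```

Each affected file is printed once, so the result can be used as a checklist before resetting owners and ACLs.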
 
I didn't do the idmap thing and it works for me and my domain. Your other settings look the same as mine, except my domain is 2008 R2.
 
Hi All
I am setting up a small business environment and would like to use an AIO server. I expect the following to want to access the solaris file system:

-VMs running on ESXi using NFS
-OSX laptops
-Windows laptops

I need to ensure there are correct ACLs in place that allow individual users to have effectively home directories, a shared space for everyone, and individual home folders but for VMs. What I don't understand is how to map individuals from laptops to users in Solaris (particularly via NFS, and ideally with RSA or similar keys), nor how to map from non-interactive VM users to users in Solaris. If anyone has a pointer or suggestions on the easiest way to manage these situations I would greatly appreciate it.
 
In the end the tweak that made the most difference was to add to the end of /etc/system this line:

set zfs:zfs_vdev_max_pending=2

Back when we allowed customers to use SATA drives, setting this to 1 was a default modification on such systems. 8 to 10 or so seems reasonable on SAS (and NL-SAS) drives; SSDs vary but usually want much higher than 8-10; and SATA much of the time benefits more from setting this down to 1-2. It has something to do with how SATA (badly) handles command queuing.

Tweaking this setting also has an impact on latency, so you may want it higher if you give not one damn about latency and only care about throughput, or lower if vice versa. Setting this to 1 on 15K SAS disks gives you the best possible average latency per IOP, but it also ridiculously lowers the total throughput you can get.
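
For experimenting with different values without a reboot, the tunable can also be read and changed on the running kernel with mdb. A minimal sketch, assuming your kernel still has the zfs_vdev_max_pending variable (if it does not, mdb will complain about an unknown symbol). The function names are made up, and a value set this way is lost at reboot, unlike the /etc/system line.

```shell
# Show the current value of the tunable on the running kernel (as decimal).
show_max_pending() {
  echo "zfs_vdev_max_pending/D" | mdb -k
}

# Change the value on the running kernel. Not persistent across reboots.
# $1 = new queue depth, e.g. 2. The "0t" prefix tells mdb the number is decimal.
set_max_pending() {
  case "$1" in
    ''|*[!0-9]*) echo "usage: set_max_pending <number>" >&2; return 1 ;;
  esac
  echo "zfs_vdev_max_pending/W0t$1" | mdb -kw
}

# Example, as root:
#   show_max_pending
#   set_max_pending 2
```

Once a value has proven itself under your real workload, it can be made permanent with the set zfs:zfs_vdev_max_pending=... line in /etc/system.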
 
I am running into one more issue with the domain.

My idmap mappings are as follows (domain name replaced with mydomain.com):
Code:
idmap list
add     winuser:[email protected]        unixuser:root
add     wingroup:[email protected]      unixgroup:root
add     "wingroup:Domain [email protected]"      unixgroup:users

Currently the ACL for the share is everyone@ full, and the SMB ACL is also everyone@ full.

From the DC I am able to access the file shares when logged on as domain admin.

However, from a workstation that is not part of the domain, when I try to access the share with domain credentials such as MYDOMAIN\myuser, I keep getting: Windows cannot access: sharename (Network name cannot be found).

I originally had the same issue on the DC before adding the idmap...

Additionally, from the DC I added MYDOMAIN\myuser to the share permissions. It got propagated to OmniOS, as I now see everyone and myuser@mydomain, but that doesn't help :(

Any advice? Do I really need a 1:1 map between each domain user and Unix user?

Edit: Well... for some reason /Tank was root only, so only root could access any files... chmod 777 /Tank fixed it.
 
For AD/SMB use you do not need any mappings.

I always map domain admins to root so I
can manage shares as a regular domain admin.

You only need mappings if you want to connect as a Windows AD user
and act on the Unix server as a known local user, for example
for compatibility reasons with other Unix services.

PS
Avoid using chmod to set Unix permissions (chmod 777 /folder), as this deletes
ACL inheritance settings and can leave your shares unusable.

Set only ACLs when using the Solaris CIFS server on SMB shares.

Also check that the ZFS property aclinherit is set to passthrough.
If there are problems, you can reset all ACLs recursively to a default of
root=full and everyone@=modify with the ACL extension (this is free).
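
The inheritance property can also be checked from the command line with zfs get. A small sketch; check_aclinherit is a hypothetical helper name and tank/share is a placeholder dataset.

```shell
# Report whether dataset $1 has aclinherit set to passthrough.
# Returns 0 if it does, 1 if it has another value, 2 if zfs get itself failed.
check_aclinherit() {
  value=$(zfs get -H -o value aclinherit "$1") || return 2
  if [ "$value" = "passthrough" ]; then
    echo "$1: aclinherit=passthrough"
  else
    echo "$1: aclinherit=$value (expected passthrough)" >&2
    return 1
  fi
}

# Example (placeholder dataset name):
#   check_aclinherit tank/share || zfs set aclinherit=passthrough tank/share
```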
 
For everyone that is running NAPP-IT with an m1015, how do you identify which drive is bad when one fails? Is there any way to do this visually short of printing serial number labels on the edge of every drive?
 
Under the Disks menu there is an "identify with dd" option, which basically starts a read from the raw disk to /dev/null. You identify the disk by its activity light.

That method doesn't work if the drive is all-the-way dead. When I had a drive fail, I figured it out by starting a scrub ("zpool scrub") on the pools and seeing which drive's activity light stayed dark. Crude, but effective.
 
Besides the positive/negative activity method, which does not work with all disks
(there are a lot of SSDs without an activity LED), you can use menu disks-sas2 extension.

This menu is part of the monitor extension but is free for fewer than 8 disks.
It gives you a list of the WWNs, serials and enclosure numbers of your disks,
along with other information like vdev, product, capacity and errors.

You can also print it out to have a list at hand that includes all disks, dead or alive.
 
So I switched over to Gea's latest OmniOS VM. I can shutdown/reboot without a kernel panic.
BUT I'm still getting random panics. I have a thread going on the Illumos list trying to figure out what's going on.
 
I assume you have tested your ram for a while?
 
If I mount a ZFS share over NFS and then rsync/duplicity files from the NFS mount to some rsync server, what happens to the ACLs?
 
They are lost. Rsync (up to 3.1) does not support Solaris nfs4 ACLs.

If you need ACL support:
- Replicate with zfs send
- Sync with robocopy from Windows over SMB shares
 
I'm syncing to a service that only allows ssh/rsync
They use zfs, but zfs recv on their end isn't ready yet
I can't win this month

Can you figure out how to make duplicity work on OmniOS?
 
Can someone please help me shed some light on this behaviour?

Screenshot_from_2013_10_27_15_20_32.png


All of the above datasets except the 2nd and 4th have the ACL everyone@=modify_set. The 2nd and 4th have the ACL group@=modify_set. SMB sharing is off, but NFS sharing is on.

Why does the FOLDER-ACL column show different information for the datasets, when their ACL is the same?

Why does the SHARE-ACL column show SMB sharing ACL, when SMB is off? I have tried removing the ACL on the two datasets in question, but it fails with an error.
 
I have a question for _Gea, I am a paid up Napp-it user (have several licenses for my company for various projects) and I have found that when replicating between sites the replication finishes but I always get this error message:

error time: 2013.10.27.18.30.07 info: incremental remote replication finished (time: 49349 s) job-replicate 518: new destination snap
dpool/xenservices01@1378877180_repli_zfs_sgbackup01_nr_2 was not created, check for space, timeouts and hidden snaps
end rcv time: 2013.10.27.18.30.03 551-49349 s:
incremental send time: 2013.10.27.04.47.36 SGFILER01: zfs send -I storage/xenservices01@1378877180_repli_zfs_sgbackup01_nr_1 storage/xenservices01@1378877180_repli_zfs_sgbackup01_nr_2 | /var/web-gui/data/tools/nc/nc -b 262144 -w 20 202.58.11.164 57710
start rcv time: 2013.10.27.04.47.34 /var/web-gui/data/tools/nc/nc -b 262144 -d -l -p 57710 | /usr/sbin/zfs receive -F dpool/xenservices01 2>&1

The snapshot number never increments when I go to look at snaps as mentioned. Other replication jobs from storage appliances on the local network DO increment the snapshot correctly.

I have plenty of space, permissions etc, there are no hidden snaps or clones, I've checked. I have deleted the job and recreated it again and still see the same issue. Any pointers?
 
Can someone please help me shed some light on this behaviour?

Screenshot_from_2013_10_27_15_20_32.png


All of the above datasets except the 2nd and 4th have the ACL everyone@=modify_set. The 2nd and 4th have the ACL group@=modify_set. SMB sharing is off, but NFS sharing is on.

Why does the FOLDER-ACL column show different information for the datasets, when their ACL is the same?

Why does the SHARE-ACL column show SMB sharing ACL, when SMB is off? I have tried removing the ACL on the two datasets in question, but it fails with an error.

napp-it buffers ZFS information for performance reasons.
It may happen that not all operations force a reload.

Try menu ZFS filesystems >> delete ZFS buffer
 
I have a question for _Gea, I am a paid up Napp-it user (have several licenses for my company for various projects) and I have found that when replicating between sites the replication finishes but I always get this error message:

error time: 2013.10.27.18.30.07 info: incremental remote replication finished (time: 49349 s) job-replicate 518: new destination snap
dpool/xenservices01@1378877180_repli_zfs_sgbackup01_nr_2 was not created, check for space, timeouts and hidden snaps
end rcv time: 2013.10.27.18.30.03 551-49349 s:
incremental send time: 2013.10.27.04.47.36 SGFILER01: zfs send -I storage/xenservices01@1378877180_repli_zfs_sgbackup01_nr_1 storage/xenservices01@1378877180_repli_zfs_sgbackup01_nr_2 | /var/web-gui/data/tools/nc/nc -b 262144 -w 20 202.58.11.164 57710
start rcv time: 2013.10.27.04.47.34 /var/web-gui/data/tools/nc/nc -b 262144 -d -l -p 57710 | /usr/sbin/zfs receive -F dpool/xenservices01 2>&1

The snapshot number never increments when I go to look at snaps as mentioned. Other replication jobs from storage appliances on the local network DO increment the snapshot correctly.

I have plenty of space, permissions etc, there are no hidden snaps or clones, I've checked. I have deleted the job and recreated it again and still see the same issue. Any pointers?

You seem to have snap_nr_1 on the target side, and snap_nr_1 and snap_nr_2 on the source side.
This indicates that there is no general network or communication problem, because the initial transfer happened.

The receiver (zfs receive) and the sender (zfs send) both get started. On success, the result of this zfs send would be a snap_nr_2 on the target side.
If the snap is missing, the zfs send -> receive was interrupted or hangs. There is no log for the data stream itself; napp-it just reports a failure.
On transfer or other snap problems, zfs receive stops with an error message.

What you can do:
Open a PuTTY session to the sender and the target as root, start the receiver and then the sender, and see what happens (take the commands from your log).
Maybe you get an error message that helps (the newest napp-it logs zfs receive errors as well).

Then
try a reboot of both machines.
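
After running the receiver and sender by hand, one quick way to confirm whether the second snapshot arrived on the target is to look for it in the snapshot list. A small sketch: has_snapshot is a hypothetical helper, the dataset and snapshot names are the ones from the log above, and it assumes a POSIX grep (on older Solaris-derived systems that may be /usr/xpg4/bin/grep).

```shell
# Succeed if the exact snapshot name $1 is among the names read from stdin
# (one name per line, as printed by `zfs list -H -o name -t snapshot`).
has_snapshot() {
  grep -Fxq -- "$1"
}

# On the target machine:
#   zfs list -H -o name -t snapshot -r dpool/xenservices01 |
#     has_snapshot 'dpool/xenservices01@1378877180_repli_zfs_sgbackup01_nr_2' &&
#     echo "snapshot nr_2 exists"
```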
 
Hi Gea (or others as well),

Can an OpenIndiana/Napp It server be controlled by an Active Directory or Open Directory server?
 
You can join an AD domain and use domain users for SMB shares
(napp-it menu services - SMB - Active Directory)
 
Is there anything I should do for new disks I am adding to an array? IE when you receive new disks do you run a full surface scan or anything on them to ensure they work before putting them into production? And if so, how?
 
You can use the built-in format commands if you want to stress them.

format -> analyze -> read/write/purge

purge is like a secure delete function.

You can also compile secure delete, and that program works very well for stressing disks.

Or simply do a dd from /dev/zero to the drive.

Any way you slice it, I like to stress test for at least 72 hours, but usually I will let it run for about a week. Then I check the SMART data to see if any read or mechanical errors have popped up.
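
The dd approach can be wrapped so that the written zeros are also read back and checked. This is only a sketch: zero_and_verify is a made-up helper, the raw device path in the usage comment is a placeholder, and it only covers the number of MiB you pass in. It destroys whatever is on the target.

```shell
# Write $2 MiB of zeros to target $1, then read the same range back and
# fail if any non-zero byte comes back. DESTROYS existing data on the target.
zero_and_verify() {
  target=$1
  mib=$2
  dd if=/dev/zero of="$target" bs=1048576 count="$mib" conv=notrunc 2>/dev/null || return 1
  # Delete all NUL bytes from the read-back data; anything left over is an error.
  nonzero=$(dd if="$target" bs=1048576 count="$mib" 2>/dev/null | tr -d '\000' | wc -c | tr -d ' ')
  [ "$nonzero" -eq 0 ]
}

# Example (placeholder device name, first 1024 MiB only):
#   zero_and_verify /dev/rdsk/c2t0d0p0 1024 && echo "read-back OK"
```

A single pass like this is far shorter than the multi-day runs described above, so it would have to be looped or given a much larger size to serve as a real burn-in.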
 
Thanks good tips.

Another question, has anyone tried installing ESXi to an m1015 in IR mode driving a pair of drives in raid-1? I bought a (http://www.amazon.com/gp/product/B001VEI0NS/ref=oh_details_o03_s00_i00?ie=UTF8&psc=1) and the fan in it is obnoxiously loud, and the unit is pretty slow (maxed out at <100MB/s writes to FAST ssds). If ESXi supports it I think it would make more sense to just dedicate an m1015 to that, since they are similar price wise anyway.
 

napp-it buffers ZFS information for performance reasons.
It may happen that not all operations force a reload.

Try menu ZFS filesystems >> delete ZFS buffer

There is no such menu option. Do you mean the ZFS filesystems >> reload menu option?

I tried this option, but it doesn't change anything.
 
Can you check whether:
- napp-it shows wrong ACL info
(you may update to a newer release; the newest is 0.9e preview, which you can evaluate with a pro key from
http://forums.servethehome.com/solaris-nexenta-openindiana-napp/2652-napp-0-9e-preview.html )
and check if the problem persists

- the ACL is actually different
(you can use the ACL extension to display the whole ACL info)
 
Hi I have lots of questions relating to encryption...

1) Has anyone run any benchmarks of the file/lofiadm encryption solution Gea has integrated in to Napp It?

2) Related to the above, it would seem that this solution will require 2x the amount of caching/writes required since it has to transition ZFS twice, correct?

3) Is there any support at all for an l2arc/zil with this solution? It would seem like you would need an l2arc and zil device for each ZFS filesystem, the underlying and the encrypted overlay, correct?

4) Are the encrypted files that make up the encrypted pool hard sized at the start to consume the entire space on the drives they are hosted on? Or do they dynamically grow/shrink? (Wondering about from a backup standpoint).

Thanks!
 
1.
I compared it to Solaris 11 some time ago. As I remember, it was about 20% slower than the (quite slow) Oracle Solaris 11 encryption, with values up to about 50 MB/s.

2.
Yes, it goes through ZFS twice.

3.
You do not need to cache twice. ARC/L2ARC on the base pool is enough.
A ZIL is not needed/used at all with regular file services (apart from ESXi over NFS, where sync is the default).

4.
The size of the file devices is fixed. If you create large devices, you usually do not back up the encrypted devices but the unencrypted files from the POE pool, like you would do on BSD/Linux/Solaris 11.

Result
This is not a high-performance solution, but it is OK even with larger POE pools.

But there is one aspect where it is unbeatable.
If you have critical data that you want to back up to insecure places (Amazon, Dropbox), where the admin or the NSA or whoever can read the data, or to insecure media (e.g. USB sticks or external disks with filesystems that are insecure compared to ZFS) that can be stolen or where data corruption can occur, you do not want to back it up unencrypted, without checksum protection, or without Raid-Z(1-3) redundancy to fix file errors.

In such a case, you can create file devices smaller than 2 GB and back them up to any target, even USB sticks with FAT, without any of these problems. This cannot be done in a similar way with any other method.
 