OpenSolaris-derived ZFS NAS/SAN (OmniOS, OpenIndiana, Solaris and napp-it)

It seems that the mediatomb add-on is currently broken. Any idea how I can get it running again, please?
 
I've been having nothing but trouble with AD integration on: OI 151a9, napp-it 0.9f1

When I connect to my SMB share from a Windows box, I'm prompted for credentials, even though I'm already logged in with an AD account. OI can ping the DCs and can resolve hostnames. If I use local credentials, my share works as intended, but that's not really AD integrated.

What am I doing wrong? Where are the ephemeral mappings?
 
I assume that your Windows PC is also an AD member. Then either:

- OI has not successfully joined the domain
-> re-join the domain (napp-it menu Services - SMB - Active Directory);
check for a "success" message, try again otherwise

- the domain master was temporarily not available
-> restart the SMB service (CLI equivalents for both steps are sketched below)
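For reference, a rough sketch of doing both steps from the shell on OI/OmniOS (domain name and account are placeholders; napp-it's menus call the same tools):
Code:
# re-join the domain (prompts for the AD account's password)
smbadm join -u Administrator mydomain.local

# verify the join / current domain mode
smbadm list

# restart the SMB service
svcadm restart network/smb/server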

You do not need to care about ephemeral mappings. They are generated automatically, without user action, to map Windows security IDs (SIDs) to Unix user IDs (UIDs) so that Unix can work as usual regarding permissions.

The Solaris CIFS server itself does not rely on UIDs. It stores Windows SIDs as extended ZFS attributes. This allows, for example, moving a pool to another AD member server or restoring a backup with all permissions staying intact. This is unique outside of Windows Server and NTFS.

A manual AD mapping that I normally add is
Windows AD admin => Unix root
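A minimal sketch of what that mapping could look like with idmap (the account and domain names are placeholders):
Code:
idmap add 'winuser:Administrator@mydomain.local' 'unixuser:root'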
 
Should be possible, see
http://kb.vmware.com/selfservice/mi...nguage=en_US&cmd=displayKC&externalId=2058287

But I would avoid such large vmdk files.
You can use iSCSI targets as an option.

Thanks, so I am trying to figure out my best configuration options - I'm assuming iSCSI for performance

I have:
4x 4TB 7200rpm Seagates
1x (soon 2x) Samsung 840 250gig SSD
Supermicro X10SL7-F w/ e3-1270
LSI 2308 set to IT mode
16gig ECC

My goal is to have 2-3 VMs (maybe more in the future)

1. My SageTV VM media server, need max storage
2. CentOS VM - I have directadmin and host webpages off of it, do not need much storage
3. Potentially a backup win2012 server, or use the SageTV with that. This might be a MSSQL test box for my wife (she is a dba)

Note: All of this can be blown away; I have not utilized it yet.

I went through the napp-it guide and installed napp-it on the SSD. I was planning to use Z1 or Z3 across all 4 drives via NFS, then install the VMs to that. I installed my SageTV VM with 90% of the storage via NFS. I then ran into the 2 TB max partition size and of course wanted much more.

GEA, thanks for the feedback - I have looked into iSCSI (it's been many years since I have used it). I think you can put iSCSI volumes on a ZFS pool that is also shared via NFS?

So for my setup, what would be preferable or even possible? One of these probably won't work, and I just don't know which.

#1. All VMs on the SSDs set to mirrored mode (once I get the #2 SSD) - 250 gig is plenty for the base OSs including base filesystems. ZFS Z1 or Z3, entirely used as an iSCSI volume. The SageTV VM, which is the only one I need storage for, will have Sage's directories on the iSCSI volume.

#2. napp-it only on the SSD. 2 pools created: 1 for NFS/VMDK files, ~1 TB or so. The other pool used for iSCSI for the SageTV.
 
What I would do
- buy a cheap SSD (30GB) and use it to boot ESXi and as a local datastore for OmniOS/napp-it
- pass-through the LSI 2308 with the Samsung SSDs and the Seagates

- create two pools (SSD mirror and Z1 from the Seagates)
- share a filesystem on the SSD pool via NFS, use it as ESXi datastore for your VMs

- share a filesystem on the Z1 pool via SMB for general use and for VM backups
- create a zvol on the Z1 pool (a ZFS volume shared as a block device).
Use this as a mass storage device via iSCSI (see the sketch below)

(other option may be: use an OmniOS SMB share if Sage can store files on a share.
This gives you a file-based snapshot capability. iSCSI gives only a disk-based snapshot option)
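To illustrate the NFS and zvol/iSCSI steps above, a rough sketch of the underlying commands (pool and volume names are placeholders; napp-it's ZFS filesystem and Comstar menus do the same thing):
Code:
# NFS share on the SSD pool for the ESXi datastore
zfs create ssdpool/vmstore
zfs set sharenfs=on ssdpool/vmstore

# zvol on the Z1 pool, shared as an iSCSI LUN via COMSTAR
zfs create -V 4T tank/sage
svcadm enable stmf
svcadm enable -r network/iscsi/target
stmfadm create-lu /dev/zvol/rdsk/tank/sage
stmfadm list-lu                 # note the GUID of the new LU
stmfadm add-view <GUID-from-list-lu>
itadm create-target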
 
_Gea, no, I do understand that ephemeral mappings happen automatically. I just don't think they're happening for me.

I only have a few idmap entries in place, along the lines of what you're describing. But when I log in with a domain account using one of those, it doesn't work. Is it possible my idmaps are in the wrong format?

Mine look like:

idmap add 'winuser:<username>@<domain>' 'unixuser:<username>'
idmap add 'wingroup:<usergroup>@<domain>' 'unixgroup:<usergroup>'

but when I look in the ACLs, I'd imagine it might need to be:

idmap add 'winuser:<short_domain>\<username>' 'unixuser:<username>'

does that make sense?
 
Both shortdomain\name and name@domain can be used
(the shortdomain is the old-style Windows NT compatibility name).
You can even use winuser:paul => unixuser:peter.

The point: all these mappings are related to permissions, not to login problems
(unless the mapping is senseless, like unixuser:paul => unixuser:peter).
To be sure, delete all mappings. You do not need any for a working CIFS server.
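If helpful, a sketch of clearing the name-based rules from the shell (this removes all rules, so note them down first):
Code:
idmap list        # show current name-based mapping rules
idmap remove -a   # remove all name-based mapping rules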

The question that remains: have you seen a "success" message after joining the domain?
 
What I would do
- buy a cheap SSD (30GB) and use it to boot ESXi and as a local datastore for OmniOS/napp-it
- pass-through the LSI 2308 with the Samsung SSDs and the Seagates

- create two pools (SSD mirror and Z1 from the Seagates)
- share a filesystem on the SSD pool via NFS, use it as ESXi datastore for your VMs

- share a filesystem on the Z1 pool via SMB for general use and for VM backups
- create a zvol on the Z1 pool (a ZFS volume shared as a block device).
Use this as a mass storage device via iSCSI

(other option may be: use an OmniOS SMB share if Sage can store files on a share.
This gives you a file-based snapshot capability. iSCSI gives only a disk-based snapshot option)

You rock! I'm not concerned about losing the SageTV data as it is just tv shows, etc - I back up my saved items elsewhere.

Last question -

What is the airspeed velocity of an unladen swallow?
 
I would try:

- delete all mappings
- use default share (everyone=full) settings
- connect as shortdomainname\user (or user@fulldomainname)

I have never had serious problems with AD (I use it daily), so if this does not work:
- try a clean reinstall (prefer OmniOS)
 
Hej Gea...

I'm currently evaluating your monitor extension and I think I need some explanation :)

I'm looking at the realtime iostat read/write perf/s, and rel_avr_dsk and worst_dsk are constantly at 100%. I'm not sure which parameter they show, whether it's write or writelast10s. They start out around 0% and slowly turn red as they reach 100%, so I guess that's writelast10s.

Anyway, I'm wondering what rel_avr_dsk and worst_dsk mean?

nappit-monitor.png


Looking at that picture, why are rel_avr_dsk and worst_dsk at 1000 req/s when there are no read or write requests? That status is observed most of the time. Sometimes the red line drops to 500 and then goes back to 1000. What does that mean?

Matej
 
While the read, write and wait values are absolute values from iostat,
the relative average vs. worst values only indicate relative values scaled to the width of the graph.

On balanced pools with good disks, average vs. worst should behave similarly.
If they differ, it indicates that you may have a problem with a single disk.
(Don't look at the absolute numbers, only check for a difference.)

If the values differ, check iostat for the individual disks, not the pool overview; that is the only reason for that graph.
(Maybe I should add an extra comment to the graph that the relative values are not io/s but only relative values.)
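For a quick per-disk view from the shell, something like this works (the interval in seconds is arbitrary; %b is the busy percentage per device):
Code:
iostat -xn 5    # extended per-device statistics, refreshed every 5 seconds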
 
Gea (and others), I currently run a setup that I would like to consolidate.

One box is just running OpenIndiana + napp-it:

-Core i3, 32 GB Ram (non-ECC - yeah, yeah I know)

OS is running on 2x 320GB drives in mirror

Z1 array for main data store/shares
6x 2TB WD drives (split in two Z1's of 3 drives ea.)
plus 1 Samsung 250GB SSD as cache

Dedupe on for certain zfs shares only.

2x 1TB WD reds running in mirror for ESXi VM's via iSCSI

Then I have a whole other box running just ESXi. (core-i7 w/ 24GB ram)

I feel I can reduce complexity and power use by eliminating my ESXi box completely, but I don't want to compromise on performance either. The Core i3 is probably overkill for the ZFS server alone, but if I combine the two I might have an issue?

Currently the ESXi box runs mainly 3 VMs: one Windows Server 2010 box for AD-related stuff; one Ubuntu box running 2 websites, Plex media server (incl. transcoding when needed, which will probably be the biggest CPU hog), SABnzbd + Sickbeard, etc., and MySQL; and lastly a VM that runs pfSense (for which I'm thinking of actually getting a dedicated appliance in the future).

My main question I guess is this.

If I decide to go ahead with this, can I just move my ZFS pools over to a new install of OpenIndiana/napp-it? Is there anything I will have to do beforehand (export the pools, etc.?) or can I just shut down the server, wipe the OS and reinstall into a VM when I migrate ESXi over?

If I do go this route, would upgrading to a more powerful CPU be a good idea to handle the extra load of ESXi and my VMs?

Also Gea, I see you finally offer an option for add-ons for home users! Fantastic!

Thanks!
 
If you want to build an All-In-One (ESXi with virtualized storage), you can
import the current pool (with or without a prior export; see the sketch after this list) when you use
- pass-through with an extra SAS controller like an IBM M1015 in LSI 9211-IT mode or an LSI 9207, or
- physical RDM to pass through single disks (controller pass-through is preferred)
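A rough sketch of the pool move itself (the pool name is a placeholder):
Code:
# on the old install (optional, but gives a clean export)
zpool export tank

# on the new storage VM
zpool import          # list importable pools
zpool import tank     # or: zpool import -f tank, if the pool was not exported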

The i3 with 32 GB RAM may be ok, depends on your load but mostly VMs are RAM limited, not CPU limited. Use at least 8 GB for your storage VM.

You may also think about a move from OpenIndiana to OmniOS as it is currently better maintained.
I would avoid dedup in most cases. You must recreate the pool to get rid of it.
I would use a single Raid-Z2 with 6 disks instead of two Z1 vdevs with 3 disks each.
The IO of the second is better, but in the first case you can lose any two disks.

For the VM datastore, think of a second SSD-only pool (for example a mirror of 2 x 256 GB SSDs).
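For the Z2 vs. two Z1 point, the two layouts would look roughly like this in zpool terms (disk names are placeholders):
Code:
# single raidz2 vdev: any two disks may fail
zpool create tank raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0

# two raidz1 vdevs: better IO, but only one disk per vdev may fail
zpool create tank raidz1 c1t0d0 c1t1d0 c1t2d0 raidz1 c1t3d0 c1t4d0 c1t5d0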
 
If you want to build an All-In-One (ESXi with virtualized storage), you can
import the current pool (with or without a prior export) when you use
- pass-through with an extra SAS controller like an IBM M1015 in LSI 9211-IT mode or an LSI 9207, or
- physical RDM to pass through single disks (controller pass-through is preferred)

I do run an LSI in IT mode for some of the disks, and I assume it wouldn't matter which ports the disks are connected to in this case when moving, since I would, as you say, pass through the LSI controller.

The i3 with 32 GB RAM may be ok, depends on your load but mostly VMs are RAM limited, not CPU limited. Use at least 8 GB for your storage VM.

You may also think about a move from OpenIndiana to OmniOS as it is currently better maintained.

Noted, thanks! Moving the pools between them is a non-issue?

I would avoid dedup in most cases. You must recreate the pool to get rid of it.

Honestly, I think I only have it on my "user" ZFS shares. It doesn't bother me and hasn't caused me any issues, so I will probably leave it be.

I would use a single Raid-Z2 with 6 disks instead of two Z1 vdevs with 3 disks each.
The IO of the second is better, but in the first case you can lose any two disks.

Is there a way to migrate to Z2 without losing data? I have no issues with throughput as far as I can tell... I max out my gigabit LAN connections when reading and writing (100+ MB/s), so I'm not sure if it's worth it. I do plan on adding a 4-port NIC to the server for link aggregation, just to avoid any future bottlenecks. But the clients would still only be connected with single 100 or 1000 Mbit connections.

For the VM datastore, think of a second SSD-only pool (for example a mirror of 2 x 256 GB SSDs).

Maybe a good idea, although again the two WD Reds seem to do a fine job. None of my VMs are slow to load, really, and once they're up they don't do much. But noted as well, thanks.
 
moving pools between OI and OmniOS: no problem
migrating Raid-Z1 vdevs to Z2 vdevs: not possible
 
I am currently running napp-it 0.9 f2 with Smartmontools 6.2
What is the command to update to Smartmontools 6.3?
 
You can compile 6.3 yourself, or you can
rerun the napp-it wget installer, as it installs 6.3:
Code:
wget -O - www.napp-it.org/nappit | perl

You need to update/activate 0.9f2 again afterwards.
 
My NAS was working kinda slow today, so I gave it a look. What I found was tons of errors being generated in my messages file. The errors looked like this:
Code:
Oct 13 07:30:30 biglittle ahci: [ID 811322 kern.info] NOTICE: ahci0: ahci_tran_reset_dport port 7 reset device
Oct 13 07:30:35 biglittle ahci: [ID 296163 kern.warning] WARNING: ahci0: ahci port 7 has task file error
Oct 13 07:30:35 biglittle ahci: [ID 687168 kern.warning] WARNING: ahci0: ahci port 7 is trying to do error recovery
Oct 13 07:30:35 biglittle ahci: [ID 693748 kern.warning] WARNING: ahci0: ahci port 7 task_file_status = 0x4041
Oct 13 07:30:35 biglittle ahci: [ID 657156 kern.warning] WARNING: ahci0: error recovery for port 7 succeed
Oct 13 07:30:40 biglittle ahci: [ID 296163 kern.warning] WARNING: ahci0: ahci port 7 has task file error
Oct 13 07:30:40 biglittle ahci: [ID 687168 kern.warning] WARNING: ahci0: ahci port 7 is trying to do error recovery
Oct 13 07:30:40 biglittle ahci: [ID 693748 kern.warning] WARNING: ahci0: ahci port 7 task_file_status = 0x4041
Oct 13 07:30:41 biglittle ahci: [ID 657156 kern.warning] WARNING: ahci0: error recovery for port 7 succeed

I knew a scrub was running, so I checked its status:
Code:
scan: scrub in progress since Mon Oct 13 03:00:01 2014
    112G scanned out of 5.34T at 6.85M/s, 222h18m to go
    18.5M repaired, 2.05% done
Yikes!

I went over to the disk status in napp-it and it reported 2562 hard errors...
Another yikes...

I headed over to smartctl and checked it out:
Code:
  1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       8626
  2 Throughput_Performance  0x0026   053   053   000    Old_age   Always       -       19631
  3 Spin_Up_Time            0x0023   067   067   025    Pre-fail  Always       -       10100
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       39
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   252   252   051    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0024   252   252   015    Old_age   Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       29684
 10 Spin_Retry_Count        0x0032   252   252   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   252   252   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       61
181 Program_Fail_Cnt_Total  0x0022   100   100   000    Old_age   Always       -       2227335
191 G-Sense_Error_Rate      0x0022   100   100   000    Old_age   Always       -       73
192 Power-Off_Retract_Count 0x0022   252   252   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0002   064   059   000    Old_age   Always       -       32 (Min/Max 15/42)
195 Hardware_ECC_Recovered  0x003a   100   100   000    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   252   252   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   099   099   000    Old_age   Always       -       186
198 Offline_Uncorrectable   0x0030   252   252   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0036   100   100   000    Old_age   Always       -       21
200 Multi_Zone_Error_Rate   0x002a   001   001   000    Old_age   Always       -       33252
223 Load_Retry_Count        0x0032   252   252   000    Old_age   Always       -       0
225 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       64  1

If I compare the results with other drives in the same pool, this is where they differ:
- the failing drive has around 8000 Raw_Read_Error_Rate, compared to around 4000 on another
- the failing drive has 21 UDMA_CRC_Error_Count, the other drives 0
- the failing drive has 186 Current_Pending_Sector, the other drives 0

What do you think the problem is?
Failing drive?
Faulty SATA cable?
Faulty SATA connector?
All drives are Samsung F4 EG with about 3.5 years of power-on hours...

In case I have to change the hard drive, what do you guys recommend? Would a Hitachi Deskstar 7K2000 be a good choice?

Matej
 
I would reset the ACL of the filesystem, using /usr/bin/chmod or the napp-it ACL extension, to an everyone@=modify, root=full or a basic root=full setting and then try again. Do not touch Unix permissions like 750, as they reset the ACL inheritance settings. The aclmode property must be set to passthrough to modify ACLs in both cases.

The ACL everyone@ is not a user but refers to any known user - it's a group that is similar to the Unix "everyone" permission.
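For illustration, a rough sketch of such a reset from the shell (the path and ACL sets are examples only; the napp-it ACL extension does this via menus):
Code:
# allow ACLs to be modified and inherited
zfs set aclmode=passthrough tank/share
zfs set aclinherit=passthrough tank/share

# recursively replace the ACL: root=full, everyone@=modify, inherited to files and dirs
/usr/bin/chmod -R A=user:root:full_set:fd:allow,everyone@:modify_set:fd:allow /tank/share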

How do you reset the ACL?
chmod -A ?
 
Anyone: I am kinda worried about a disk in the zpool; in napp-it it shows up under the error column in the Disks menu. One disk shows S:1 H:10 T:28, and this was showing as S:1 H:10 T:24 yesterday.

Could this mean the disk is on its way out?
 
Iostat messages are not real errors like ZFS checksum errors; they are more a sort of driver warning. Your values are quite moderate. Combined with performance problems they can indicate weak disks. At high values (several hundred) I would remove the disk and do a low-level test with a manufacturer's tool, or check power and cabling.
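Those S/H/T counters come from the same per-device error statistics you can read with iostat; a quick way to check them directly:
Code:
iostat -En    # per-device soft / hard / transport error counters and device info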
 
Hi there Gea,

First of all, thanks for all your hard work and the information that you put out there for your all-in-one, hardware, software configurations and examples!

On your all-in-one page, you recommend installing ESXi and the local OmniOS datastore on a hardware RAID1 enclosure such as the Raidsonic SR2760-2S-S2B.

I was wondering: with that specific enclosure, is it possible to take a single drive out and boot from one of the RAID1 drives directly if the SR2760 hardware itself fails? Or are you left with a disk that will only work in that particular RAID unit?

Thanks!
 
I have not tried it for a long time, but I would expect that a single disk from the mirror boots fine from plain SATA.

You can use the unit in two ways:
- Mirror to protect against a disk failure.
While this works very well, I have had problems when disks got bad blocks.
In such a case both disks can become corrupted.

- Backup (I mostly use it for this).
From time to time I hot-sync both disks. After this, I remove the second disk to have a ready-to-use backup boot disk.

You can mix both variants with 3 disks.
 
Thanks for the info!

I was just wondering about the ability to boot from, e.g., the mobo SATA without RAID in the event the hardware RAID1 unit failed, as I would hate to have two good disks and not be able to boot from either because the enclosure failed.

The backup method, swapping the regular 2nd disk for a third to get a backup and then swapping the 2nd back in, sounds like a good idea.

Do you mostly use disks or SSDs for these? I suppose a setup with an SSD normally connected, and a backup onto a similar-sized hard disk that is then removed, might be ideal.

A second enclosure that you could use to clone the backed-up RAID1 disk to a non-RAID disk might be a good idea, if it won't boot outside the enclosure.
 
Wonder if anyone can assist with an issue I'm experiencing.

I have an HP N54L with 6x 3TB drives and 1x USB storage drive as the boot device.

There are two drives sitting on top of each other at the top of the case where the optical disk drive should go - one connected to onboard SATA, one connected to eSATA on the rear.

The problem seems to show up during heavy tasks - the two drives connected to SATA and eSATA often hit 100% busy/wait, whereas all the other disks stay significantly lower.

I've tried this with different disks, different tasks, etc., but I get the same issue.

My thought is that the eSATA and the SATA port somehow share the same path, resulting in half speed on both drives?

Or that it's because I'm using a molex to 2x sata splitter to power both these drives.

Any thoughts on this?

edit - I'm using OpenIndiana with raidz2, but I have used raidz1 and get the same issue with both configurations.
 
There isn't a whole lot of room up there or airflow, perhaps they are getting too hot?
 
I don't think heat is the problem - I've had this completely shut down and cold, then tried from a cold boot and still had the same problem.

Could vibrations cause this (they are sitting on top of each other)?

As for internal space - it's using 10% of its space.

Also, the case is open, the drives aren't very hot to the touch, and the system is kept in a cool environment.
 
I can get some rubber feet for both drives as a permanent solution - in the meantime I can try using foam dividers and see if that does it.
 
What is the best way to reset my permissions on shares? For some reason the subfolders in one of my shares only have read access, and the reset ACLs don't seem to function the way they used to in previous versions of napp-it.
In fact, NONE of the ACL stuff seems to work in the latest version.
 