OpenSolaris-derived ZFS NAS/SAN (OmniOS, OpenIndiana, Solaris and napp-it)

It seems that the mediatomb add-on is currently broken. Any idea how I can get it running again, please?
 
I've been having nothing but trouble with AD integration on: OI 151a9, napp-it 0.9f1

When I connect to my SMB share from a Windows box, I'm prompted for credentials, even though I'm already logged in with an AD account. OI can ping the DCs and can resolve hostnames. If I use local credentials, my share works as intended, but that's not really AD integrated.

What am I doing wrong? Where are the ephemeral mappings?
 
I assume that your Windows PC is also an AD member. Then either:

- OI has not successfully joined the domain
-> re-join the domain (napp-it menu Services - SMB - Active Directory);
check for a "success" message, try again otherwise

- the domain master was temporarily not available
-> restart the SMB service (CLI equivalents for both steps are sketched below)
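For reference, a rough sketch of doing both steps from the shell on OI/OmniOS (domain name and account are placeholders; napp-it's menus call the same tools):
Code:
# re-join the domain (prompts for the AD account's password)
smbadm join -u Administrator mydomain.local

# verify the join / current domain mode
smbadm list

# restart the SMB service
svcadm restart network/smb/server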

You do not need to care about ephemeral mappings. They are generated automatically, without user action, to map Windows security IDs (SIDs) to Unix user IDs (UIDs) so that Unix can work as usual regarding permissions.

The Solaris CIFS server itself does not rely on UIDs. It stores Windows SIDs as extended ZFS attributes. This allows, for example, moving a pool to another AD member server or restoring a backup with all permissions staying intact. This is unique outside of Windows Server and NTFS.

A manual AD mapping that I normally add is
Windows AD admin => Unix root
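A minimal sketch of what that mapping could look like with idmap (the account and domain names are placeholders):
Code:
idmap add 'winuser:Administrator@mydomain.local' 'unixuser:root'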
 
Should be possible, see
http://kb.vmware.com/selfservice/mi...nguage=en_US&cmd=displayKC&externalId=2058287

But I would avoid such large vmdk files.
You can use iSCSI targets as an option.

Thanks, so I am trying to figure out my best configuration options - I'm assuming iSCSI for performance

I have:
4x 4TB 7200rpm Seagates
1x (soon 2x) Samsung 840 250gig SSD
Supermicro X10SL7-F w/ e3-1270
LSI 2308 set to IT mode
16gig ECC

My goal is to have 2-3 VMs (maybe more in the future)

1. My SageTV VM media server, need max storage
2. CentOS VM - I have directadmin and host webpages off of it, do not need much storage
3. Potentially a backup win2012 server, or use the SageTV with that. This might be a MSSQL test box for my wife (she is a dba)

Note: All of this can be blown away; I have not utilized it yet.

I went through the napp-it guide and installed napp-it on the SSD. I was planning to use Z1 or Z3 across all 4 drives via NFS, then install the VMs to that. I installed my SageTV VM with 90% of the storage via NFS. I then ran into the 2 TB max partition size and of course wanted much more.

GEA, thanks for the feedback - I have looked into iSCSI (it's been many years since I have used it). I think you can put iSCSI volumes on a ZFS pool that is also shared via NFS?

So for my setup, what would be preferable or even possible? One of these probably won't work, and I just don't know which.

#1. All VMs on the SSDs set to mirrored mode (once I get the #2 SSD) - 250 gig is plenty for the base OSs including base filesystems. ZFS Z1 or Z3, entirely used as an iSCSI volume. The SageTV VM, which is the only one I need storage for, will have Sage's directories on the iSCSI volume.

#2. napp-it only on the SSD. 2 pools created: 1 for NFS/VMDK files, ~1 TB or so. The other pool used for iSCSI for the SageTV.
 
What I would do
- buy a cheap SSD (30GB) and use it to boot ESXi and as a local datastore for OmniOS/napp-it
- pass-through the LSI 2308 with the Samsung SSDs and the Seagates

- create two pools (SSD mirror and Z1 from the Seagates)
- share a filesystem on the SSD pool via NFS, use it as ESXi datastore for your VMs

- share a filesystem on the Z1 pool via SMB for general use and for VM backups
- create a zvol on the Z1 pool (a ZFS volume shared as a block device).
Use this as a mass storage device via iSCSI (see the sketch below)

(other option may be: use an OmniOS SMB share if Sage can store files on a share.
This gives you a file-based snapshot capability. iSCSI gives only a disk-based snapshot option)
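To illustrate the NFS and zvol/iSCSI steps above, a rough sketch of the underlying commands (pool and volume names are placeholders; napp-it's ZFS filesystem and Comstar menus do the same thing):
Code:
# NFS share on the SSD pool for the ESXi datastore
zfs create ssdpool/vmstore
zfs set sharenfs=on ssdpool/vmstore

# zvol on the Z1 pool, shared as an iSCSI LUN via COMSTAR
zfs create -V 4T tank/sage
svcadm enable stmf
svcadm enable -r network/iscsi/target
stmfadm create-lu /dev/zvol/rdsk/tank/sage
stmfadm list-lu                 # note the GUID of the new LU
stmfadm add-view <GUID-from-list-lu>
itadm create-target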
 
_Gea, no, I do understand that ephemeral mappings happen automatically. I just don't think they're happening for me.

I only have a few idmap entries in place, along the lines of what you're describing. But when I log in with a domain account using one of those, it doesn't work. Is it possible my idmaps are in the wrong format?

Mine look like:

idmap add 'winuser:<username>@<domain>' 'unixuser:<username>'
idmap add 'wingroup:<usergroup>@<domain>' 'unixgroup:<usergroup>'

but when I look in the ACLs, I'd imagine it might need to be:

idmap add 'winuser:<short_domain>\<username>' 'unixuser:<username>'

does that make sense?
 
Both shortdomain\name and name@domain can be used
(the shortdomain is the old-style Windows NT compatibility name).
You can even use winuser:paul => unixuser:peter.

The point: all these mappings are related to permissions, not to login problems
(unless the mapping is senseless, like unixuser:paul => unixuser:peter).
To be sure, delete all mappings. You do not need any for a working CIFS server.
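If helpful, a sketch of clearing the name-based rules from the shell (this removes all rules, so note them down first):
Code:
idmap list        # show current name-based mapping rules
idmap remove -a   # remove all name-based mapping rules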

The question that remains: have you seen a "success" message after joining the domain?
 
What I would do
- buy a cheap SSD (30GB) and use it to boot ESXi and as a local datastore for OmniOS/napp-it
- pass-through the LSI 2308 with the Samsung SSDs and the Seagates

- create two pools (SSD mirror and Z1 from the Seagates)
- share a filesystem on the SSD pool via NFS, use it as ESXi datastore for your VMs

- share a filesystem on the Z1 pool via SMB for general use and for VM backups
- create a zvol on the Z1 pool (a ZFS volume shared as a block device).
Use this as a mass storage device via iSCSI

(other option may be: use an OmniOS SMB share if Sage can store files on a share.
This gives you a file-based snapshot capability. iSCSI gives only a disk-based snapshot option)

You rock! I'm not concerned about losing the SageTV data as it is just tv shows, etc - I back up my saved items elsewhere.

Last question -

What is the airspeed velocity of an unladen swallow?
 
I would try:

- delete all mappings
- use default share (everyone=full) settings
- connect as shortdomainname\user (or user@fulldomainname)

I have never had serious problems with AD (I use it daily), so if this does not work:
- try a clean reinstall (prefer OmniOS)
 
Hej Gea...

I'm currently evaluating your monitor extension and I think I need some explanation :)

I'm looking at the realtime iostat read/write perf/s, and rel_avr_dsk and worst_dsk are constantly at 100%. I'm not sure which parameter they show, whether it's write or writelast10s. They start out around 0% and slowly turn red as they reach 100%, so I guess that's writelast10s.

Anyway, I'm wondering what rel_avr_dsk and worst_dsk mean?

nappit-monitor.png


Looking at that picture, why are rel_avr_dsk and worst_dsk at 1000 req/s when there are no read or write requests? That status is observed most of the time. Sometimes the red line drops to 500 and then goes back to 1000. What does that mean?

Matej
 
While the read, write and wait values are absolute values from iostat,
the relative average vs. worst values only indicate relative values scaled to the width of the graph.

On balanced pools with good disks, average vs. worst should behave similarly.
If they differ, it indicates that you may have a problem with a single disk.
(Don't look at the absolute numbers, only check for a difference.)

If the values differ, check iostat for the individual disks, not the pool overview; that is the only reason for that graph.
(Maybe I should add an extra comment to the graph that the relative values are not io/s but only relative values.)
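For a quick per-disk view from the shell, something like this works (the interval in seconds is arbitrary; %b is the busy percentage per device):
Code:
iostat -xn 5    # extended per-device statistics, refreshed every 5 seconds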
 
Gea (and others), I currently run a setup that I would like to consolidate.

One box is just running OpenIndiana + napp-it:

-Core i3, 32 GB Ram (non-ECC - yeah, yeah I know)

OS is running on 2x 320GB drives in mirror

Z1 array for main data store/shares
6x 2TB WD drives (split in two Z1's of 3 drives ea.)
plus 1 Samsung 250GB SSD as cache

Dedupe on for certain zfs shares only.

2x 1TB WD reds running in mirror for ESXi VM's via iSCSI

Then I have a whole other box running just ESXi. (core-i7 w/ 24GB ram)

I feel I can reduce complexity and power use by eliminating my ESXi box completely, but I don't want to compromise on performance either. The Core i3 is probably overkill for the ZFS server alone, but if I combine the two I might have an issue?

Currently the ESXi box runs mainly 3 VMs: one Windows Server 2010 box for AD-related stuff; one Ubuntu box running 2 websites, Plex media server (incl. transcoding when needed, which will probably be the biggest CPU hog), SABnzbd + Sickbeard, etc., and MySQL; and lastly a VM that runs pfSense (for which I'm thinking of actually getting a dedicated appliance in the future).

My main question I guess is this.

If I decide to go ahead with this, can I just move my ZFS pools over to a new install of OpenIndiana/napp-it? Is there anything I will have to do beforehand (export the pools, etc.?) or can I just shut down the server, wipe the OS and reinstall into a VM when I migrate ESXi over?

If I do go this route, would upgrading to a more powerful CPU be a good idea to handle the extra load of ESXi and my VMs?

Also Gea, I see you finally offer an option for add-ons for home users! Fantastic!

Thanks!
 
If you want to build an All-In-One (ESXi with virtualized storage), you can
import the current pool (with or without a prior export; see the sketch after this list) when you use
- pass-through with an extra SAS controller like an IBM M1015 in LSI 9211-IT mode or an LSI 9207, or
- physical RDM to pass through single disks (controller pass-through is preferred)
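A rough sketch of the pool move itself (the pool name is a placeholder):
Code:
# on the old install (optional, but gives a clean export)
zpool export tank

# on the new storage VM
zpool import          # list importable pools
zpool import tank     # or: zpool import -f tank, if the pool was not exported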

The i3 with 32 GB RAM may be ok, depends on your load but mostly VMs are RAM limited, not CPU limited. Use at least 8 GB for your storage VM.

You may also think about a move from OpenIndiana to OmniOS as it is currently better maintained.
I would avoid dedup in most cases. You must recreate the pool to get rid of it.
I would use a single Raid-Z2 with 6 disks instead of two Z1 vdevs with 3 disks each.
The IO of the second is better, but in the first case you can lose any two disks.

For the VM datastore, think of a second SSD-only pool (for example a mirror of 2 x 256 GB SSDs).
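For the Z2 vs. two Z1 point, the two layouts would look roughly like this in zpool terms (disk names are placeholders):
Code:
# single raidz2 vdev: any two disks may fail
zpool create tank raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0

# two raidz1 vdevs: better IO, but only one disk per vdev may fail
zpool create tank raidz1 c1t0d0 c1t1d0 c1t2d0 raidz1 c1t3d0 c1t4d0 c1t5d0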
 
If you want to build an All-In-One (ESXi with virtualized storage), you can
import the current pool (with or without a prior export) when you use
- pass-through with an extra SAS controller like an IBM M1015 in LSI 9211-IT mode or an LSI 9207, or
- physical RDM to pass through single disks (controller pass-through is preferred)

I do run an LSI in IT mode for some of the disks, and I assume it wouldn't matter which ports the disks are connected to in this case when moving, since I would, as you say, pass through the LSI controller.

The i3 with 32 GB RAM may be ok, depends on your load but mostly VMs are RAM limited, not CPU limited. Use at least 8 GB for your storage VM.

You may also think about a move from OpenIndiana to OmniOS as it is currently better maintained.

Noted, thanks! Moving the pools between them is a non-issue?

I would avoid dedup in most cases. You must recreate the pool to get rid of it.

Honestly, I think I only have it on my "user" ZFS shares. It doesn't bother me and hasn't caused me any issues, so I will probably leave it be.

I would use a single Raid-Z2 with 6 disks instead of two Z1 vdevs with 3 disks each.
The IO of the second is better, but in the first case you can lose any two disks.

Is there a way to migrate to Z2 without losing data? I have no issues with throughput as far as I can tell... I max out my gigabit LAN connections when reading and writing (100+ MB/s), so I'm not sure if it's worth it. I do plan on adding a 4-port NIC to the server for link aggregation, just to avoid any future bottlenecks. But the clients would still only be connected with single 100 or 1000 Mbit connections.

For the VM datastore, think of a second SSD-only pool (for example a mirror of 2 x 256 GB SSDs).

Maybe a good idea, although again the two WD Reds seem to do a fine job. None of my VMs are slow to load, really, and once they're up they don't do much. But noted as well, thanks.
 
moving pools between OI and OmniOS: no problem
migrating Raid-Z1 vdevs to Z2 vdevs: not possible
 
I am currently running napp-it 0.9 f2 with Smartmontools 6.2
What is the command to update to Smartmontools 6.3?
 
You can compile 6.3 yourself, or you can
rerun the napp-it wget installer, as it installs 6.3:
Code:
wget -O - www.napp-it.org/nappit | perl

You need to update/activate 0.9f2 again afterwards.
 
My NAS was working kinda slow today, so I gave it a look. What I found was tons of errors being generated in my messages file. The errors looked like this:
Code:
Oct 13 07:30:30 biglittle ahci: [ID 811322 kern.info] NOTICE: ahci0: ahci_tran_reset_dport port 7 reset device
Oct 13 07:30:35 biglittle ahci: [ID 296163 kern.warning] WARNING: ahci0: ahci port 7 has task file error
Oct 13 07:30:35 biglittle ahci: [ID 687168 kern.warning] WARNING: ahci0: ahci port 7 is trying to do error recovery
Oct 13 07:30:35 biglittle ahci: [ID 693748 kern.warning] WARNING: ahci0: ahci port 7 task_file_status = 0x4041
Oct 13 07:30:35 biglittle ahci: [ID 657156 kern.warning] WARNING: ahci0: error recovery for port 7 succeed
Oct 13 07:30:40 biglittle ahci: [ID 296163 kern.warning] WARNING: ahci0: ahci port 7 has task file error
Oct 13 07:30:40 biglittle ahci: [ID 687168 kern.warning] WARNING: ahci0: ahci port 7 is trying to do error recovery
Oct 13 07:30:40 biglittle ahci: [ID 693748 kern.warning] WARNING: ahci0: ahci port 7 task_file_status = 0x4041
Oct 13 07:30:41 biglittle ahci: [ID 657156 kern.warning] WARNING: ahci0: error recovery for port 7 succeed

I knew a scrub was running, so I checked its status:
Code:
scan: scrub in progress since Mon Oct 13 03:00:01 2014
    112G scanned out of 5.34T at 6.85M/s, 222h18m to go
    18.5M repaired, 2.05% done
Yikes!

I went over to the disk status in napp-it and it reported 2562 hard errors...
Another yikes...

I headed over to smartctl and checked it out:
Code:
  1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       8626
  2 Throughput_Performance  0x0026   053   053   000    Old_age   Always       -       19631
  3 Spin_Up_Time            0x0023   067   067   025    Pre-fail  Always       -       10100
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       39
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   252   252   051    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0024   252   252   015    Old_age   Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       29684
 10 Spin_Retry_Count        0x0032   252   252   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   252   252   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       61
181 Program_Fail_Cnt_Total  0x0022   100   100   000    Old_age   Always       -       2227335
191 G-Sense_Error_Rate      0x0022   100   100   000    Old_age   Always       -       73
192 Power-Off_Retract_Count 0x0022   252   252   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0002   064   059   000    Old_age   Always       -       32 (Min/Max 15/42)
195 Hardware_ECC_Recovered  0x003a   100   100   000    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   252   252   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   099   099   000    Old_age   Always       -       186
198 Offline_Uncorrectable   0x0030   252   252   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0036   100   100   000    Old_age   Always       -       21
200 Multi_Zone_Error_Rate   0x002a   001   001   000    Old_age   Always       -       33252
223 Load_Retry_Count        0x0032   252   252   000    Old_age   Always       -       0
225 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       64  1

If I compare the results with other drives in the same pool, this is where they differ:
- the failing drive has around 8000 Raw_Read_Error_Rate, compared to around 4000 on another
- the failing drive has 21 UDMA_CRC_Error_Count, the other drives 0
- the failing drive has 186 Current_Pending_Sector, the other drives 0

What do you think the problem is?
Failing drive?
Faulty SATA cable?
Faulty SATA connector?
All drives are Samsung F4 EG with about 3.5 years of power-on hours...

In case I have to change the hard drive, what do you guys recommend? Would a Hitachi Deskstar 7K2000 be a good choice?

Matej
 
I would reset the ACL of the filesystem, using /usr/bin/chmod or the napp-it ACL extension, to an everyone@=modify, root=full or a basic root=full setting and then try again. Do not touch Unix permissions like 750, as they reset the ACL inheritance settings. The aclmode property must be set to passthrough to modify ACLs in both cases.

The ACL everyone@ is not a user but refers to any known user - it's a group that is similar to the Unix "everyone" permission.
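For illustration, a rough sketch of such a reset from the shell (the path and ACL sets are examples only; the napp-it ACL extension does this via menus):
Code:
# allow ACLs to be modified and inherited
zfs set aclmode=passthrough tank/share
zfs set aclinherit=passthrough tank/share

# recursively replace the ACL: root=full, everyone@=modify, inherited to files and dirs
/usr/bin/chmod -R A=user:root:full_set:fd:allow,everyone@:modify_set:fd:allow /tank/share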

How do you reset the ACL?
chmod -A ?
 
Anyone: I am kinda worried about a disk in the zpool; in napp-it it shows up under the error column in the Disks menu. One disk shows S:1 H:10 T:28, and this was showing as S:1 H:10 T:24 yesterday.

Could this mean the disk is on its way out?
 
Iostat messages are not real errors like ZFS checksum errors; they are more a sort of driver warning. Your values are quite moderate. Combined with performance problems they can indicate weak disks. At high values (several hundred) I would remove the disk and do a low-level test with a manufacturer's tool, or check power and cabling.
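Those S/H/T counters come from the same per-device error statistics you can read with iostat; a quick way to check them directly:
Code:
iostat -En    # per-device soft / hard / transport error counters and device info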
 
Hi there Gea,

First of all, thanks for all your hard work and the information that you put out there for your all-in-one, hardware, software configurations and examples!

On your all-in-one page, you recommend installing ESXi and the local OmniOS datastore on a hardware RAID1 enclosure such as the Raidsonic SR2760-2S-S2B.

I was wondering: with that specific enclosure, is it possible to take a single drive out and boot from one of the RAID1 drives directly if the SR2760 hardware itself fails? Or are you left with a disk that will only work in that particular RAID unit?

Thanks!
 
I have not tried it for a long time, but I would expect that a single disk from the mirror boots fine from plain SATA.

You can use the unit in two ways:
- Mirror to protect against a disk failure.
While this works very well, I have had problems when disks got bad blocks.
In such a case both disks can become corrupted.

- Backup (I mostly use it for this).
From time to time I hot-sync both disks. After this, I remove the second disk to have a ready-to-use backup boot disk.

You can mix both variants with 3 disks.
 
Thanks for the info!

I was just wondering about the ability to boot from, e.g., the mobo SATA without RAID in the event the hardware RAID1 unit failed, as I would hate to have two good disks and not be able to boot from either because the enclosure failed.

The backup method, swapping the regular 2nd disk for a third to get a backup and then swapping the 2nd back in, sounds like a good idea.

Do you mostly use disks or SSDs for these? I suppose a setup with an SSD normally connected, and a backup onto a similar-sized hard disk that is then removed, might be ideal.

A second enclosure that you could use to clone the backed-up RAID1 disk to a non-RAID disk might be a good idea, if it won't boot outside the enclosure.
 
Wonder if anyone can assist with an issue I'm experiencing.

I have an HP N54L with 6x 3TB drives and 1x USB storage drive as the boot device.

There are two drives sitting on top of each other at the top of the case where the optical disk drive should go - one connected to onboard SATA, one connected to eSATA on the rear.

The problem seems to show up during heavy tasks - the two drives connected to SATA and eSATA often hit 100% busy/wait, whereas all the other disks stay significantly lower.

I've tried this with different disks, different tasks, etc., but I get the same issue.

My thought is that the eSATA and the SATA port somehow share the same path, resulting in half speed on both drives?

Or that it's because I'm using a molex to 2x sata splitter to power both these drives.

Any thoughts on this?

edit - I'm using OpenIndiana with raidz2, but I have used raidz1 and get the same issue with both configurations.
 
There isn't a whole lot of room up there or airflow, perhaps they are getting too hot?
 
I don't think heat is the problem - I've had this completely shut down and cold, then tried from a cold boot and still had the same problem.

Could vibrations cause this (they are sitting on top of each other)?

As for internal space - it's using 10% of its space.

Also, the case is open, the drives aren't very hot to the touch, and the system is kept in a cool environment.
 
I can get some rubber feet for both drives as a permanent solution - in the meantime I can try using foam dividers and see if that does it.
 
What is the best way to reset my permissions on shares? For some reason the subfolders in one of my shares only have read access, and the reset ACLs don't seem to function the way they used to in previous versions of napp-it.
In fact, NONE of the ACL stuff seems to work in the latest version.
 