OpenSolaris derived ZFS NAS/ SAN (OmniOS, OpenIndiana, Solaris and napp-it)

_Gea · Oct 29, 2014

Complex ACL management is part of a nonfree extension.
A basic free reset functionality for everyone=modify from the last napp-it is not available in current release but will be back in the next default edition.

With current release you can
- use /usr/bin/chmod
- set ACL from Windows
- in single cases, use an evalkey from http://napp-it.org/extensions/evaluate_en.html
that enables all options for two days or use a Pro home licence

update:
A preview for next default 0.9f3 (Base for next Pro and Free) that includes all fixes of the last two months is freely available.
http://napp-it.org/downloads/changelog_en.html

tsrtg · Oct 30, 2014

If I am using Solaris 11.2, which version of LSI 9211-i8 IT-mode firmware should I flash? Should I flash the latest P20? I see that in the FreeNAS forum people say that firmware version should always match driver version. Is it true for Solaris as well? On the LSI web site there are no drivers for Solaris. How can I check which version of the drivers is in use?

_Gea · Oct 30, 2014

This "use P16 firmware discussion" is mainly a BSD item

The driver for LSI is included and part of Solaris and OmniOS
I would always prefer the newest LSI firmware unless not told to use another (by Oracle, Illumos or OmniOS)

TCM2 · Oct 30, 2014

I'd like to see a source where this "FW must match driver in BSD" comes from.

jmk396 · Oct 30, 2014

I've been having a lot of problems with my ZFS system (Solaris Express) hosted inside ESXi 5.5 lately.

Here are some of the errors I'm getting: (in the Solaris console)

Code:

vmxnet3s:0: Failed to allocate 1518 bytes for rx buf, err:-1.
WARNING: vmxnet3s:0: ddi_dma_alloc_handle() failed

Code:

WARNING: /pci@0,0/pci15ad,7a0@16/pci1000,3020@0  (mpt_sas0):
   Unable to allocate dma handle for extra SGL.

   MPT SGL mem alloc failed.

This seems to happen when I copy large files, etc.

I'm using two SAS cards using PCIe passthrough in ESxi.

Can anybody help me? Is this a sign of bad memory? (I'm using ECC memory) Are my SAS cards going bad?

_Gea · Oct 30, 2014

- Can you add some infos about your ESXi version (ex 5.5 U1)
and whether you use NFS or iSCSI.

- What mainboard, nic and SAS Adapter?

- Any special settings in vmxnet3?

- Have you alternatively tried e1000

jmk396 · Oct 30, 2014

I'm running ESXi 5.5 Update 2 (build 2143827) on a Supermicro X9SCM motherboard.

The SAS cards are 2x LSI 9211-8i.

ZFS is running on Solaris Express and ESXi connects using NFS to mount the virtual machines. However, my Solaris Express OS runs directly on a physical drive (so it can export the NFS mounts for the other virtual machines).

I haven't tried the E1000 driver because I thought VMXNET3 is the best choice since ZFS handles 100% of my NAS.

_Gea · Oct 30, 2014

Vmxnet3 is the fastest option but can give problems.
I would first try e1000

If the problem is new with 5.5U2, you can go back to 5.1U1 (very stable)
On my own configs, I use 5.5U2 without problems up to now

Beside that, a hardware problem, mainly RAM is possibly.
Remove half of the RAM, try and check then with the other half.

At any time you must think about a newer OS (Express is 2011)

jmk396 · Oct 30, 2014

What are my options for upgrading and not having to rebuild my ZFS pools?

Unfortunately, one of them is ZFS ver 31 so I don't think I can use illumos, etc. I really wish I didn't upgrade to version 31 because it's a very large array (20 TB) that I can't figure out a way to get everything off so I can recreate it from scratch.

_Gea · Oct 30, 2014

Options: None

You must decide if you follow Oracle (ZFS v.29 and up) or OpenZFS (v.5000 with feature flags).
There is no compatibility above ZFS v.28/.5 - the last free ZFS version!

Any switch between means a destroy and rebuild pool.

paniolo · Nov 1, 2014

hi Gea,

Wondering if you have a roadmap for Napp-IT? planned new features etc.?

TCMY95 · Nov 1, 2014

Firstly I apologies for this lengthy post.

We implemented 2 zfs boxes recently and whilst I did a lot of research before actually deciding on the final build and OS I am no where near an expert on zfs. In fact recent events have highlighted how little I know about this. So it is time to step up my game.

We had a disk fail in one of the systems running RAID z1, as we did not have a hot spare it just ran in a degraded state. The disk was still showing as online under disks but showing as removed under the zpool so I brought the disk back online and the disk started to resilver. I then bought another disk promptly and configured as a hot spare. Sure enough the problem disk went again and the hot spare started to resilver. During this time the problem disk was in a "too many errors" state but then went to a "removed" state after some time. I have 2 questions on this really.

1. When the disk moved to a removed state it locked up all the VM's in Xen server forcing them to need dirty reboots. Why would a single disk cause this issue.

2. Now that the hot spare has reslivered the pool still shows as degraded. Am I better using replace on the faulty disk replacing it with the hot spare which has already resilvered and then place a new disk as the hot spare. Or do I just replace the faulty disk with the new disk? My thoughts on the latter is that it will have to resliver again?

Hope that all makes sense? thanks to all.

TCMY95 · Nov 2, 2014

OK I think I have found the answer to my own question number 2, but just to confirm my thoughts after reading up we should replace the faulty disk and let the spare disk (which is currently in use) return to a hot spare?

On another note I think I may had made some errors setting this up in the first place so we may want to buy identical hardware and start again so if anyone has advice on how we should set up then it would be gratefully received.

Current setup.

24 bay super micro chassis - 2u type A
Supermicro MB x9srl-f
Xeon 6 core processor
32GB ecc registered RAM
Intel 10GB dual port network
20 WD Red drives (1vdev raid z1 9 disks with one hot spare, 1 vdev Raid z2 10 disks no hot spare)
Each of the vdevs has Intel 100GB S3700 SSD LARC and Samsung 840 Pro 128GB SLOG.
Total disk bays uses 24

We are booting from USB3 32GB 80MBsec read and write, I have done some more reading and this does not appear to be advisable.

If we do the same build again are we better going for a RAID z2 with 16+2 WD reds, or something else.

thanks ins advance.

levak · Nov 2, 2014

I wouldn't go with raidZ1 at all. Never

There is too much chance for the 2nd hard drive to fail during resilvering, specially now with big hard drives. At least RaidZ2 oz 3, if you can afford.

I would put S3700 as SLOG, since latency matters and 840Pro is nowhere near S3700 AFAIK. 840 is OK for L2ARC, since latency is still a lot better then with spindels...

As far as RAIDz2 does, it's better to do many small arrays. You loose some space, but resilvering time is faster and that makes it a safer array

I think it took around 30h to resilver my 4TB array of 4x2TB drives...

Matej

TCMY95 · Nov 2, 2014

Levak thanks for your reply, sorry I had got the SSD's the wrong way round they are in fact configured the way suggested.

On the vdev are you saying we would be better runing more 4+2 arrays, so for instance 3 x 4+2 arrays = 18 disks total. That would then mean 3 x S3700 drives for the SLOG and 3 x Samsung or something else for the L2ARC. Or is there a more efficient way of using the SSD's which wont impact performance?

TCMY95 · Nov 2, 2014

Reason I ask above question is I don't fancy running without hot spares?

levak · Nov 3, 2014

You could probably go with 2xraidz2 (9+2) + hotspares...

Instead of l2arc, you can add more memory and use that for l2arc. That way you can use SSD for something else. Maby RaidZ2 array of SSDs

tsrtg · Nov 3, 2014

Why create 2xraidz2 + 2 hot spares when you could just use those drives to create raidz3. I think raidz3 is always safer than raidz2 + hot spare, just as raidz2 is always safer than raidz1 + hot spare.

_Gea · Nov 3, 2014

@paniolo
more ideas than time...

@IGC
- A hotspare disk remains a hotspare disk even when you use it.
-> Replace a faulted disk with a new one

- Raid-Z2 + hotspare is a good idea if you have (or plan) more than one vdev
as a hotspare is a per pool setting. If your pool is build from one vdev, Z3 is a better idea.

- about timeouts
ZFS waits quite a long time until it expects an error. With a pure filer this is mostly ok. This can be a problem with a Hypervisor that expects an answer within a certain time (I expect a timeout with ESXi after 180s). As far as I know, Nexenta introduces a limit just like with TLER disks that do this per disk firmware.

At the moment I would say, that one should use TLER enabled disks with Hypervisors althouth ZFS does not need them (unlike hardware raid)

levak · Nov 3, 2014

Is it normal for scrub to take long time to finish and that average scrub speed it quite low. Currently, it's reading 32MB/s:

pool: pool0
state: ONLINE
scan: scrub in progress since Mon Nov 3 03:00:02 2014
1.92T scanned out of 4.80T at 32.9M/s, 25h31m to go
0 repaired, 39.98% done
config:

NAME STATE READ WRITE CKSUM
pool0 ONLINE 0 0 0
raidz2-0 ONLINE 0 0 0
c1t4d0 ONLINE 0 0 0
c1t5d0 ONLINE 0 0 0
c1t6d0 ONLINE 0 0 0
c1t7d0 ONLINE 0 0 0

errors: No known data errors

Transfer speed from Zabbix also looks weird. There is quite high read speed from 3am when scrub started until 3pm, but then it drops significantly. What could that be?

Matej

levak · Nov 3, 2014

Nevermind, that weird graph is from XBMC making backup...

Matej

_Gea · Nov 3, 2014

I would expect:

- scrub is a low priority process
- scrub performance lowers with number of (small) files
- scrub performance depends on overall pool performance

levak · Nov 4, 2014

I guess small files are the cause... There is thousants of small files in XBMC backups and they take up to a few GB, so...

Also, pool is quite full (25% free left), so that might be the case as well...

Matej

J-san · Nov 4, 2014

RE: Which version of LSI 9211-i8 IT-mode firmware should I flash? Should I flash the latest P20?

I just put together a system with three LSI 9211-8i cards and flashed them to P20, and ran into problems on the latest OmniOS during benchmarking with bonnie++.
A single SSD (or multiples) connected via the LSI 9211-8i cards were throwing errors in the console and /var/adm/messages.

This was via the SSD drive on a non-expander backplane. To rule out the backplane, I hooked a single Intel S3500 480GB SSD up to a breakout cable but no improvment - the same errors show up in the root console and log..

OmniOS 5.11 omnios-8c08411 2014.04.28
LSI 9211-8i cards - Flashed to P20 firmware.

Nov 4 09:39:30 napp-it-14b Log info 0x31080000 received for target 9.
Nov 4 09:39:30 napp-it-14b scsi_status=0x0, ioc_status=0x804b, scsi_state=0x0
Nov 4 09:39:30 napp-it-14b scsi: [ID 107833 kern.notice] /pci@0,0/pci15ad,7a0@16/pci1000,3020@0 (mpt_s as1):
Nov 4 09:39:30 napp-it-14b Aborted_command!
Nov 4 09:39:30 napp-it-14b scsi: [ID 243001 kern.warning] WARNING: /scsi_vhci (scsi_vhci0):
Nov 4 09:39:30 napp-it-14b /scsi_vhci/disk@g55cd2e404b58b0a7 (sd2): Parity Error on path mpt_sas14 /disk@w55cd2e404b58b0a7,0

Arg. Transport errors and hard errors looked something like this:

iostat -En
-------------------
c8t55CD2E404B58E103d0 Soft Errors: 0 Hard Errors: 1028 Transport Errors: 870
Vendor: ATA Product: INTEL SSDSC2BB48 Revision: 0370 Serial No: BTWL404501HQ480
Size: 480.10GB <480103981056 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0

So I took to IRC.. and found some advice on #illumos. Someone there (patdk-wk_) mentioned he runs P19.

Well, lets give it a try - So I downgraded by re-flashing all 3 HBAs to P19 and rebooted.

No more errors during benchmarking!!!!

So if you have a LSI 9211-8i hba I would stick with P19.
There seems to be some error with P20 and these cards.

(Also, someone on IRC also mentioned 9201-16e having problems with P20)

Hope this saves someone else some time.

Note: I also had a striped mirror of 2TB WD Re drives that also displayed some hard + transport errors with P20, but much less than the SSDs. (only 2 or 4 per WDRe drive after a benchmark run)
So it seems the faster speed of the SSD was triggering the problem more frequently on P20.

agheno · Nov 5, 2014

Although tech-inclined in general, I am not too familiar with Solaris so I thought I'd ask for advice on how to get napp-it running again before poking around further and possibly making it worse.

After running relatively well for a few years, ESXi got corrupted and I had to do a re-install. I ended up installing a newer version of ESXi (I think going from 5.1 to 5.5), and was able to preserve the existing datastore with my nappit VM. After booting ESXi, I re-imported the napp-it VM from the datastore, tried to boot it, and encountered an error, "The systemId does not match the current system or the deviceId and vendorId do not match the device currently at <bus:device.function>"

Google says this has to do with the device id's associated with hardware pass-through and the solution is to edit the VM configuration, remove the device, and add it again. No biggie. A few clicks later, OpenIndiana is starting to boot, but then another problem shows up:

Code:

(...truncated...)
module$ /platform/i86pc/$ISADIR/boot_archive
loading '/platform/i86pc/$ISADIR/boot_archive' ...
checksum verification failed

Error 16: Inconsistent filesystem structure

Grub has two boot options and both generate the Error 16.

OpenIndiana Development oi_151a X86
pre_napp-it-0.6i_update_11.22

I've tried a few things, including removing the hardware passthrough from the VM configuration, but nothing is working yet.

For reference:
The VM is stored in the ESXi datastore, on a 60gb SSD. The SAS2008 HBA card is passed through to the VM, along with the 14 harddrives attached to it.

The server has the following parts:

Supermicro X9SCM-F-O
Xeon e3-1230
16 gb ECC unbuffered RAM
Supermicro AOC-USAS2-L8i (SAS2008 based HBA)
Chenbro CK23601 (SAS Expander)
1 x 60gb SSD (ESXi drive/datastore)
4 x 500gb
10 x 2tb
OpenIndiana 151a
Napp-it 0.6 (I think)

Now it really all boils down to the data on the 14 drives, so I'm thinking maybe setup a new instance of solaris/nappit and import the drives. I'm a little concerned about the fact that I don't remember how the pools were configured, combined with the fact that I'm not that great with Solaris. Any thoughts?

If I end up setting up a new VM and importing the physical drives there, do I need to match the solaris and nappit version levels or is there backwards compatibility?
Am I going to need to know the pool configuration or is ZFS smart enough to detect and realize there are existing ZFS pools on those drives?

_Gea · Nov 6, 2014

It seems that your local datastore with OpenIndiana got corrupted.
Your new setup can be like:

- Install ESXi 5.5U2 with pass-through for the HBA
- download a ready to use VM (OmniOS + napp-it) from http://napp-it.org/downloads/index_en.html
and upload the VM to your local datastore with the help of the ESXi filebrowser
- import th VM (filebrowser: right-click on .vmx file)

- start OmniOS (this is very similar to OI server but stable and up to date, expects a DHCP server for auto-ip configuration)
- import your pool (all pool data and configurations, shares and ACL are stored on a pool)
- optionally re-create users and jobs

- update napp-it (menu about-update)
- update OmniOS (optional), see http://omnios.omniti.com/wiki.php/ReleaseNotes/r151012
its basically a switch to the newest repository and a pkg update

This is all straight forward without critical points for your data and you end with a up to date configuration.

_Gea · Nov 6, 2014

J-san said:
RE: Which version of LSI 9211-i8 IT-mode firmware should I flash? Should I flash the latest P20?
..

So if you have a LSI 9211-8i hba I would stick with P19.
There seems to be some error with P20 and these cards.

Have not tried P20 but this info can is very valuable on problems.

ST3F · Nov 6, 2014

@ _Gea : how to tweak Prefetch on OmniOS / Napp-it ?
... is it only in upper right with with Console mode in the web interface ?
( I would like to improve ZFS configuration for heavy video files http://weblog.etherized.com/posts/185.html for Avid Media Composer 8.2 in AMA )

Is your video editing server with SSD being ready and set up ?

Cheers.

ST3F.

agheno · Nov 6, 2014

_Gea said:
It seems that your local datastore with OpenIndiana got corrupted.
Your new setup can be like:

- Install ESXi 5.5U2 with pass-through for the HBA
- download a ready to use VM (OmniOS + napp-it) from http://napp-it.org/downloads/index_en.html
and upload the VM to your local datastore with the help of the ESXi filebrowser
- import th VM (filebrowser: right-click on .vmx file)

- start OmniOS (this is very similar to OI server but stable and up to date, expects a DHCP server for auto-ip configuration)
- import your pool (all pool data and configurations, shares and ACL are stored on a pool)
- optionally re-create users and jobs

- update napp-it (menu about-update)
- update OmniOS (optional), see http://omnios.omniti.com/wiki.php/ReleaseNotes/r151012
its basically a switch to the newest repository and a pkg update

This is all straight forward without critical points for your data and you end with a up to date configuration.

Thanks for the quick reply Gea.

_Gea · Nov 6, 2014

ST3F said:
@ _Gea : how to tweak Prefetch on OmniOS / Napp-it ?
... is it only in upper right with with Console mode in the web interface ?
( I would like to improve ZFS configuration for heavy video files http://weblog.etherized.com/posts/185.html for Avid Media Composer 8.2 in AMA )

Is your video editing server with SSD being ready and set up ?

Cheers.

ST3F.

There is no tuning option within the Web-GUI.
You can enter commands in the napp-it command form (run as root) or via CLI at the console or via putty. Modifications of /etc/system can be done for example via WinSCP (allow root access in menu services-ssh).

The SSD storage is up and running but I am low on time to do some testings as the new video-lab (Adobe CC) is located in a university branch where we wait for a 1/10G connectivity (currently Telekom DSL 10Mbit)

ST3F · Nov 6, 2014

_Gea said:
There is no tuning option within the Web-GUI.
You can enter commands in the napp-it command form (run as root) or via CLI at the console or via putty. Modifications of /etc/system can be done for example via WinSCP (allow root access in menu services-ssh).

Ok.

Can you tell me the command line to enable or disable Prefetch on OmniOS ?

Cheers

jmk396 · Nov 6, 2014

I'm using Solaris Express 11.1 with ZFS (Napp-It) and export my files using NFS and SMB.

How can I perform some simple NFS versus SMB tests from a client machine on my network?

I'd like to test sequential read performance and seeking performance (to optimize for video playback), etc.

_Gea · Nov 6, 2014

ST3F said:
Ok.

Can you tell me the command line to enable or disable Prefetch on OmniOS ?

Cheers

Have not done myself.
Maybe like

Set the following parameter in the /etc/system file:
set zfs:zfs_prefetch_disable = 1

or read http://docs.oracle.com/cd/E26502_01/html/E29022/chapterzfs-4.html

_Gea · Nov 6, 2014

jmk396 said:
I'm using Solaris Express 11.1 with ZFS (Napp-It) and export my files using NFS and SMB.

How can I perform some simple NFS versus SMB tests from a client machine on my network?

I'd like to test sequential read performance and seeking performance (to optimize for video playback), etc.

You can mount the same share from a client that supports both (Linux, Unix, OSX or Windows with NFS client software) and run some benchmarks like dd, bonnie or Chrystaldisk on Windows.

Expect different results depending to the client platform.

jmk396 · Nov 6, 2014

Thanks!

One more quick question...

I'm currently using a USB memory stick for ESXi and a Western Digital Raptor (older 74 GB) drive for ZFS (using Solaris Express 11.1).

If I purchased an SSD (Samsung 840 120GB), could I partition it and use it both for ESXi and ZFS (Solaris or whatever)? Would there be any performance penalty or anything like that?

_Gea · Nov 7, 2014

Foe all-in-one setups, you do not need to partion your boot ssd.
Install ESXi on it and use the rest as a local datastore for your storage VM

Your minimal SSD size should be 32GB. Performance is uncritical, does not matter to NAS use at all. Use anything reliable. A SSD with a supercap to ensure valid writes on a powerfailure would be nice.

jmk396 · Nov 7, 2014

Thanks Gea! Do you know of the cheapest SSD that fits your description? (32GB (or larger) SSD with a supercap?)

I've already ordered a Samsung EVO 840 120GB, but I'd be willing to exchange if something better exists that isn't incredibly expensive, etc.

_Gea · Nov 7, 2014

jmk396 said:
Thanks Gea! Do you know of the cheapest SSD that fits your description? (32GB (or larger) SSD with a supercap?)

I've already ordered a Samsung EVO 840 120GB, but I'd be willing to exchange if something better exists that isn't incredibly expensive, etc.

If you are looking for best reliability and a supercap on current SSDs,
I would prefer an Intel S3500-80

(Low capacity or cheaper SSDs with supercap are mostly quite old like Intel 330 or expensive SLC)

jmk396 · Nov 7, 2014

Thanks!

I'm really sorry but I have one last question (I hope)...

I'd like to upgrade my LSI 9211-8i cards to the latest firmware. (because I've been having problems and I'm on an older firmware [v11.00.00.00])

Can I just upgrade firmware without losing any data?

_Gea · Nov 7, 2014

jmk396 said:
Thanks!

I'm really sorry but I have one last question (I hope)...

I'd like to upgrade my LSI 9211-8i cards to the latest firmware. (because I've been having problems and I'm on an older firmware [v11.00.00.00])

Can I just upgrade firmware without losing any data?

A firmware does not affect data unless the firmware is not buggy
According to http://hardforum.com//showpost.php?p=1041207949&postcount=6384
I would use P19 (P20 is newest)

OpenSolaris derived ZFS NAS/ SAN (OmniOS, OpenIndiana, Solaris and napp-it)

Supreme [H]ardness

n00b

Supreme [H]ardness

Gawd

Gawd

Supreme [H]ardness

Gawd

Supreme [H]ardness

Gawd

Supreme [H]ardness

n00b

n00b

n00b

Limp Gawd

n00b

n00b

Limp Gawd

n00b

Supreme [H]ardness

Limp Gawd

Limp Gawd

Supreme [H]ardness

Limp Gawd

n00b

n00b

Supreme [H]ardness

Supreme [H]ardness

Limp Gawd

n00b

Supreme [H]ardness

Limp Gawd

Gawd

Supreme [H]ardness

Supreme [H]ardness

Gawd

Supreme [H]ardness

Gawd

Supreme [H]ardness

Gawd

Supreme [H]ardness