OpenSolaris-derived ZFS NAS/SAN (OmniOS, OpenIndiana, Solaris and napp-it)

ZFS is controller-independent software RAID.
You can move disks between any controllers (SATA, SAS). It does not matter whether you use a backplane or an expander.
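
In practice the safe sequence is just an export before the move and an import afterwards; a minimal sketch (the pool name "tank" is only an example):

Code:
# note the disk IDs for reference
zpool status tank
# export the pool, then shut down and move the disks to the other controller/backplane
zpool export tank
init 5
# after power-up, list importable pools and import again
zpool import
zpool import tank
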
OK, so to be 100% safe: power up the Solaris server with napp-it, write down the HDD IDs, power off the server and switch the HDDs around (that way it can handle the loss of a backplane), then power up and import the pool again, and that's it? :)
 
Thanks Gea:)

Now almost everything works on my server. NFS speed is relatively good and fast.

I do have another problem though. File creation/deletion is really slow. If I test with bonnie++ directly on OmniOS, it can create 20000 files/s and delete 26000/s. If I do the same test over NFS from my other server, I only get 1800/3000. That is with sync disabled. If I set it back to default, I get results around 60/80.
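
For reference, the metadata-only part can be reproduced with something like this (the path and file count are just examples):

Code:
# -s 0 skips the throughput tests, -n 32 creates/stats/deletes 32*1024 small files
bonnie++ -d /tank/bench -u root -s 0 -n 32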

Why such a big difference? Is that normal?

UPDATE: I tried running the same test on one of my AIO boxes with RAID10 and I get almost the same results... I guess that's as fast as NFS can go.

Matej
 
I apologize if this has been asked before, but I've tried to scan quite a few possible avenues before posting my question here.

I've recently set up a new OmniOS-based NAS using an IBM M1115 (flashed to IT mode) with 3 x 2 TB drives. The server has 40 GB of ECC RDIMM memory and 2 x Xeon 5520 processors on an Asus Z8NA-D6C motherboard (I plan to convert this into an AIO in the future).

Raw dd performance writing an 82 GB file to the pool is decent:

Code:
root@OMNI-NAS:~# dd if=/dev/zero of=/storage/dd.tst bs=4096000 count=20000
20000+0 records in
20000+0 records out
81920000000 bytes (82 GB) copied, 325.804 s, 251 MB/s

However when coming over NFS (from an older Solaris 11 box) I'm only getting ~ 22 MB/s write performance.

I have disabled sync on the dataset I'm writing to:

Code:
root@OMNI-NAS:~# zfs get sync storage/vmfs
NAME          PROPERTY  VALUE     SOURCE
storage/vmfs  sync      disabled  local

I have not tuned any other settings. At this point I'm at a loss as to why I'm getting so poor NFS performance.

My iperf test between the client and the server also gives decent throughput:
Code:
root@OMNI-NAS:/var/web-gui/data/tools/iperf# ./iperf -c 172.16.1.42
------------------------------------------------------------
Client connecting to 172.16.1.42, TCP port 5001
TCP window size: 48.0 KByte (default)
------------------------------------------------------------
[  3] local 172.16.0.191 port 38443 connected with 172.16.1.42 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  1.01 GBytes   870 Mbits/sec

I would really appreciate it if someone could give me some pointers on how to troubleshoot the poor NFS performance.

Thanks in advance.
Groove
 
Could you try an NFS transfer from another box? Maybe you can boot up a live Linux and benchmark NFS from Linux?
Try benchmarking CIFS from Windows too, just to rule out a networking issue.
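
Something like this from a Linux live system would do as a quick test (the export path and options are just examples based on the earlier posts):

Code:
# mount the export with NFSv3 and write a few GB, forcing a flush at the end
mount -t nfs -o vers=3 172.16.1.42:/storage/vmfs /mnt
dd if=/dev/zero of=/mnt/dd.tst bs=1M count=4096 conv=fdatasync
umount /mnt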

I know iperf is saying the network is OK, but I had the same problem once and have forgotten what the cause was.

Matej
 
Thanks Matej,

going from another Linux server works fine and I get ~112 MB/s.

Thanks for the suggestion. Going to configure the box as an AIO and see how that pans out.

Groove
 
Hey guys, I've been running various forked versions of Solaris (OpenIndiana, OmniOS, etc.) with napp-it for a while now and have had no issues, well, for the most part.

Let me provide some background and then the problem. I'm not looking for hand-holding, but a point in the right direction would definitely help me.

Right now I've got a fairly large lab/production environment that runs on a 24-bay chassis, 24GB RAM, a dual-core AMD CPU, IBM M1015s flashed to IT mode for full passthrough, and a QLogic QLE2464 Fibre Channel HBA. This connects via 4Gb FC to a Dell R610 that runs multiple VMs; there are no FC switches in between (the two are set to point-to-point mode).

I've noticed (only on the console screen) that under heavy load via FC the link will drop between the SAN and the R610, showing a message about link down, then link up, and then usually a message that a firmware dump cannot be performed as there is already one outstanding. When this happens the R610 goes crazy as the LUNs go offline, causing the VMs to freeze. After a few seconds the port comes back up and all is good for a little while longer.


Unfortunately I couldn't find any results with my Google-fu on the specific error about the firmware dump, nor any real tangible information about the link flapping under load. I originally thought my FC cards were bad, so I swapped in different cards, then tried different cables, upgraded the BIOS on the QLogics, and still nothing.

Any thoughts? This doesn't seem to happen with iSCSI, just FC. I'm fairly certain that Solaris is requesting the firmware dump, causing the port to go offline briefly, but I'm not sure why. I've also tried different forks of Solaris (OmniOS, OpenIndiana, and now Solaris 11.1).

I'm hoping someone has experienced this before and found the problem.

Thanks for any help! Even a point in the right direction! I've been battling this for weeks now.
 
Just want to give this thread a heads-up on ESXi 5.5 U1. There are major NFS issues with this release that VMware is still working on. Given the NFS-centric nature of Gea's napp-in-one setups, I think it's a safe assumption that people should remain at 5.5 GA for the time being.

http://kb.vmware.com/selfservice/mi...nguage=en_US&cmd=displayKC&externalId=2076392

If you are patching due to concern about heartbleed see here: http://blogs.vmware.com/kb/2014/04/patching-esxi-5-5-heartbleed-without-installing-update-1.html
 
Thanks for the link to the kb article.
I have published a warning about 5.5u1 on my download page as well.
 
Hey Gea!

I'm trying to write a Zabbix template for SMART monitoring of hard drives and I stumbled upon a problem. How do I get a list of hard drives in OmniOS?

In Linux I can check /proc/diskstats or lsblk; what can I use in Solaris?

Also, I just ran smartmontools in napp-it and all my soft error counters got raised by 1. Is that normal?

Matej
 

1.
You can use iostat, but that's more of an inventory.
If you remove a disk it remains in iostat.

I use iostat for a basic disk overview, and format and parted (when partition support is enabled) to discover removed disks and details.

2.
Current smartmontools raises the soft error counter in iostat on every health-state check on Solaris.
That's normal.
 
Is there any command to just show the currently attached drives?
On the other hand, to be honest, I don't normally switch drives in and out on a daily basis :) I usually ADD drives :)

What smartctl command are you using to read data from the drives?
smartctl -a -d sat,12 /dev/rdsk/c1t4d0?

I get a list of drives with:
iostat -nx | awk '{print"/dev/dsk/"$11}'

How can I exclude pools and the rest? I have no idea about regex; what would be an expression for
c<number>t<number>d<number>?

That way I can remove all entries except hard drives...

Matej
 
You can use:
iostat -nx | grep -E 'c[1-9]t[1-9]' | awk '{print"/dev/dsk/"$11}'

This will find any disks that were known since the last bootup, including failed or removed ones.
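
If you also need to match c0/t0 or WWN-style target names (which contain hex letters), a variant like this should work (untested sketch):

Code:
iostat -xn | nawk '$NF ~ /^c[0-9]+t[0-9A-F]+d[0-9]+$/ {print "/dev/dsk/" $NF}'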

About SMART:
you can compare with my script /var/web-gui/data/napp-it/zfsos/_lib/illumos/get-disk-smart-bg.pl,
where I first detect the type of disk in order to use the proper command.
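
Not that script, but a rough sketch of the same idea in plain shell (device names and the sat,12 variant are assumptions taken from the examples above):

Code:
#!/bin/sh
# try the SAT pass-through first (SATA disks behind a SAS HBA), fall back to plain SCSI
for d in `iostat -xn | nawk '$NF ~ /^c[0-9]+t[0-9A-F]+d[0-9]+$/ {print $NF}'`; do
  dev=/dev/rdsk/$d
  if smartctl -i -d sat,12 $dev >/dev/null 2>&1; then
    smartctl -a -d sat,12 $dev
  else
    smartctl -a -d scsi $dev
  fi
done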
 
I'm setting up a new OmniOS-based NAS (39TB raw) and wanted to get apcupsd going with a USB connection. It's not as simple as it used to be, as OmniOS doesn't come with libusb, nor is it in their pkg repo. I thought I would write down the basics of how I got it working for future reference. You'll need to download the apcupsd source ahead of time, as well as have the various build tools like gcc and gmake already installed.

1) install openindiana pkg repo
pkg set-publisher -O http://pkg.openindiana.org/dev openindiana.org

2) install the libusb packages
a) pkg install libusb libusbugen
b) touch /reconfigure
c) reboot

3) configure & make apcupsd like normal
a) ./configure --enable-usb ...whatever other options
b) gmake etc

4) Not sure if everyone will have to do this step. When I plugged in the APC UPS, it was still binding to the HID driver (which won't work), not ugen. You can force it into ugen mode with the following:
a) prtconf -v (find the USB device id, see the format below)
b) rem_drv ugen
c) add_drv -i '"usb51d,2.106"' -m '* 0666 root sys' ugen

The usb id (usb51d,2.106) might vary by ups device, I'm not sure.

Check your logs: you should see the UPS attached via ugen. Check that you can connect to it with apctest, configure apcupsd.conf, and you should be good to go.
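
For completeness, a minimal apcupsd.conf USB stanza usually looks like the sketch below; the path depends on your ./configure options, and I have not verified whether anything ugen-specific is needed beyond this:

Code:
# /etc/opt/apcupsd/apcupsd.conf (location may differ)
UPSNAME homeups
UPSCABLE usb
UPSTYPE usb
DEVICE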
 
Not really a napp-it kind of question, but I would just like to hear your input on my thinking...

Ok, so I got SMART temperature monitoring up and running on Solaris.

Now I want to do some disk performance monitoring. I can use the Zabbix agent, but that only gives me bytes/s and IOPS, and I want more info. What would be the best command?

iostat? I guess that would be nice, since it can provide me with IOPS, bytes/s, transaction response times, queue wait % time, disk busy % time, ...

Since Zabbix can only return one value at a time, I will probably have to put iostat in crontab and save the output. I will then use some bash scripting to get my data out. Since Zabbix only gathers data every 30 or 60 s, I'm thinking of running iostat every 30 s for 25 s. iostat probably isn't resource hungry, so running it shouldn't have any real impact on my system, right?

Is that the right approach or would someone take another path?

Matej
 
No problem.

On my new napp-it 0.9f1 from yesterday I ran arcstat, iostat, fsstat, nicstat, poolstat, prstat and zilstat simultaneously and constantly in the background. Even with the additional load of the websocket server processing the data in real time, the additionally enabled agents updating disk, ZFS and snapshot info in the background (to improve GUI performance), a running zpool scrub and large files being copied over the net, the overall busy rate is quite low. (Okay, it's a fast Xeon, but it should not be a problem with a slower CPU either.)

09f1.png
 
Ok, great info:)

So if I want to show disk performance, iostat is probably the best option, right?
If you can only get statistics once per minute, would you take the minute average or the minute maximum? I guess average makes more sense, and that is the way other monitoring tools (at least Munin) do it: they look at /proc/diskstats and divide the deltas by time.

If I want to log "per share" activity, fsstat is the command to use?

Matej
 
iostat is perfect for disk activity. I use the average over the last 10s/60s and would prefer averages over peak values.
fsstat shows activity per filesystem type, e.g. zfs, nfs, tmpfs.
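
fsstat also accepts mount points, so rough per-share numbers are possible as well; for example (dataset names are made up):

Code:
# per filesystem type, 10 second interval
fsstat zfs nfs3 nfs4 10
# per mounted filesystem / share
fsstat /tank/media /tank/vmfs 10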
 
So, as I was saying, I'm building Zabbix templates for the systems I maintain and want to monitor. Zabbix provides a basic template for Solaris, but it only monitors free/used disk space, memory consumption, CPU usage and network usage.

But there have been times when someone contacted me saying: "Hey, server XX was really slow yesterday, do you know anything about that?" This is where you want as much information as possible, and I've found that disk IO helped me on many occasions.

So I created a template for disk activity monitoring. Currently I'm monitoring everything iostat shows:
- disk throughput
zabbix_transfer.jpg


- disk iops
zabbix_iops.jpg


- disk/queue busy in %:
zabbix_disk_busy.jpg


- number of active transactions
- average transaction service time
- waiting transactions (queue size)

I run iostat for 58 s and save the output to a file; the Zabbix agent then grabs the data out of it and sends it to the server. I still have to create triggers as well, but I'm still thinking about what the threshold levels for them should be.
All hard drives are auto-discovered, so you just load the template into Zabbix and it scans the server and adds the devices.
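
Roughly, the collector side could look like the sketch below (the paths, the Zabbix key and the parsing are made up for illustration, not the actual template):

Code:
# crontab: take one 58s interval sample per minute (the second iostat report is the interval average)
* * * * * /usr/bin/iostat -xn 58 2 > /tmp/iostat.tmp && mv /tmp/iostat.tmp /tmp/iostat.out

# zabbix_agentd.conf: return %b (busy) for the disk passed as $1, using the last report in the file
UserParameter=disk.busy[*],/usr/bin/nawk -v d="$1" '$NF == d { v = $(NF-1) } END { print v }' /tmp/iostat.out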

A scrub is currently running on my 4x2TB raidz2 pool; that is why there are so many IOPS...

I still have some cleaning up to do in the template, then I will post a short howto here.

I've also made a SMART template, which can grep any SMART attribute and log it (and, if needed, graph it).

Zabbix has its good and bad sides. One of the good things is that you can set a custom update interval, how long it keeps the raw data and how long it keeps the trends (daily averages). I currently have it set to 14 days, so that means I have per-minute data for the last 14 days. That's one of the benefits compared to solutions that use RRDs (such as Cacti, or Observium, which is more user-friendly). There's also no need to pre-generate graphs; you can call up graphs on demand and change the scale as well.

Anyone else using zabbix to monitor servers?

Matej
 
I'm looking to set up a stronger home file server, and I think I'm going with napp-it/OpenIndiana. I have a few questions, though :)

I'd like to reuse my X58/i7 920 setup that is doing very little at the moment. It currently has 6GB of DDR3 in it, and that's always expandable (max 24GB). It will be servicing several smallish virtual machines (light web services and other servers) running on ESXi 5.5 on other hardware via an NFS share, as that is what my current setup is (on much worse hardware :( )

I'm considering ordering two WD Red 3TB disks to set up in a RAIDZ1, with more to be added later when budget allows. I also have a spare 40GB Intel SSD I would like to use for L2ARC if that is feasible.

My main question is whether the X58/i7 platform is compatible and powerful enough for my use case, and whether I should go with more disks or more RAM to begin with.

Thanks
 
You know you can't just add single disks to an existing raidz in ZFS? You can only add whole vdevs (another mirror, another raidz, ...).
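
For example (device names are just placeholders): you start with one mirror and later add a second mirror vdev, which gives you the RAID10-style layout:

Code:
# initial pool: a single 2-way mirror
zpool create tank mirror c2t0d0 c2t1d0
# later: add another mirror vdev - the pool then stripes across both mirrors
zpool add tank mirror c2t2d0 c2t3d0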

As far as memory goes, you are OK for a start, but 8GB might not be a bad idea :)

As for hard drives, I would go with an SSD mirror for the VMs and a spindle mirror for storage volumes. I currently have a RAID1 for my VMs and I'm not impressed, so I'm waiting for some income to buy 2 SSDs.

If you want to go with spindles, make at least RAID10.

Matej
 
Intel X58 based platforms should work. Only ECC is not possible (desirable but not essential).
If there is a Realtek NIC onboard, I would consider adding a cheap Intel NIC, as Realtek is quite slow.

You can use the X58 for a barebone storage system. It should work as a napp-in-one as well.

I would avoid the Reds and use 7200rpm Hitachis. They are faster and more reliable.
I would start with a mirror of disks for backup and general-use storage, and a mirror of (smaller) SSDs to hold the VMs, as they are much faster, especially regarding IOPS.
You can use the 40GB SSD as L2ARC. Much faster is adding RAM for ARC; I would add as much RAM as possible. You can use arcstat to decide whether an L2ARC is needed.
I would prefer OmniOS stable over OI dev.
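
(A rough way to do that arcstat check: watch it for a while under normal load. The script name varies between arcstat and arcstat.pl depending on how it was installed.)

Code:
# 10 second samples; if miss% stays low while arcsz has already grown to its target c,
# an L2ARC will not buy much
arcstat.pl 10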
 
Gea:
What do I have to look at in arcstat to see whether I need L2ARC or not?
 
My current stats are:
Code:
ARC Efficency:
	 Cache Access Total:        	 1688348949
	 Cache Hit Ratio:      99%	 1686779487   	[Defined State for buffer]
	 Cache Miss Ratio:      0%	 1569462   	[Undefined State for Buffer]
	 REAL Hit Ratio:       96%	 1623125983   	[MRU/MFU Hits Only]

I guess I'm good for now?
 
You know you can't just add single disks to an existing raidz in ZFS? You can only add whole vdevs (another mirror, another raidz, ...).

As far as memory goes, you are OK for a start, but 8GB might not be a bad idea :)

As for hard drives, I would go with an SSD mirror for the VMs and a spindle mirror for storage volumes. I currently have a RAID1 for my VMs and I'm not impressed, so I'm waiting for some income to buy 2 SSDs.

If you want to go with spindles, make at least RAID10.

Matej

Thanks, I am familiarizing myself with the RAIDZ levels. My plan is to purchase two disks now and put them in a mirror; then, when I buy two more later on, I will back up the data to some WD Greens I have around here, drop the pool, create a RAID10 with the four disks and restore the data.

Intel X58 based platforms should work. Only ECC is not possible (desirable but not essential).
If there is a Realtek NIC onboard, I would consider adding a cheap Intel NIC, as Realtek is quite slow.

You can use the X58 for a barebone storage system. It should work as a napp-in-one as well.

I would avoid the Reds and use 7200rpm Hitachis. They are faster and more reliable.
I would start with a mirror of disks for backup and general-use storage, and a mirror of (smaller) SSDs to hold the VMs, as they are much faster, especially regarding IOPS.
You can use the 40GB SSD as L2ARC. Much faster is adding RAM for ARC; I would add as much RAM as possible. You can use arcstat to decide whether an L2ARC is needed.
I would prefer OmniOS stable over OI dev.

Thanks _Gea. Would you, in your personal opinion, say that ECC is necessary or a 'nice-to-have' feature? Unfortunately it does have Realtek onboard; I will be adding an Intel NIC. Would I be able to use the Realteks for administrative use (non-disk access)?

I assume the Hitachis are the Ultrastars? Any place you recommend purchasing from? (I assume Newegg, Amazon, etc.)

My main use case now is an 80GB SSD containing all of the VM host OS disks on its own discrete NFS mount, and the server content (/var/www/html for instance) is on a larger, slower disk that is also NFS mounted, both on their own switch (a SAN, if you will). I'd like to replace the larger, slower disks with a faster ZFS configuration for performance and reliability.

Thanks again!
 
ZFS can detect all errors in the chain disk subsystem -> disk driver -> disk controller -> cabling -> disk. This covers nearly all possible sources of data corruption.

RAM is then the most probable source of undetected errors.
If the data is important, yes, I would call ECC necessary, as the premium is quite small. But even without ECC, ZFS can detect and repair more errors than, say, ext4 or NTFS.

Realtek is OK for management and where performance is not that important.

Yes, HGST/Hitachi Ultrastar. I cannot comment on where to buy from (I am in Germany).
 
Some months ago I built a little ZFS box with a couple of IODrives, 9 SATA drives and 384GB RAM. Initially it was performing OK, but something was off: I had various crashes on ESX, the whole InfiniBand stack seemed locked up, and the only solution would be to reboot the OmniOS box.

Anyway, many hours of troubleshooting later I think the road ahead looks a bit better. I upgraded all the IB cards to the latest firmware (unsupported by HP, but works just fine). I also think that putting SATA disks on a SAS expander was just about the worst idea I have had this millennium. I also downgraded to 192GB RAM since there seem to be some pointers to an existing bug in either ZFS or Solaris/OmniOS/whatever. I could never find really reliable information on this, but I don't see any ghost hits on the cache, so 192GB will have to do for now.

Now I am in the process of chasing better performance and I hope you guys can help.

First, since this is meant for production, I have forced sync on the pool and disabled the write-back cache on each LUN.

I have two SLC-based IO Accelerators mirrored as the SLOG device (they show up as 2x160GB) and two MLC-based IO Accelerators as L2ARC (they show up as 4x300GB).

In the pool I have 10 Seagate 4TB SAS disks in 5 two-way mirrored vdevs.

I see on average about 550MB/s writes and 1.5GB/s reads... so far so good, and it's fairly stable. If I stick to one SLOG I still get the same performance, which is expected since the two were mirrored.
If I add the second device as another (striped) SLOG instead of a mirror, I see about 650-700 MB/s writes; I would expect 2x550MB/s since each log device can do that on its own.
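
For anyone wanting to reproduce the two layouts, switching between them can be done roughly like this (pool and device names are placeholders):

Code:
# mirrored slog -> two independent (striped) log devices
zpool detach tank c9t1d0
zpool add tank log c9t1d0
# and back to a mirrored slog
zpool remove tank c9t1d0
zpool attach tank c9t0d0 c9t1d0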

What could the problem be? The underlying disks (ST4000NM0023) should be able to take about 175MB/s each...

I played around with zfs_txg_timeout but the default of 5 seems to be a good setting.

These drives have 128MB Cache each, should I change these settings:
zfs_vdev_cache_max = 0x4000
zfs_vdev_cache_size = 0x0

Seems stupid to have a max cache of 16MB when the actual cache is 128MB...

I also see some dips in performance... doesn't happen all the time, but occasionally. Any idea how to get rid of those?

Tests with 2-way Mirror LOG:
eiDvoWA.png

Zkb9Hpa.png

IPgE32J.png


Test with 2 LOGs - why the dips in the first run?
bVGzzN0.png

wbKeiBl.png


br,
Paniolos
 
Hard to comment on details, but is there any reason to use more than 128 GB RAM?
Above that you may expect problems. (May be fixed in the new NexentaStor 4 / next Illumos)
http://nex7.blogspot.de/2013/03/readme1st.html

I would reduce to 128GB

Why do you use slow 4TB disks if your main concern seems to be performance?
An SSD-only config built from enterprise SSDs like Intel 3500/3700, without
a ZIL (or with a VERY fast one), seems like a config that can offer more IOPS, possibly
paired with a large but slower spindle-based pool.

I would compare against an SSD-only pool.

If you need sync writes and performance, use the fastest possible ZIL
(like a ZeusRAM). If you do not use a very good, current SLC device as ZIL, this may slow things down.

Compare sync=always to sync=disabled performance
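
A quick way to compare (dataset name is an example; the realistic test is from the ESX/FC side, but even a local dd shows the difference, since sync=always pushes local writes through the ZIL as well):

Code:
zfs set sync=disabled tank/test
dd if=/dev/zero of=/tank/test/syncoff.bin bs=1M count=8192
zfs set sync=always tank/test
dd if=/dev/zero of=/tank/test/syncon.bin bs=1M count=8192
zfs inherit sync tank/test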

Have you checked arcstat to see whether an L2ARC is helpful? Maybe it's faster without an L2ARC if you
have that much RAM.

Tuning ZFS, cache or TCP parameters is the last step.
 
I've decided to toss a 1366 quad-core Xeon into the X58 and use 4x4GB ECC DDR3, probably going with those Ultrastars (going for 2 to begin with in a RAIDZ1, then later I will dump and recreate into a RAIDZ2 or Z3 when I obtain more disks). It's a very modest setup here at the home office; I think this should be sufficient.

When reading the documentation (http://www.napp-it.org/doc/downloads/napp-it.pdf), it says NFSv4 is also supported and allows user-based permissions and ACLs (not used with ESXi). Does that mean that an ESXi host ignores/is incompatible with user-based permissions/ACLs, or that napp-it NFSv4 and ESXi are incompatible entirely and I should use NFSv3 instead? I am using ESXi 5.5.0.

Thanks for the advice
 
- With two disks, create a mirror
- ESXi "talks" NFS3 to the storage server what means that you can restrict access only based on IP and uid without authentication.

Usually you have a secure network between and set the NFS share to fully open (just set NFS to ON with a ACL like everyone@=modify on the files/folders). Best set your NFS shared filesystem to aclmode = discard (ignore a chmod client command) and aclinherit to passthrough
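
On the command line that boils down to something like this (filesystem name and network are examples):

Code:
# share read/write to the ESXi network, allow root access from it
zfs set sharenfs="rw=@192.168.10.0/24,root=@192.168.10.0/24" tank/vmfs
# ignore chmod from clients, pass ACLs through on inheritance
zfs set aclmode=discard tank/vmfs
zfs set aclinherit=passthrough tank/vmfs
# open ACL on the shared folder
/usr/bin/chmod -R A=everyone@:modify_set:fd:allow /tank/vmfs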
 
Is there a way to check disk temperature and control the fan speed from napp-it (like SpeedFan on Windows)?
Because after I assembled my ZFS box, the HDDs got so hot after a while that I could barely touch them, but the fans keep running at low speed.
(case: xcase RM424pro, HDDs: 24x 5K4000, OS: OmniOS)
 
disks -> smartinfo shows drive temp on my system. I don't think there is anywhere to control fans.
 
Yeah, I've seen this, but it says smartmontools is not installed.
I tried pasting the command listed there, but it didn't work.
The fans are PWM; I don't know how they are handled by OmniOS.
 
The current napp-it wget installer compiles smartmontools 6.2 on OI/OmniOS
(OmniOS is the preferred platform, currently 151008)

Two days ago they released a new stable, 151010, with improved NFS and ZFS:
http://omnios.omniti.com/wiki.php/ReleaseNotes

(I have not yet found the time to install it.)
If anyone does the update, please report your results.
 
I am running OmniOS r151008 and napp-it v0.9e-1_preview. I've had a scrub job set up for a while now that never runs; I have to run it manually. I've tried re-creating the job, but I have the same issue where it refuses to run. The job uses "First Sunday of the Month" @ 23:00 hours. It was supposed to run on the 4th, but didn't. What can I provide to troubleshoot this issue? Thanks!
 
I would suggest updating the e1 preview to the latest 0.9e1 or the 0.9f1 preview via the menu "about update".
 
Yeah, I've seen this, but it says smartmontools is not installed.
I tried pasting the command listed there, but it didn't work.
The fans are PWM; I don't know how they are handled by OmniOS.

Since the fans are PWM controlled, where is the temperature sensor located? Is it controlled by the motherboard or does it have its own sensor?
In case it has its own sensor, it might be damaged?
Can you somehow force PWM off?
Try contacting XCase to see if they can help you out.

Matej
 