OpenSolaris-derived ZFS NAS/SAN (OmniOS, OpenIndiana, Solaris and napp-it)

Which command did you use?
- an OS command?
- vmstat 1 2 (used in menu System - Statistics - Memory, or via CLI)
- arcstat.pl (used in menu System - Statistics - ARC)

If in doubt, I would trust the CLI commands like vmstat.

PS:
During usage, nearly all otherwise unused RAM is assigned as read cache (the ARC). Maybe this is excluded by your command. It is quite normal for Solaris-based systems to show only minimal free RAM due to this behaviour, as it turns otherwise idle RAM into performance.
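For reference, a rough way to check this from the CLI (stock OpenIndiana/OmniOS tools; exact output varies by release):

Code:
vmstat 1 2                       # 'free' column of the second sample = truly free RAM in KB
echo ::memstat | mdb -k          # kernel memory breakdown; the ARC shows up here, not as free RAM
kstat -p zfs:0:arcstats:size     # current ARC size in bytes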



Hi Gea, I didn't run a specific command, I used the OI version of Task Manager (I forget what it's called - System-something).

It quite clearly shows that only ~360MB of RAM is used, but napp-it is showing almost complete memory usage. I will try the other commands tonight to double-check.
 
Weird question... Is Nexenta just over? I mean, I've been a longtime user/proponent, but I think I may need to move to another platform for my storage arrays. I'm concerned that the company is unhealthy. I've deployed ~25 Nexenta-based ZFS solutions to clients over the years, and they've been stable. But trying to install on current-generation hardware fails because the product has not been updated. Disappointing.

My primary filer is an HP ProLiant DL380 G7 with multiple D2700 and D2600 enclosures... The challenge is that I have multiple cascaded JBOD enclosures, DDRdrives (ZIL), and MPxIO to all of the dual-port SAS drives throughout... Are any of the alternative solutions (OI, OmniOS, ZFSonLinux) capable of working with the setup above?
 
There was a thread on this over on ServeTheHome. They have slipped the 4.0 release several times, people keep leaving, three CEOs in one year, etc. etc. etc...
 
Link me!

I know they've been through some changes. But it definitely seems time to jump ship.
 
Gea, and anyone else who would care to answer:

I'm building a ZFS server, trying to make it as fast, quiet and cool as possible. I'm experimenting with 8 Crucial m500 960GB SSDs that I borrowed temporarily from work.

My main question regards TRIM support for these drives. Correct me if I'm wrong, but I don't believe any of the Illumos/Solaris OSs have TRIM support. Should I be concerned about this? In other words, will performance substantially drop off over time? My usage pattern will be primarily read-heavy, which I think would help. Although, the plan is to periodically dump older stuff off the SSD pool onto a much larger HDD pool, so I suppose the space on the SSD array will get habitually reused.

FreeBSD 10.0 (and possibly 9.2) does offer TRIM support for SSDs in a ZFS pool. My first experiment was (and is) with that, and so far so good. But I've always used Illumos in the past and I wonder if that is an option in this case.

By the way, doing a simple local dd if=/dev/zero test to the array as described in this post is yielding over 2GB/sec, which I was pretty impressed with. The NIC is an Intel 10GbE (x540-T1), BTW, but I have not tested throughput over the network yet (working on configuring Samba at this point).
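For anyone who wants to run a comparable local test, a minimal sketch would look like this (pool/dataset names are placeholders; compression must be off on the target dataset, otherwise /dev/zero compresses to nothing and the number is meaningless):

Code:
zfs create -o compression=off ssdpool/bench                    # hypothetical scratch dataset
dd if=/dev/zero of=/ssdpool/bench/testfile bs=1M count=16384   # ~16GB sequential write
dd if=/ssdpool/bench/testfile of=/dev/null bs=1M               # read it back (beware ARC caching)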

Semi-unrelated but open for comments: what is the general consensus on running those 8 SSDs in a single-parity (RAIDZ1) config? My thinking was that they are relatively small drives, reading is not hard on SSDs, and SSDs have a high MTBF, so the risk of RAIDZ1 seemed low; I therefore opted for an additional 960GB of usable space over the extra protection of RAIDZ2. Does that seem like a good choice in this case? I wouldn't do it with 4TB HDDs, but for 1TB SSDs it seemed reasonable to me.

Thanks to all.
 
If you have a read-heavy workload and they are modern SSDs, I wouldn't worry about the absence of TRIM. If you're still concerned, you could under-provision them by, say, 10% to give the garbage collector some slack to work with.
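One way to leave that slack on an Illumos-based system, as a sketch (device names are placeholders, and the ~90% slice itself would be created beforehand, e.g. interactively in format), is to build the pool from slices rather than whole disks:

Code:
# after creating an s0 slice of roughly 90% of each SSD:
zpool create ssdpool raidz c1t0d0s0 c1t1d0s0 c1t2d0s0 c1t3d0s0 \
                           c1t4d0s0 c1t5d0s0 c1t6d0s0 c1t7d0s0
zpool status ssdpool

Note the trade-off: when ZFS is given slices instead of whole disks it does not enable the drives' write cache automatically.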
 
I'm building a ZFS server, trying to make it as fast, quiet and cool as possible. I'm experimenting with 8 Crucial m500 960GB SSDs that I borrowed temporarily from work.
Sounds like an interesting build. Just out of curiosity, what HBA(s) were you planning on using? PCIe 3.0? I hear you bottleneck the PCIe lanes fairly quickly with SSDs.
 
No, it's just PCIe 2.0... an LSI 9201-16i. Doesn't seem to be a bottleneck but I guess I really couldn't be sure without trying a 3.0 one. The mainboard is a SuperMicro X9SCM.
 
Can somebody help me?

I just got an e-mail alert from my napp-it box with a TON of hard drives listed as degraded on my RAID-Z2.

Here is a screenshot:


I really, really don't want to lose everything. How can so many hard drives be degraded all at the same time?!
 
Well, they can't actually be offline. It's a RAIDZ2, and losing six drives would kill the pool. I don't recall seeing individual drives alive but 'degraded' before. Try 'zpool clear tank' and see what happens?
 
I was able to run "fmadm repaired" on each of the drives, and the pool now shows as online. Thanks for the help...

I'm a little worried about why this happened though. Hopefully it doesn't happen again...
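For anyone hitting the same thing, the sequence is roughly this ('tank' is a placeholder pool name; the FMRIs come from the fmadm faulty output):

Code:
fmadm faulty            # list faulted/degraded resources and their FMRIs
fmadm repaired <fmri>   # mark one resource as repaired; repeat for each listed FMRI
zpool clear tank        # clear the pool's error counters
zpool status -v tank    # confirm everything is back ONLINE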
 
Sounds like a reporting glitch. I've never seen a drive degraded, just a pool. And they can't really have been degraded, or having that many would have killed the pool. Glad it works now...
 
I'm not sure it was a glitch because "fmadm faulty" displayed a bunch of degraded drives due to too many checksum errors.

What would have caused so many checksum errors on the different drives?
 
Maybe some noise on the PCI bus. I said 'glitch' because I'm not aware of how a drive can be degraded. Maybe my memory is faulty, but I don't remember ever seeing a drive flagged as 'degraded'. Then again, maybe I'm just remembering wrong. Whatever happened must not have been all at once, or I would think you'd have gotten errors relating to specific files: e.g. one or more drives got checksum errors but the read still worked because ZFS reconstructed the data from the other drives, then one or more other drives had the same thing happen. I'd suspect the disk controller, the bus, or maybe RAM? Kinda scary...
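A few standard places to look when chasing down where the checksum errors came from (all stock Illumos tools; output formats vary by release):

Code:
zpool status -v tank    # which vdevs took checksum errors, and any affected files
fmdump -eV | less       # FMA error telemetry; the timestamps help tell 'all at once' from 'spread out'
iostat -En              # per-device hard/soft/transport error counters
grep -i scsi /var/adm/messages   # controller/bus resets logged by the kernel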
 
What do you guys think about ZFS on Linux? Is it as fast and reliable as ZFS on OI, OmniOS etc.? Are all the ports at the same level (FreeBSD/Linux/Mac)?
 
I used ZoL for a bit. For basic serving up of data it works very well. I stopped because there are corner cases it handles poorly (not their fault - bad interactions with udev, etc.). If the HBA doesn't present all the disks quickly enough at boot, the ZFS driver loads too soon and you get things like: pool not present, datasets not shared, zvols not visible, etc. I use this for a home/SOHO setup and was annoyed at having to put hacks in /etc/rc.local to forcibly import/share/restart iSCSI, etc. If this doesn't happen for you, go for it, by all means...
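For completeness, the kind of /etc/rc.local hack being described amounts to a blunt delayed re-import after the HBA has settled; an illustrative sketch, not a recommendation:

Code:
#!/bin/sh
# /etc/rc.local - brute-force workaround for the boot-time race
sleep 30            # give the HBA time to present all disks
zpool import -a     # import any pools that were missed at boot
zfs share -a        # re-publish shares defined via sharenfs/sharesmb
exit 0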
 
The fact is that I feel much more comfortable with Linux OSes (especially Debian and Debian-based distros) than with Unix-based ones.

However, I would use it for home purposes, to download and share data across the home network. How about performance? Does it perform the same as the native version?
 
How about ZFS on FreeBSD, is it more mature than ZoL? I could use it on Debian kFreeBSD...
 
Actually, then, I would recommend trueos (FreeBSD-based). It has the whole set of beadm boot-environment commands. Very handy if you mess up somewhere...
 
At this point, maybe the best solution is to stick with OpenIndiana/OmniOS + napp-it. I'll manage torrents with Transmission, even though I've found that it isn't as fast as uTorrent.
 
Hello!!

Code:
`joomla/plugins/editors/tinymce/jscripts/tiny_mce/plugins/safari/editor_plugin_src.js' -> `/omnios/joomla/plugins/editors/tinymce/jscripts/tiny_mce/plugins/safari/editor_plugin_src.js'
cp: cannot create regular file `/omnios/joomla/plugins/editors/tinymce/jscripts/tiny_mce/plugins/safari/editor_plugin_src.js': Operation not permitted
`joomla/plugins/editors/tinymce/jscripts/tiny_mce/plugins/safari/editor_plugin.js' -> `/omnios/joomla/plugins/editors/tinymce/jscripts/tiny_mce/plugins/safari/editor_plugin.js'
cp: cannot create regular file `/omnios/joomla/plugins/editors/tinymce/jscripts/tiny_mce/plugins/safari/editor_plugin.js': Operation not permitted

Code:
rsync: mkstemp "/omnios/joomla/administrator/components/com_templates/.index.html.gYKDOw" failed: Operation not permitted (1)
rsync: mkstemp "/omnios/joomla/administrator/components/com_templates/.templates.xml.exWYLJ" failed: Operation not permitted (1)
rsync: mkstemp "/omnios/joomla/administrator/components/com_templates/.toolbar.templates.html.php.0bG6WW" failed: Operation not permitted (1)
rsync: mkstemp "/omnios/joomla/administrator/components/com_templates/.toolbar.templates.php.5wyZ89" failed: Operation not permitted (1)
rsync: mkstemp "/omnios/joomla/administrator/components/com_templates/helpers/.index.html.qSFt8m" failed: Operation not permitted (1)

If anyone ever runs into the "Operation not permitted" error above with an NFS share, just mount the NFS share with the noacl option. That way you get normal POSIX permission handling on the NFS share, but the ACLs will still be applied to the files on the server, even if you chmod the file/folder.

That way I can set ACLs in Windows and use them, but have the NFS share without ACLs and all the problems they can bring.
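For reference, on a Linux client that would be something like this (server name, export path and mountpoint are placeholders; noacl applies to NFSv3):

Code:
mount -t nfs -o vers=3,noacl omnios:/tank/share /mnt/share
# or persistently in /etc/fstab:
# omnios:/tank/share  /mnt/share  nfs  vers=3,noacl  0  0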

Matej
 
Hi guys, ZFS should get faster as the amount of available RAM increases. In my VM I have 8 cores (4770K) and 10GB of RAM available, and I've created a stripe of two virtual disks of 50GB each.

If I try to read/write files on my SMB share I can't get more than 40MB/s, and I've noticed that the used RAM is always the same (~350MB). How can I change that? I'm using OpenIndiana.

P.S. On Ubuntu Server + ZoL I can easily saturate the Gb connection, and during file transfers RAM usage goes up to 5-6GB.
 

Without detailed information it is hard to give any hints beside the basics:
- with ESXi + NFS: disable sync and recheck (see the sketch below)
- with ESXi and iSCSI: enable write-back and recheck
- with iSCSI: try a larger blocksize and use volume-based iSCSI
- with ESXi and OI: try the vmxnet3 vnic
- with ESXi: use hardware pass-through, not RDM or ESXi virtual disks

- use Intel NICs on OI
- do not use any copy tools on Windows like TeraCopy
- compare network speed vs local performance (e.g. with bonnie) to decide whether it is a pool or a network/protocol problem
- compare SMB vs iSCSI
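A minimal sketch of the sync check and the local-vs-network comparison (dataset name is a placeholder; re-enable sync after testing):

Code:
zfs get sync tank/share                 # current setting (standard/always/disabled)
zfs set sync=disabled tank/share        # for testing only - rerun the benchmark
dd if=/dev/zero of=/tank/share/testfile bs=1M count=4096   # rough local write figure to compare with SMB speed
zfs set sync=standard tank/share        # revert when done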
 
I'm hitting a strange timeout issue with an All-in-One setup.

CPU: 2x Xeon E5-2620 v2
Mainboard: X9DRD-JLN4F
RAM: 128GB

Controllers:
1x M1015, flashed to 9211 IR firmware; the OmniOS vmdk is stored here.
1x LSI 2308 onboard, IT mode, passed through to OmniOS.

Disks:
2x Seagate SAS 300GB, connected to the M1015, mirrored for the OmniOS vmdk
8x WD Red 3TB, connected to the LSI 2308

I/O read/write to the 2308 is normal, no issues so far. Roughly 10TB of data read/written.

If I add an additional 250GB vmdk to OmniOS, stored on the M1015, the M1015 hangs under these commands:
zpool create test c2t1d0 (assume the 250GB disk is c2t1d0)
dd if=/dev/zero bs=1M count=249000 of=/test/z.txt &

The M1015 hangs after somewhere between 20GB and 180GB of written data.
Since the boot disk hangs, everything collapses.

Does anybody else experience this issue?
Regards.
 
Can somebody help me?

I just got an e-mail alert from my napp-it box with a TON of hard drives listed as degraded on my RAID-Z2.

I really, really don't want to lose everything. How can so many hard drives be degraded all at the same time?!

Check cables, PSU and memory. I suspect nothing is wrong with the disks. ECC? How old is the PSU? Change cables! :)
 
So I'm so done with OpenIndiana and napp-it it's scary. Every six months or so, OI will stop working and force me to reinstall OI along with napp-it, and then I have to recreate my virtual machines, etc. Well, guess what, the six-month mark is up: OpenIndiana loads along with napp-it, I can access my pool for a bit, the disks show NO errors, and then boom. Can't access my pool, can't access my disks.

Done. Sick and tired of having to rebuild this shit all the time. </end rant>
 

Hmmm. As a counterpoint to your rant, I have been using OI and napp-it in two separate production deployments for nearly two years without any problem with either OI or napp-it. Knock on wood, please.

I think you will likely find many others with my experience. If you want some real help, you should give a more detailed description of exactly what errors you get other than it "stops working". My guess is that you may have some hardware issues and/or configuration mistakes.

Good luck.
 

Came home today to no shares. When I go to the napp-it homepage and click on Disks or Pools, there is nothing there.

Shut everything down, restarted as before.

Same thing. ZFS will load, my SAB VM will load. I can connect to the VM briefly, then the whole thing stops working again (clicking on Disks or Pools does nothing).

If I JUST load up ZFS, I can access my shares OK. I think maybe there is something wrong with the SAB VM, so I tried to copy the files contained within the VM to a different folder. However, the read speeds are now abysmal.

The only interesting clue so far is that I can only access my shares if I don't load the VMs.
 

What does "SAB" stand for?
 
Yeah, well, maybe this whole running-your-own-NAS business isn't for you if you can't figure out what exactly is wrong with your setup and your first reflex to a problem is rebooting. Buy a ready-made NAS and call support if anything goes wrong.
 
The only known software problem that can cause data corruption and a stalled OS is ESXi 5.5 with an unmodified e1000 vnic. In such a case, you can use ESXi 5.1, a vmxnet3 vnic, or modify the e1000 settings, see http://napp-it.org/downloads/index.html

My own napp-in-one appliance for ESXi includes vmxnet3 and a modified e1000. You may download and try this one. It includes OmniOS 1008 stable.

Other reasons are mainly hardware problems (cabling, power, HBA, disks - even a single half-dead disk can block the whole controller). Check the system log, the fmd log and the SMART values, and check the cabling and correct physical placement of your HBA (unplug/replug). The one part you can and should test is RAM.

In any other case with curious ESXi behaviours I would reinstall ESXi, as it is the only part without data checksums on disk, so unlike with ZFS you are not informed about data errors. If this does not help and you cannot get info from the logs, you may move your disks to completely new hardware with new cabling. If it then works, it is a problem of the PSU, mainboard (CPU, RAM), HBA or cabling.
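A quick way to confirm which vnic type the guest actually sees, plus the offload workaround as I recall it from the napp-it notes of that era (the e1000g.conf property names below are an assumption to verify against your release):

Code:
dladm show-phys    # e1000g0 = e1000 vnic, vmxnet3s0 = vmxnet3 vnic
# if stuck on e1000g, the commonly cited workaround is to disable offload
# in /kernel/drv/e1000g.conf and reboot (verify these property names):
#   tx_hcksum_enable=0;
#   lso_enable=0;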
 
Yeah, well, maybe this whole running-your-own-NAS business isn't for you if you can't figure out what exactly is wrong with your setup and your first reflex to a problem is rebooting. Buy a ready-made NAS and call support if anything goes wrong.

Thanks for the post. However, stop making assumptions that my first reflex was to reboot. Have a wonderful day.
 

I'm sorry, maybe work on the impression you're giving, then. I haven't seen any tangible attempt at troubleshooting and instead a lot of handwaving, whining and FUD. What other impression could one get from this?
 

Work on your comprehension.
 
The only known software problem that can cause data corruption and a stalled OS is ESXi 5.5 with an unmodified e1000 vnic. In such a case, you can use ESXi 5.1, a vmxnet3 vnic, or modify the e1000 settings, see http://napp-it.org/downloads/index.html

It's definitely the one VM bringing the system down (by "system" I mean OI and then my other VMs). I was getting a "can't write to NVRAM" error in VMware, so I tried deleting the file and then powering up the VM. The VM will power up (Win 7) and run for 1-5 minutes, then it goes down and brings OI down with it. OI is still running and responsive, I just can't access disks/pools.
 