OpenSolaris-derived ZFS NAS/SAN (OmniOS, OpenIndiana, Solaris and napp-it)

_Gea, I've been using Solaris 11 Express with napp-it for a few weeks now and I'm very impressed so far - thank you for the work you've put into this.
 
How important is ECC in a ZFS pass-through / ESXi build? I'm deciding between a Xeon build and an i7-2600 with an Intel Q67 motherboard. Is the ECC worth the extra $200+?
 
How important is ECC in a ZFS pass-through / ESXi build? I'm deciding between a Xeon build and an i7-2600 with an Intel Q67 motherboard. Is the ECC worth the extra $200+?

For the home? Not really. For a business production environment? Yes.
 
Intel 82579LM for the nic - is that officially supported by ESXi or do you still have to do unsupported driver hax?


http://www.newegg.com/Product/Product.aspx?Item=N82E16813182262
(note the +F, has 2 82574L controllers)

http://www.newegg.com/Product/Product.aspx?Item=N82E16819115083
E3-1230 is cheaper than the 2600

You can use the price difference between the 2600 and the E3-1230 to bump the RAM up to ECC.

I think the price will be no more than 40-50 bucks more (mobo price difference), and you are getting dual nics and probably a more reliable overall setup. It's certainly going to be one that more people have used.
 
Intel 82579LM for the nic - is that officially supported by ESXi or do you still have to do unsupported driver hax?


http://www.newegg.com/Product/Product.aspx?Item=N82E16813182262
(note the +F, has 2 82574L controllers)

http://www.newegg.com/Product/Product.aspx?Item=N82E16819115083
E3-1230 is cheaper than the 2600

You can use the price difference between the 2600 and the E3-1230 to bump the RAM up to ECC.

I think the price will be no more than 40-50 bucks more (mobo price difference), and you are getting dual nics and probably a more reliable overall setup. It's certainly going to be one that more people have used.

I would also prefer the Supermicro with the server chipset:
- 2 x supported NICs (ESXi)
- IPMI remote control
- 3 x fast PCIe slots >= 4x
- 6 x SATA
- ECC support

Well worth the $50 premium. I am quite sure you will need or appreciate all of those extras.
 
Of course you are both correct. It's silly to scrimp when the cost is so close. The server board will be better for a number of reasons. I was trying to add some extra complexity to the build by using the board as a desktop pc for a little while until the Intel X79 launches.. Long term though, a Supermicro build makes more sense.

Thanks guys! Oh and special thanks to _Gea. For years I've wanted a single ZFS box that also could host VMs. I have drawn up many hacked configurations, but your method is by far most excellent... and I can't wait to have an all in one up and serving.
 
In a pure mirroring setup, do RAM size and ECC (or the lack of it) matter?

More RAM = more cache (ARC) for ZFS, so frequently accessed data will be cached and thus faster.

ECC will catch and correct errors that occur in memory.
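For a rough look at how much RAM the ARC is actually using and how often it hits, you can query the arcstats kstat on OpenIndiana/Solaris (a minimal sketch, run from a shell):

Code:
# current ARC size and target size, in bytes
kstat -p zfs:0:arcstats:size zfs:0:arcstats:c
# cache hits vs misses since boot, to judge how effective the cache is
kstat -p zfs:0:arcstats:hits zfs:0:arcstats:misses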
 
I was trying to add some extra complexity to the build by using the board as a desktop pc for a little while until the Intel X79 launches..

No reason you can't run it as your desktop (I am actually running that same setup right now as my primary desktop) - just add a video card and you are good to go.
 
Gea, I notice that in the napp-it instructions for the all-in-one setup you say not to use ESXi 5.0 because of the 8 GB RAM limitation, but I thought VMware changed the limit to 32 GB.

32 GB seems to be adequate for lots of builds, though I agree 4.1 was better in this aspect.
 
Gea, I notice that in the napp-it instructions for the all-in-one setup you say not to use ESXi 5.0 because of the 8 GB RAM limitation, but I thought VMware changed the limit to 32 GB.

32 GB seems to be adequate for lots of builds, though I agree 4.1 was better in this aspect.

He wrote that part of the guide before VMware announced the change from the 8 GB to the 32 GB limit. I have had an ESXi 5 install for months now and it works beautifully; I would recommend you start with that.
 
He wrote that part of the guide before VMware announced the change from the 8 GB to the 32 GB limit. I have had an ESXi 5 install for months now and it works beautifully; I would recommend you start with that.

VMware recently changed the limit from 8 to 32 GB per machine.
I have to update my miniHowto...
 
What's the best way to benchmark disk I/O performance (for ESXi NFS/iSCSI datastores, CIFS/SMB transfers, etc.)?

I've been using CrystalDiskMark, but the results are really different every time I run it.

Also, I've tested copying a file over CIFS/SMB and I was getting 100 MB/s (i.e. gigabit line speed), but CrystalDiskMark only shows ~50 MB/s sequential read performance, so something is very wrong...
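One thing that skews all of these tools: if the test file fits in the ARC (or the client's cache), you end up benchmarking RAM rather than disks, which also explains run-to-run swings. A rough sketch of a more repeatable local test is a dd run with a file at least twice the size of RAM (the pool name /tank and the 32 GB size are just examples; this also assumes compression is off on that filesystem, since /dev/zero compresses to nothing):

Code:
# write a 32 GB file, larger than RAM so the ARC cannot hide the disks
time dd if=/dev/zero of=/tank/ddtest.bin bs=1024k count=32768
# read it back
time dd if=/tank/ddtest.bin of=/dev/null bs=1024k
rm /tank/ddtest.bin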
 
Maybe for you, but I care about my data, so I use ECC.

When have you ever seen ECC 'save' data? I will argue the home environment never stresses a fileserver hard enough that ECC will ever 'save' anything. With the drive sizes these days you're going to run into bit rot before ECC is going to 'save' the data from transferring or streaming that video file.
 
When have you ever seen ECC 'save' data? I will argue the home environment never stresses a fileserver hard enough that ECC will ever 'save' anything. With the drive sizes these days you're going to run into bit rot before ECC is going to 'save' the data from transferring or streaming that video file.

You're probably right, madrebel. However, RAM prices are so close nowadays, why wouldn't you opt for ECC? It's well worth the extra $$ for peace of mind in my book, especially when dealing with a storage server.
 
When have you ever seen ECC 'save' data? I will argue the home environment never stresses a fileserver hard enough that ECC will ever 'save' anything. With the drive sizes these days you're going to run into bit rot before ECC is going to 'save' the data from transferring or streaming that video file.

Obviously a loaded question - when you have an error due to non-ECC memory it's not really going to be diagnosable, and likewise if you have ECC there's not really any sort of performance counter that I'm aware of that lists the errors corrected.

Really, the price difference is so minuscule, why not go for it? Are you going to crash and burn without it? Probably not.

As for bit rot, that's why you are in a solaris thread. Problem solved.
 
Obviously a loaded question - when you have an error due to non-ECC memory it's not really going to be diagnosable, and likewise if you have ECC there's not really any sort of performance counter that I'm aware of that lists the errors corrected.

Really, the price difference is so minuscule, why not go for it? Are you going to crash and burn without it? Probably not.

As for bit rot, that's why you are in a solaris thread. Problem solved.

On some platforms you can read out the number of corrected memory errors.

see: http://bluesmoke.sourceforge.net/
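On Linux, the EDAC driver from that project exposes the counters in sysfs, so a quick check looks roughly like this (a sketch; mc0 depends on how many memory controllers your board has):

Code:
# corrected (ce) and uncorrected (ue) error counts for the first memory controller
cat /sys/devices/system/edac/mc/mc0/ce_count
cat /sys/devices/system/edac/mc/mc0/ue_count

On Solaris, corrected memory errors should show up as FMA ereports, so fmdump -e is the place to look.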
 
I'm just doing a fresh Napp-It VM build, going through a shakedown on the hardware and noticed that Bonnie++ in Pools/Benchmarks won't launch. dd bench works

ESXi 4.1
2 vCPU, 12 GB ram
0.500s nightly Jul.03.2011
SunOS vmNAS3 5.11 oi_148 i86pc i386 i86pc

This is a 2 x LSI 2008 IT setup: 5 x 3TB mirror pairs, 2 x 2 x 160GB SSD write cache, 2 x 160GB SSD read cache : 13.6TB

write 10.24 GB via dd, please wait...
time dd if=/dev/zero of=/stresstest/dd.tst bs=1024000 count=10000

10000+0 records in
10000+0 records out

real 15.0
user 0.0
sys 8.4

10.24 GB in 15s = 682.67 MB/s Write

read 10.24 GB via dd, please wait...
time dd if=/stresstest/dd.tst of=/dev/null bs=1024000

10000+0 records in
10000+0 records out

real 10.3
user 0.0
sys 3.7

10.24 GB in 10.3s = 994.17 MB/s Read


The board is a Supermicro with 2 x L5606 and 48GB of RAM, so I'm staying away from ESXi 5.0, unless I can splurge for the base ESXi license and get 2 more of this setup.

I may investigate OI 151a.

Any hints would be appreciated.
 
I'm just doing a fresh Napp-It VM build, going through a shakedown on the hardware and noticed that Bonnie++ in Pools/Benchmarks won't launch. dd bench works.

You may run bonnie++ from the CLI to see the error messages,
or rerun the napp-it installer to check whether bonnie++ compiles correctly.

But bonnie++ values are quite similar to dd,
and I would update to OI 151a first.
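For reference, a minimal bonnie++ run from the CLI looks something like this (the directory and sizes are only examples, picked to match the 12 GB VM above; -s should be roughly twice RAM). Any compile or library problem will then show up directly as an error message:

Code:
# -d test directory on the pool, -s file size in MB, -r RAM in MB, -u user to run as
bonnie++ -d /stresstest -s 24576 -r 12288 -u root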
 
I'm having an issue with the auto script. I have the OI server plugged into a monitor, and the screen is telling me to check auto_error.log. It has a bunch of entries that look like this. The server is running a barebones install of OI 151a and napp-it 0.500s.

Code:
Use of uninitialized value in subroutine entry at /usr/perl5/5.10.0/lib/i86pc-solaris-64int/DynaLoader.pm line 226.
 
I have a couple last issues I've been trying to figure out before my server build is finished, maybe some Solaris/OpenIndiana gurus can help...they are both power related.

1) How to get drive spindown working? I've set the device threshold in power.conf via napp-it, and disabled fmd because I read that it could be preventing spindown, but alas no luck. I did see on page 2 of this thread a script that someone created for FreeBSD:
OK, here it is in case anyone wants it, modify as you see fit.

usage:

zpool-spindown.sh poolname

==============================
#!/usr/local/bin/bash

#
# zpool-spindown.sh
#

ZPOOL="$1"

if [ -z "$ZPOOL" ]
then
    echo "zpool name required"
    exit 2
fi

PATH=/usr/local/bin:/bin:/usr/bin:/usr/sbin:/usr/local/sbin:/bin:/sbin
export PATH

# Cleanup any tmp file if present
if [ -f /tmp/zpool.iostat ]
then
    rm -f /tmp/zpool.iostat
fi

# Name of samba share to check if mounted
SMBSHARE="media"

# Get drives for pool
drives=`zpool status $ZPOOL | egrep "da[0123456789]" | awk '{print $1}' | tr '\n' ' '`
firstdrive=`echo "$drives" | awk '{print $1}'`

# Activity checks
smbactive=`smbstatus -S | grep -A 6 "Connected at" | grep $SMBSHARE | wc -l | awk '{print $NF}'`
scrubrunning=`zpool status $ZPOOL | egrep "scrub in progress|resilver in progress" | wc -l | awk '{print $NF}'`
spundown=`smartctl -n standby -H /dev/$firstdrive | tail -1 | grep "STANDBY" | wc -l | awk '{print $NF}'`

if [ -f /tmp/locate.running ]
then
    echo "Locate running...Aborting spindown!"
    exit 3
elif [ $smbactive -gt 0 ]
then
    echo "Samba share is mounted...Aborting spindown"
    exit 3
elif [ $scrubrunning -eq 1 ]
then
    echo "Scrub/resilver is running...Aborting spindown"
    exit 3
elif [ $spundown -eq 1 ]
then
    echo "Spundown already...Aborting spindown"
    exit 3
fi

# Longer IO activity check - only performed if we got past the checks above
zpool iostat $ZPOOL 30 2 | tail -1 > /tmp/zpool.iostat
reading=`cat /tmp/zpool.iostat | awk '{print $(NF-1)}' | awk -F\. '{print $1}' | sed -e 's/K//g' | sed -e 's/M//g'`
writing=`cat /tmp/zpool.iostat | awk '{print $NF}' | awk -F\. '{print $1}' | sed -e 's/K//g' | sed -e 's/M//g'`
rm -f /tmp/zpool.iostat

if [ $reading -gt 0 ]
then
    echo "Pool shows IO activity...Aborting spindown"
    exit 3
elif [ $writing -gt 0 ]
then
    echo "Pool shows IO activity...Aborting spindown"
    exit 3
fi

drives=($drives)
type=""

driveop () {
    drive=$1
    # Need to issue a different command to ada vs da devices!!!
    type=`echo $drive | cut -c 1`
    if [ $type = "d" ]
    then
        camcontrol stop $drive
    elif [ $type = "a" ]
    then
        camcontrol standby $drive
    fi
    return
}

drives_count=${#drives[@]}
index=0

while [ "$index" -lt "$drives_count" ]
do
    driveop ${drives[$index]}
    printf "Spindown Drive %s\n" ${drives[$index]}
    let "index = $index + 1"
done
===============================

Can this be adapted for Solaris? The script aborts at the command 'smbstatus'.

2) Is it possible to get a Cyberpower UPS interfacing w/ Solaris via USB? I've tried various things with apcupsd but I'm a bit lost on that since I'm a Unix newbie.

ps. This thread rocks, so much good info, thanks Gea!!
 
I'm having an issue with the auto script. I have the OI server plugged into a monitor, and the screen is telling me to check auto_error.log. It has a bunch of entries that look like this. The server is running a barebones install of OI 151a and napp-it 0.500s.

Code:
Use of uninitialized value in subroutine entry at /usr/perl5/5.10.0/lib/i86pc-solaris-64int/DynaLoader.pm line 226.

Not a problem, just a warning caused by 'use strict' in Perl
(fixed in the next release).
 
I have a couple last issues I've been trying to figure out before my server build is finished, maybe some Solaris/OpenIndiana gurus can help...they are both power related.

1) How to get drive spindown working? I've set the device threshold in power.conf via napp-it, and disabled fmd because I read that it could be preventing spindown, but alas no luck. I did see on page 2 of this thread a script that someone created for FreeBSD:

Can this be adapted for Solaris? The script aborts at the command 'smbstatus'.

2) Is it possible to get a Cyberpower UPS interfacing w/ Solaris via USB? I've tried various things with apcupsd but I'm a bit lost on that since I'm a Unix newbie.

ps. This thread rocks, so much good info, thanks Gea!!

BSD is different; you cannot use that script as-is.
Try changing the settings in power.conf from default to enable.

Maybe this also helps:
http://constantin.glez.de/blog/2010/03/opensolaris-home-server-scripting-2-setting-power-management
 
I have a couple last issues I've been trying to figure out before my server build is finished, maybe some Solaris/OpenIndiana gurus can help...they are both power related.

1) How to get drive spindown working? I've set the device threshold in power.conf via napp-it, and disabled fmd because I read that it could be preventing spindown, but alas no luck. I did see on page 2 of this thread a script that someone created for FreeBSD:

Can this be adapted for Solaris? The script aborts at the command 'smbstatus'.

2) Is it possible to get a Cyberpower UPS interfacing w/ Solaris via USB? I've tried various things with apcupsd but I'm a bit lost on that since I'm a Unix newbie.

ps. This thread rocks, so much good info, thanks Gea!!


Hey man,
First, power management:
I fought this long and hard and finally made a breakthrough reading up on Nexenta. I do not know which of these three changes actually fixed it, because I haven't removed any of them.
1st I disabled fmd, and that did not work.
2nd I changed autopm from default to enable, and that did not work.
3rd I copied the entire physical path to each disk into its own device-thresholds entry.
The last option worked perfectly. Word of warning though: it will take a while for each disk to spin back up. You can get a list of disks by typing format in the terminal; copy the listings like you see below. I never did revert steps 1 and 2, so I don't know whether they also helped.

Now, the UPS:
Okay, so I installed apcupsd without a problem, except it would not detect my Tripp Lite USB UPS. I know the UPS works because I tried it in Ubuntu. I never could get it detected here, so I resurrected my APC and it works perfectly. Buy an APC and don't look back; I have spent countless hours scouring the internet without success.

Here's my power.conf

Code:
#
# Copyright 1996-2002 Sun Microsystems, Inc.  All rights reserved.
# Use is subject to license terms.
#
#pragma ident   "@(#)power.conf 2.1     02/03/04 SMI"
#
# Power Management Configuration File
#

device-dependency-property removable-media /dev/fb
autopm enable
autoS3                  default
# cpu-threshold           1s
# Auto-Shutdown         Idle(min)       Start/Finish(hh:mm)     Behavior
autoshutdown            30              9:00 9:00               noshutdown
cpupm  enable
cpu-threshold 300s 
#device-thresholds         /dev/dsk/c5t3d0     2m 
#device-thresholds         /dev/dsk/c5t4d0     2m 
device-thresholds /pci@0,0/pci8086,3b42@1c/pci1000,3140@0/sd@0,0 10m
device-thresholds /pci@0,0/pci8086,3b42@1c/pci1000,3140@0/sd@1,0 10m 
device-thresholds /pci@0,0/pci8086,3b42@1c/pci1000,3140@0/sd@2,0 10m 
device-thresholds /pci@0,0/pci8086,3b42@1c/pci1000,3140@0/sd@3,0 10m
device-thresholds /pci@0,0/pci8086,3b42@1c/pci1000,3140@0/sd@4,0 10m
device-thresholds /pci@0,0/pci8086,3b42@1c/pci1000,3140@0/sd@5,0 10m 
device-thresholds /pci@0,0/pci8086,3b42@1c/pci1000,3140@0/sd@6,0 10m 
device-thresholds /pci@0,0/pci8086,3b42@1c/pci1000,3140@0/sd@7,0 10m
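In case it helps anyone else: the /pci@.../sd@... paths used in the device-thresholds lines above can also be read from the /dev/dsk symlinks, e.g. (a sketch; the controller part of the path will differ on your system):

Code:
# list the disks, then look up the physical path behind a given /dev/dsk entry
echo | format
ls -l /dev/dsk/c5t3d0s0
# -> ../../devices/pci@0,0/.../sd@3,0:a  (drop the trailing :a slice suffix for power.conf)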
 
I've been reading the comments here and I have to say it's an amazing treasure of useful information. Thank you.

I'm about ready to build an OI ZFS + napp-it box; Linux + RAID just isn't cutting it any more.

There are a few things that I'm hoping someone can clear up for me.

The ZFS Best Practices guide says the maximum useful size for a separate ZIL device is half your available RAM, or, elsewhere, 10x (two 5-second transaction groups) the max write throughput of your SSD. Wouldn't a larger drive aid longevity due to wear levelling? The only reason I ask is that a lot of people seem to be suggesting the Intel 20GB drives for small SSDs, but where I live they're around the same price as some of the newer and faster 60GB SandForce drives.

Do you retire old drives by swapping them with larger/faster equivalents (with resilvering giving you extra space once an entire vdev is replaced)?

When adding vdevs to a pool, if you can't keep the vdevs the same size, is it better to match total size or number of disks?
So if you already had some 10+2 2TB (20TB usable) Raid-Z2 vdevs would it be better to add 7+2 3TB (21TB usable) arrays or 20+2 3TB (30TB usable) arrays?

If you wanted to have some redundancy at all levels, you'd need two OS drives mirrored, two ZFS root pool drives mirrored, and then your actual storage pools?
 
1.
The Intel SSD 311 series (SLC) offers high performance and longer
endurance than Multi-Level Cell (MLC) drives. Also, log writes are small, so sequential
performance is not important; IOPS is what matters.

I would prefer SLC for a log device.

2.
After, say, five years of use I expect much higher disk failure rates.
You can wait until failures happen, or you can replace the disks proactively, which
also increases capacity.

But I would replace the whole pool at once
and use the old disks for backups.

3.
I would not care about the number of disks but about redundancy.
All vdevs should have the same redundancy/security level
(if one vdev fails, the whole pool is lost).

From a performance view, it is best to have as many vdevs as possible.
If you want the highest capacity, one large vdev is best, but with 20 disks you should then use raid-z3
(and expect a very long resilver time in case of a failure).

4.
It depends. For pure NAS use, there is nothing important on the OS disk.
Reinstallation is easy; you only need about 30 minutes until you can import and reshare the pool.
Optionally, copy/note your settings to your data pool.
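A rough sketch of what that copy/note step can look like (the pool name tank and the file names are just examples): dump the pool layout and locally-set ZFS properties onto the data pool itself, export, reinstall, then import; ZFS-level SMB/NFS share properties travel with the pool.

Code:
# before the reinstall: record layout and local properties on the data pool
zpool status tank > /tank/zpool-status.txt
zfs get -r -s local all tank > /tank/zfs-local-props.txt
zpool export tank

# after the fresh install: bring the pool (and its ZFS-level shares) back
zpool import tank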
 
Hi gremlin, thanks for responding, that is good info! Unfortunately I'm still not able to get spindown to work for any of the drives connected to my M1015, but the drives connected to the SATA ports on the motherboard DO spin down... so I guess I need to figure out what it is about the card that is preventing spindown.

Do you know of an easy way in Solaris to check what is polling the hard drives, so that I might be able to figure out what is preventing it?
 
I didn't see a way to do it in the napp-it GUI, so I decided to hardcode an IP in Nexenta, because every time I reboot the server it pulls a new IP from DHCP. I searched around and found some instructions: I edited /etc/nwam/llp and changed the xnf0 entry from dhcp to "xnf0 static xx.xxx.xxx.xxx". If I restart the service with "svcadm restart svc:/network/physical:nwam" I get the hardcoded IP, but if I reboot I get a DHCP IP again.

To fix the static IP reverting on reboot, someone on the OpenIndiana IRC channel suggested:

svcadm disable network/physical:nwam
svcadm enable network/physical:default
ipadm create-addr -T static -a local=xx.xxx.xxx.xxx xnf0/v4

The last command failed because ipadm is not available on Nexenta Core. Now I have no IP when I boot.

Can anyone suggest a way for me to get myself out of this hole that I've dug?

-Flash
 
So if you already had some 10+2 2TB (20TB usable) Raid-Z2 vdevs would it be better to add 7+2 3TB (21TB usable) arrays or 20+2 3TB (30TB usable) arrays?
Never use a vdev larger than, say, ~12 disks. You will get extremely bad IOPS, and when you resilver it will take days.

I am going to buy a Norco 4224 and use two 12-disk raid-z3 vdevs. Better would be three 8-disk raid-z2 vdevs, but I don't care too much about IOPS.

One vdev will give you the same IOPS as one disk. Say you have 20 disks in raidz3 - then you have 20 disks acting as a single disk - which means very bad IOPS.
 
I have my Nexenta Community Edition 3.5 and it's been working great, but I think the OS drive is dying. Is there a way to back up the entire config of the OS, reinstall on a new drive and reload that config? I would hate to have to re-set up all my share settings, etc.
 
I didn't see a way to do it in the napp-it GUI, so I decided to hardcode an IP in Nexenta, because every time I reboot the server it pulls a new IP from DHCP. I searched around and found some instructions: I edited /etc/nwam/llp and changed the xnf0 entry from dhcp to "xnf0 static xx.xxx.xxx.xxx". If I restart the service with "svcadm restart svc:/network/physical:nwam" I get the hardcoded IP, but if I reboot I get a DHCP IP again.

To fix the static IP reverting on reboot, someone on the OpenIndiana IRC channel suggested:

svcadm disable network/physical:nwam
svcadm enable network/physical:default
ipadm create-addr -T static -a local=xx.xxx.xxx.xxx xnf0/v4

The last command failed because ipadm is not available on Nexenta Core. Now I have no IP when I boot.

Can anyone suggest a way for me to get myself out of this hole that I've dug?

-Flash

Use ifconfig instead of ipadm (non-persistent, works only until the next reboot).
Current settings: ifconfig -a

New setting, something like:
ifconfig hme0 xx.xxx.xxx.xxx netmask 255.255.255.0 broadcast + up

Then set the IP via the napp-it menu System - Network to get persistent settings,

or
reboot to the last system snapshot,

or
disable network/physical:default and re-enable nwam.
 
(...)One vdev will give you the same IOPS as one disk. Say you have 20 disks in raidz3 - then you have 20 disks acting as a single disk - which means very bad IOPS.

So I admit, I was a bit skeptical about this and researched it a bit, and brutalizer is pretty much correct. I did come across an interesting link with some more specifics about IOPS and different kinds of vdevs in different situations:
http://constantin.glez.de/blog/2010/06/closer-look-zfs-vdevs-and-performance
 
One vdev will give you the same IOPS as one disk. Say you have 20 disks in raidz3 - then you have 20 disks acting as a single disk - which means very bad IOPS.

Sorry, that was a typo. The 20+2 was meant to be 10+2 (same as the original vdev). I was just curious if there was any unusual penalty from mixing vdev arrangements (other than the usual performance/bottleneck issues).
 
Never use a vdev larger than, say, ~12 disks. You will get extremely bad IOPS, and when you resilver it will take days.

I am going to buy a Norco 4224 and use two 12-disk raid-z3 vdevs. Better would be three 8-disk raid-z2 vdevs, but I don't care too much about IOPS.

One vdev will give you the same IOPS as one disk. Say you have 20 disks in raidz3 - then you have 20 disks acting as a single disk - which means very bad IOPS.

Saying generically that 'a vdev gives the same IOPS as one disk' is not completely correct in the case where the vdev is made of mirrored disks - reads will for the most part be N times as fast (for an N-way mirror...)
 
Sorry, that was a typo. The 20+2 was meant to be 10+2 (same as the original vdev). I was just curious if there was any unusual penalty from mixing vdev arrangements (other than the usual performance/bottleneck issues).

I would not use such a large vdev on a database, VM or other high-load server, but the
backup system for my main filer is built from one 14-disk raid-z3 vdev plus a hot spare.

Performance is good enough for backups and resilver time is OK with raid-z3.
And I get a 22 TB backup system built from 2 TB disks in a 16-bay case with one slot free for replacements.
If I needed a 24-slot backup system, I would really think about a 22-disk raid-z3 + hot spare + one-free-slot config
(although I would prefer fewer 3 TB disks when buying new).

But, giving up one disk's worth of capacity, I might use a 2 x 11-disk raid-z2 pool with nearly the same reliability and better performance.
It's mainly a question of what you need in the first place. For my backup system with lots of snaps, I decided for capacity.

To avoid problems, I have a second-tier backup system (also one large raid-z3 vdev) at a second location.


One addition about performance:
if you read a data stream from a raid-z pool and the data is spread over all disks and read in parallel,
then the sequential read performance can be up to (number of disks) x (single-disk performance).
But if you need to read another data block, all disks have to seek to that block,
so I/O per second is the same as for one disk.
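As a rough worked example (assuming ~100 random IOPS per 7200 rpm disk): filling 24 bays with a single 22-disk raid-z3 vdev gives on the order of 100 random IOPS but can stream sequentially at many disks' worth of bandwidth, while the same bays split into three 8-disk raid-z2 vdevs give roughly 3 x 100 = 300 random IOPS, at the cost of some capacity.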
 
Use ifconfig instead of ipadm (non-persistent, works only until the next reboot).
Current settings: ifconfig -a

New setting, something like:
ifconfig hme0 xx.xxx.xxx.xxx netmask 255.255.255.0 broadcast + up

Then set the IP via the napp-it menu System - Network to get persistent settings,

or
reboot to the last system snapshot,

or
disable network/physical:default and re-enable nwam.

Thanks, I re-enabled nwam and was able to reconnect to the GUI.
I found the area in the GUI where I'm able to change to a static IP, but even there it reverts to DHCP on reboot. I found other posts in this thread that show how to do it manually, and it appears that the GUI is setting the right info in the expected places.

svcs -a | grep network/physical
disabled 11:31:12 svc:/network/physical:nwam
online 11:31:14 svc:/network/physical:default

cat /etc/hostname.eth0 shows
xx.xxx.xxx.xxx netmask 255.255.255.0 ether *macaddr* mtu 1500 broadcast + up

NexentaCore 3.0 (Hardy 8.04/b134+) just refuses to hold the static IP across reboots.
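For what it's worth, on the pre-ipadm network stack that NexentaCore (b134-based) uses, persistent settings normally come from a few flat files plus the physical:default service. A sketch of the usual checklist (addresses are placeholders, and the interface name must match what ifconfig -a shows):

Code:
# /etc/hostname.eth0 -- address plus options, applied by physical:default at boot
192.168.1.50 netmask 255.255.255.0 broadcast + up
# /etc/netmasks -- network number and mask
192.168.1.0 255.255.255.0
# /etc/defaultrouter -- gateway address
192.168.1.1

# make sure only the default service is enabled, not nwam
svcadm disable svc:/network/physical:nwam
svcadm enable svc:/network/physical:default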
 