OpenSolaris-derived ZFS NAS/SAN (OmniOS, OpenIndiana, Solaris and napp-it)

Is a dual core more than enough for just napp-it, say for 24 drives, or should I go quad core?

Thinking about using my Xeon 1260L + S1200KP for napp-it only for the time being; when I upgrade, I'd get a dual core and a new motherboard for the napp-it box and use the 1260L and S1200KP for ESXi alone.

Why Solaris 11 over OI?
 
I'm hoping to get some more information here. I'm trying to migrate from NexentaCore to illumos, but I haven't been able to import my ZFS pool. Both run as a VM on a VMware ESXi server. The pool consists of 3x 3TB disks, which are raw-mapped in ESXi.

Original host (NexentaCore) reports this:
Code:
  pool: data
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        data        ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            c2t0d0  ONLINE       0     0     0
            c2t1d0  ONLINE       0     0     0
            c2t2d0  ONLINE       0     0     0

I then export the pool via the napp-it web interface, which succeeds. I power off NexentaCore, add the raw disk mappings to the illumos VM, and boot up illumos.

When I try to import the pool, I get this:
Code:
no pools available to import

   pool: data
     id: 10943202326073525848
  state: UNAVAIL
 action: The pool cannot be imported due to damaged devices or data.
 config:

        data          UNAVAIL  insufficient replicas
          raidz1-0    UNAVAIL  corrupted data
            c2t0d0    ONLINE
            c2t1d0    ONLINE
            c2t2d0    ONLINE

I haven't tried importing it with the -f flag, as I feel that is a last-ditch effort. However, importing the same pool again in NexentaCore still works perfectly. Any ideas what could be going on here? Thanks!
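For reference, the sequence I'm using looks roughly like this (a sketch; -d /dev/dsk is already the default search path, but being explicit rules out a device-path problem):

Code:
# on NexentaCore, before shutting the VM down
zpool export data

# on the illumos VM, after attaching the raw disk mappings
zpool import                    # list pools that are visible for import
zpool import -d /dev/dsk data   # point the scan at the device directory explicitly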
 
First of all my accolades to _Gea. Napp-It is a great product.

I am trying to get some input on my performance and what I can do to tune it.

First some background: I am not a newbie to ZFS, but by no means an expert. I have used it on Solaris, Nexenta, and OS X, but have not had the time to sit down and really tune it until now.

I have a CORAID SRX chassis (it is a SuperMicro 36-drive chassis) providing 20x 3TB 7.2K drives, 4x OCZ Deneva 2 200GB SSDs, and 1x 8GB ZeusRAM I just got in, all as JBOD drives over 10Gb AoE, to an HP DL380 (12 cores / 2.8GHz / 192GB RAM) running ESXi 5 with up-to-date patches. On this box I am running OI 151a5 / napp-it 0.8h. The VM has 64GB of RAM and 8 cores, with all 5 of the 3TB mirrored vdevs, the ZeusRAM as a SLOG, and 2 of the Denevas as L2ARC via RDM.
It provides an NFS store to ESXi.

My issue is:

I seem to be suffering the ESXi NFS sync-write issue and am not sure how much to expect. Here is what I have tested so far:

1. Running dd in napp-it I get 1,600 MB/s write / 2,178 MB/s read, which is outstanding.
2. Bonnie++ in napp-it gives 677 MB/s write / 1,266 MB/s read, still great.
3. In a 2008 R2 VM, CrystalDiskMark gives (read/write, MB/s):
Seq 785 / 60
512K 576 / 56
4K 30 / 2.6
4K QD32 224 / 5.3
4. If I disable sync, CrystalDiskMark is more like:
Seq 785 / 166
512K 576 / 295
4K 25 / 5
4K QD32 209 / 5

When I ran this test on my second pool, which is identical other than using the 2 Denevas partitioned at 30GB as a mirrored SLOG and a Crucial M4 256GB as L2ARC, I got almost identical results.

So my question is: is this approximately what I can expect, and why am I not getting a performance boost from the ZeusRAM? I would love to dig into this, but please understand I am familiar with Solaris but not well versed; include commands for any tests you'd like me to run.

When sync is set to disabled, is the SLOG still being used (with ZFS just lying to ESXi), or is it bypassed entirely? From what I can see it appears to be bypassed.
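My plan for checking this is simply to watch per-device activity during a sync-heavy copy (a sketch; substitute your actual pool and filesystem names):

Code:
# watch per-vdev activity while ESXi is writing to the NFS datastore;
# with sync=standard the ZeusRAM log device should show steady writes,
# with sync=disabled it should sit at (or near) zero
zpool iostat -v <poolname> 1

# confirm the current sync setting on the NFS filesystem
zfs get sync <poolname>/<filesystem>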
 
I'm hoping to get some more information here. I'm trying to migrate from NexentaCore to illumos but I haven't been able to import my zfs pool. Both run as a vm on a vmware esxi server. The pool consists of 3 3TB disks, which are raw mapped in esxi.

Moving pools from NCP to Illumian/OI should not be a problem with real disks.
I suppose it's an RDM problem (this is not a recommended setting).

If possible, remove your RDM setup, use pass-through of real hardware to Illumian or OI
 
How much of a performance hit is there if you add a new vdev to a pool after the current vdevs are full?

Basically, is having data distributed unevenly across vdevs a huge deal for a home NAS? Is it worth destroying/rebuilding for?

Reference:
http://icesquare.com/wordpress/how-to-improve-zfs-performance/#section10

For example:

Code:
                 capacity
pool           alloc   free
-------------  -----  -----
goliath          20T    10T
  raidz1         10T     0G   <- existing vdev, full
    sda
    sdb
    sdc
  raidz1         10T    10T   <- newly added vdev
    sde
    sdf
    sdg
-------------  -----  -----
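A quick way to see how data is actually spread across the vdevs is the per-vdev view of zpool iostat (a sketch, using the hypothetical pool name from the example above):

Code:
# the capacity columns show alloc/free per vdev; the bandwidth columns
# show how reads and writes are distributed across them during use
zpool iostat -v goliath 5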
 
Moving pools from NCP to Illumian/OI should not be a problem with real disks.
I suppose its a RDM problem (this is not a recommended setting)

If possible, remove your RDM setup, use pass-through of real hardware to Illumian or OI

Thanks for the comment. I know the RDM is kinda sketchy, unfortunately my motherboard doesn't support pass-through and I'm not going to buy a new one anytime soon :). I guess I'll stick with NexentaCore for now.
 
Here is some ZFS Benchmark ! :)
  • Xeon X3450 4c/8t, 2.66 / 3.1 GHz
  • Intel S3420GPV
  • 8 GB RAM: 4x 2 GB ECC HP 1066
  • Operating system on an OCZ Vertex Plus 60 GB SSD (latest JMicron firmware), partitioned at 30 GB (the rest left for over-provisioning)
  • IBM M1015 flashed to IT-mode 9211-8i
  • 5x 2 TB WD Green WD20EARS 5400 rpm (wdidle3 /s300) in an Icy Dock MB455SPF-B 5-in-3
  • 2x STEC Mach16 50 GB SSDs (ZIL / write cache, mirrored)
  • 1x OCZ Vertex 3 60 GB SSD (L2ARC), latest firmware
  • NIC: dual-port 10 GbE SFP+ SR
  • Sharkoon Rebel 12
  • Seasonic M2II 650W

Solaris 11, napp-it 0.8, pool version 28, RAID-Z, ashift=9 // no SSDs included
Code:
write 10.24 GB via dd, please wait...
time dd if=/dev/zero of=/stock/dd.tst bs=1024000 count=10000

10000+0 records in
10000+0 records out

real       20.5
user        0.0
sys         4.7

10.24 GB in 20.5s = 499.51 MB/s Write

read 10.24 GB via dd, please wait...
time dd if=/stock/dd.tst of=/dev/null bs=1024000

10000+0 records in
10000+0 records out

real       57.4
user        0.0
sys         4.2

10.24 GB in 57.4s = 178.40 MB/s Read
... Write @ 499.51 MB/s
... Read @ 178.4 MB/s

Then I used ZFSguru to prepare the RAID-Z as a version-28 pool, 4K-aligned (ashift=12).

OpenIndiana 151a5, napp-it 0.8, pool version 28, RAID-Z, ashift=12 // no SSDs included
Code:
write 10.24 GB via dd, please wait...
time dd if=/dev/zero of=/stock/dd.tst bs=1024000 count=10000

10000+0 records in
10000+0 records out

real       24.3
user        0.0
sys         5.0

10.24 GB in 24.3s = 421.40 MB/s Write

read 10.24 GB via dd, please wait...
time dd if=/stock/dd.tst of=/dev/null bs=1024000

10000+0 records in
10000+0 records out

real       37.4
user        0.0
sys         3.3

10.24 GB in 37.4s = 273.80 MB/s Read
... Write @ 421.40 MB/s
... Read @ 273.80 MB/s

With SSDs:
=> ZIL: 2x STEC Mach16 M 50 GB, mirrored
=> L2ARC: OCZ Vertex 3 60 GB
... Write: bs=1024000 count=10000: 409 MB/s
... Read: bs=1024000 count=10000: 176.25 MB/s

/!\ The SSDs are ashift=9 while the RAID-Z is ashift=12... is that important?
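For anyone checking the same thing, the ashift actually in use per vdev can be read from the cached pool configuration with zdb (a sketch; "stock" is the pool name from the dd path above):

Code:
# print the cached configuration of the pool and pick out the ashift values
zdb -C stock | grep ashift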

++
 
OpenIndiana 151a5, napp-it 0.8, pool version 28, RAID-Z, ashift=12 // SSDs included

Shared network folder mounted on Windows 7 Pro x64, with Samba / CIFS via 10 GbE

<-- on the left, the first copy of a video file from a 600 GB SAS 15k Hitachi to ZFS (133 MB/s)
--> on the right, a second copy of the same video file from the 600 GB SAS 15k Hitachi to ZFS... it seems the file has been cached? (337 MB/s)

[screenshot: first copy, then second copy, from SAS to the ZFS RAID-Z of 5x WD20EARS]


The test with the software we use to benchmark arrays for video editing:

[screenshot: Blackmagic Disk Speed Test, ZFS over SMB/CIFS at 10 GbE, RAID-Z of 5x WD20EARS]


++
 
@Gea ..

I've tried your suggestion to go the CIFS route but I'm not sure if I should be happy or cry :)
http://hardforum.com/showpost.php?p=1038939095&postcount=3534

First, a small account of what I have been up to in the last 1.5 weeks:

I decided to test the ISCSI option first to see how it did.

This is where I ran into my first problem:
After copying about 2 GB from my physical NAS to my virtual NAS through iSCSI, my whole storage crashed (I had to reboot the Nexenta VM).

Checking the Nexenta forums, I read something about disabling VAAI on ESXi (which I did): http://kb.vmware.com/kb/1033665

I was reluctant to try again, but I did, and now everything was fine: speed was great at about 85-95 MB/s while copying 850 GB from my old NAS to the new one without problems!

The next day I wanted to copy some data to my new iSCSI NAS VM, but again my storage almost crashed (I hit the cancel button as soon as I knew something went wrong).
Nexenta stayed up this time, but ESXi was giving warnings like this:

2012-07-24T17:32:21.538Z cpu1:2049)WARNING: ScsiDeviceIO: 1218: Device naa.600144f0c52383000000500928190001 performance has deteriorated. I/O latency increased from average value of 177139 microseconds to 3637169 microseconds.
2012-07-24T17:32:23.136Z cpu1:121479)ScsiDeviceIO: 1198: Device naa.600144f0c52383000000500928190001 performance has improved. I/O latency reduced from 3637169 microseconds to 722304 microseconds.
2012-07-24T17:32:26.562Z cpu0:2052)WARNING: ScsiDeviceIO: 1218: Device naa.600144f0c52383000000500928190001 performance has deteriorated. I/O latency increased from average value of 177178 microseconds to 3881406 microseconds.
2012-07-24T17:32:43.797Z cpu2:121481)WARNING: ScsiDeviceIO: 1218: Device naa.600144f0c52383000000500928190001 performance has deteriorated. I/O latency increased from average value of 177271 microseconds to 8444857 microseconds.
2012-07-24T17:32:44.050Z cpu1:120958)ScsiDeviceIO: 1198: Device naa.600144f0c52383000000500928190001 performance has improved. I/O latency reduced from 8444857 microseconds to 1679669 microseconds.
2012-07-24T17:33:44.438Z cpu1:119821)WARNING: NMP: nmpDeviceTaskMgmt:2210:Attempt to issue lun reset on device naa.600144f0c52383000000500928190001. This will clear any SCSI-2 reservations on the device.
2012-07-24T17:33:46.930Z cpu3:2126)VSCSI: 2763: Retry 0 on handle 8209 still in progress after 2 seconds
Complete Gore here:
http://pastebin.com/GRy9vA4n

I then tried the setting for SATA disks (zfs_vdev_max_pending = 1, default = 10); my next copy did not make the storage crash, but ESXi was still giving warnings, and for me that was the last straw:
I'm done with iSCSI.

One thing I also noticed on ESXi was that the Nexenta E1000 NIC is dropping packets (%DRPRX):

[screenshot: esxtop network view showing %DRPRX packet drops on the Nexenta E1000 NIC]

I'm unsure if switching to VMXNET3 is going to fix things, because I've read about multiple people changing it who failed to see any big speed advantage.

So today I played with the CIFS server of Nexenta.

I enabled it in workgroup mode, deleted the zvol on my 2-disk mirror (2x 2TB Seagate ST2000DM001), created a folder and started copying stuff to it.
From my physical NAS (an HP ML110 G5 with the OS installed on an SSD and a local Seagate 1TB drive) I'm getting these numbers:
[screenshot: file copy throughput from the physical NAS to the Nexenta CIFS share]


So, not even close to the 85-95 MB/s I had with iSCSI.

My virtual NAS (which is on the 3-way mirror served by Nexenta over NFS) gives these results:
[screenshot: file copy throughput from the virtual NAS to the Nexenta CIFS share]


I created the volume with ashift=12 with the help of ZFSguru (not able to do this with Nexenta). I also ran a benchmark, as far as that is representative:
[screenshot: CrystalDiskMark results on the CIFS share]


I think I'm limited to these options:
- Somehow get Nexenta CIFS to give better results
- Hold out and wait for v4 of Nexenta (if it will ever be released)
- Switch to something else with a decent CIFS server and GUI

These questions come to mind:
- Is there any way I can increase the speed of CIFS, or is this it?
- What I like about Nexenta is that it is an appliance; what other options can I use with better chances of success?
-------- Is OI/napp-it the same way, in the sense that you don't have to upgrade packages every week to keep things going? (so far it seems the best option thanks to _Gea's excellent work)
-------- NAS4Free (based on BSD)
-------- FreeNAS (based on BSD)

EDIT:
I forgot to mention what I configured for the network and the CIFS server:
My server had a free NIC port, so I configured a new vSwitch and connected the free NIC to it.
I then changed the iSCSI IP configuration (VMNIC3): I connected VMNIC3 to this new vSwitch and gave it a regular IP address, 192.168.10.x (in DNS I created an A record for 192.168.10.x and pointed it to nexentanas instead of the original hostname, which is still nexenta). I think this way I can connect to the CIFS server through a separate NIC; when I copy stuff to the CIFS share, the traffic travels over that vSwitch, so CIFS has a dedicated NIC as far as I can see.
 
@ ST3F

Can I ask you to do the following

When you are trying some dd benchmarks...

Use a share name, not just the pool name.
E.g. don't use /tank; use, say, your CIFS share on /tank to store the dd output.

Next

Once you have done the write test..... Wait at least 40 seconds

Then do your read test

I bet your read speeds will increase.
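In other words, something like this (a sketch, with "tank/share" standing in for your actual filesystem):

Code:
# write test
time /usr/gnu/bin/dd if=/dev/zero of=/tank/share/ddtest bs=1024000 count=10000

# give the outstanding transaction groups time to flush before reading back
sleep 40

# read test
time /usr/gnu/bin/dd if=/tank/share/ddtest of=/dev/null bs=1024000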

.
 
@ ST3F

Can I ask you to do the following

When you are trying some dd benchmarks...

Use a share name not just the pool name.
E.g. don't use /tank; use, say, your CIFS share on /tank to store the dd output.
The pool name is "Stock".
The tree is like this :
Code:
ZFServer-Stock
          |--DATA
               |--hd
( \\ZFServer\Stock\DATA\HD )

I share/mount the HD folder as drive H: in Windows 7.
Then I copy a 7 GB .mkv file.
After that I try the Blackmagic Disk Speed Test benchmark.

Once you have done the write test..... Wait at least 40 seconds

Then do your read test

I bet your read speeds will increase.

.
Ok, I will try ;)

Cheers.

St3F
 
Just got the first of two OpenIndiana + napp-it servers built. This first one is straight OpenIndiana; the next will be an ESXi all-in-one.
Build list (x2):
Norco 4220
Intel Xeon E5-2603
SUPERMICRO X9SRI-F
Kingston KVR16R11D4K4/32I 32gb Kit x2
2x M1015
2x Intel 320 40GB, mirrored rpool boot drives
2x Intel 320 40GB, mirrored ZIL drives
1x Intel 320 40GB, L2ARC drive
10x Hitachi 7200rpm 2TB drives
3x Icy Docks for the SSDs

Config: 1 pool with 4 vdevs built from mirrors, plus 2 spares for those mirrors.

Code:
  pool: ZFS1
 state: ONLINE
  scan: none requested
config:

        NAME                       STATE     READ WRITE CKSUM
        ZFS1                       ONLINE
          mirror-0                 ONLINE
            c8t5000CCA36AC8ACFFd0  ONLINE       0     0     0
            c8t5000CCA36ACA96F3d0  ONLINE       0     0     0
          mirror-2                 ONLINE
            c8t5000CCA36AC7A49Ad0  ONLINE       0     0     0
            c8t5000CCA36ACA93E5d0  ONLINE       0     0     0
          mirror-3                 ONLINE
            c8t5000CCA36ACAE07Fd0  ONLINE       0     0     0
            c8t5000CCA36ACB968Cd0  ONLINE       0     0     0
          mirror-4                 ONLINE
            c8t5000CCA36ACAD0B6d0  ONLINE       0     0     0
            c8t5000CCA36ACAD7E2d0  ONLINE       0     0     0
        logs
          mirror-1                 ONLINE
            c8t5001517972E6A38Ad0  ONLINE       0     0     0
            c8t5001517972E6A418d0  ONLINE       0     0     0
        cache
          c4t1d0                   ONLINE       0     0     0
* have not added spares yet
Sorry for the Dirty Paste ;)

DD Benchmarks
Without L2ARC:
[screenshot: napp-it dd benchmark results without L2ARC]


With L2ARC
[screenshot: napp-it dd benchmark results with L2ARC]


I will update this post once I get the ESXi All-in-one built and run some VM Benchmarks.
 
DD Benchmarks
Without L2ARC:

With L2ARC

<snip>

**Cough**

Bullshit figures

**Cough**

@ _Gea

Not that I am a fan of dd benchmarking on ZFS... but

when you get some time, can you look into maybe updating the defaults of the dd bench in napp-it?

Change the default from say

Code:
Write Test
time /usr/gnu/bin/dd if=/dev/zero of=/"name of zpool"/ddtest bs=1024000 count=10000

Read Test
time /usr/gnu/bin/dd if=/"name of zpool"/ddtest of=/dev/null bs=1024000

to be something like

Code:
Write Test
time /usr/gnu/bin/dd if=/dev/zero of=/"name of zpool"/ddtest bs=1024000 count="2x RAM size"

Wait 40 seconds

Read Test
time /usr/gnu/bin/dd if=/"name of zpool"/ddtest of=/dev/null bs=1024000

as lots of people have more than 10GB of RAM in their ZFS boxen.

And the benchmark results are either unrealistic, or can cause concern when they come out skewed too low...

Especially the READ results:
without a delay of more than 30 seconds after the write test before the read runs, the results can seem very low, as the write test still has not finished flushing to disk

and as such interferes with the read test...
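To make the default scale with memory, something along these lines would do it on OI/Solaris (a sketch; it assumes prtconf reports its "Memory size" line in megabytes, and "/tank" stands in for the pool or share being tested):

Code:
# physical RAM in MB
ram_mb=`prtconf | grep 'Memory size' | awk '{print $3}'`

# 2x RAM worth of 1,024,000-byte blocks for the dd count
count=`expr $ram_mb \* 2 \* 1024 / 1000`

time /usr/gnu/bin/dd if=/dev/zero of=/tank/ddtest bs=1024000 count=$count
sleep 40
time /usr/gnu/bin/dd if=/tank/ddtest of=/dev/null bs=1024000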

.
 
@Gea,
Referring to my post earlier: I was pointed to the fact that maybe I have to adjust my transaction groups in Nexenta to be smaller, e.g. 64MB (zfs_write_limit_override).

I've searched the four corners of the internet but haven't been able to find how to adjust this in Nexenta (Solaris)... any ideas?
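For what it's worth, on a stock (Open)Solaris kernel this tunable is normally set either persistently in /etc/system or live with mdb; I am assuming the Nexenta kernel exposes the same parameter, so something like this should apply (64MB shown):

Code:
# persistent: add this line to /etc/system and reboot
set zfs:zfs_write_limit_override = 0x4000000

# or change it on the running kernel (0t = decimal, value in bytes)
echo "zfs_write_limit_override/W0t67108864" | mdb -kw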

EDIT:
This is what I see in ESXi when copying from my Windows machine to the Nexenta CIFS server:
[screenshot: ESXi performance chart during the copy to the Nexenta CIFS server]
 
@Gea,
Referring to my post earlier I was pointed to the fact that maybe I have to adjust my transaction groups in Nexenta to be smaller as in 64MB (zfs_write_limit_override)


Hello nostradamus99

Could you please clarify your virtual NAS config?
Is this something like an All-in-One with a virtualized SAN
(like ESXi with a NexentaStor 3.1 storage VM, disks connected via pass-through)?
And what is your network config (an Intel NIC with pass-through, or a virtual e1000g)?

On the other hand, NexentaStor, based on OpenSolaris build 134, is quite old and
not as fast as the newer Solaris 11 or OpenIndiana. (NexentaStor 4, with the same kernel
as OpenIndiana, is on the way. Its free base Illumian is already available for testing.)

If NexentaStor 3 is too slow, I would compare against distributions with newer kernels.
Mostly you do not need to tweak anything with Solaris-based servers.
 
AllinOne Config is as follows:

ESXi
Core i5 with 32 GB RAM
Nexenta v3.1.3 VM with 8GB RAM and 3 E1000 NICs (1 NIC for the VM network, 1 internal NIC for NFS, 1 NIC for Nexenta CIFS).
M1015 in pass-through; a 3-way mirror shared over NFS for the VMs, and another mirror of 2x 2TB drives which was given to a Windows VM through iSCSI, but due to instability I'm now testing the CIFS server portion of Nexenta (with slow file transfers as a result).
Dual Intel GbE NIC (HP branded), NOT in pass-through.

So far I've found this:
http://comments.gmane.org/gmane.os.solaris.opensolaris.zfs/27103

Nobody seems to be able to explain why CIFS performance is so slow (as shown in the graphic earlier).
I haven't pulled the trigger yet on moving to something else; it's going to take work to set up CIFS, users and shares. I'm thinking about moving to OI, but then if Nexenta v4 comes out I'm not sure whether I'd have to set up the CIFS config/users/shares all over again.

I know I shouldn't need to tweak this setup, but maybe it's just not "tuned" right for a home setup.
For example, zfs_vdev_max_pending = 10 has to be adjusted to zfs_vdev_max_pending = 1 if you are using SATA disks... how many people know about that?

By the way, the setting is supposed to go in the /etc/system file, but I'm not sure in what format (bytes, hex, etc.).
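On the format: ZFS tunables in /etc/system are set as plain integers with a zfs: module prefix (this is the stock Solaris mechanism; I'm assuming Nexenta honours the same file):

Code:
* /etc/system -- lower the per-vdev queue depth for SATA disks
set zfs:zfs_vdev_max_pending = 1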

EDIT:
If I were to go the All-in-One napp-it route, which OS would you suggest? (The OS should allow a simple migration to Nexenta v4 IF it is ever released.)
I'm a relative "novice" with some experience with Debian. The virtual SAN should be just that: a virtual SAN/NFS/CIFS server with as little maintenance as possible:

Illumian

or

OpenIndiana

or

Solaris 11 Express
 

too old to be relevant (Solaris 10)

If I was to go the All in One Napp-IT route what OS would you suggest (OS should have an option for a simple migration to Nexenta V4 IF it is ever released..)
I'm a relative "novice" with some experience with Debian. The virtual SAN should be just that a Virtual SAn/NFS/CIFS server with as low maintenance as possible:

Ilumian

or

Open Indiana

or

Solaris Express v11..

The most up to date is OpenIndiana 151a5.
I would use the live version with GUI due to usability and Time Slider:
http://dlc.openindiana.org/isos/151a5/oi-dev-151a5-live-x86.iso

Solaris 11 Express is outdated.
Solaris 11 seems the fastest at the moment, but it's closed source, lacks
VMXNET3 drivers, and you cannot move its pools to OI/Illumian/Nexenta
(with Nexenta 4 they will share the same pool feature set, v5000,
already available in OI 151a5), and the speed difference to OI is quite small.
 
Has anyone done a benchmark comparing OI running on the metal vs. running inside an ESXi VM on the same box? I'm debating whether to set the box up with OI and use KVM, or ESXi. The VM use is fairly trivial in its demands, so performance there isn't a big deal.

And would performance in OI be harmed much if it was booted from a USB stick, but had all served data (and VMs) on a 6 hard drive raidz2 pool?

There's no current budget for an SLC drive, but I have a spare MLC SSD I might be able to throw into it. From all the message threads online it's hard to distill whether an SSD used for the L2ARC would suffer if the same drive also contained the OS files. Again, no separate ZIL is planned, but there's not an abundance of writing going on here; most use is for reads.
 
Booting any Solaris variant from USB is not recommended. Rather than sharing the L2ARC SSD, can you throw the OS on a small, cheap HD? Or if you run OI as a VM, that is not an issue.
 
too old to be relevant (Solaris 10)

Most up to date is OpenIndiana 151a5
I would use the live version with GUI due to useability and timeslider
http://dlc.openindiana.org/isos/151a5/oi-dev-151a5-live-x86.iso

Solaris 11 Express is outdated.
Solaris 11 seems the fastest at the moment but its closed source, lacks
VMXNet3 drivers und you cannot move pools to OI/Illumian/Nexenta
(with Nexenta4 they will share the same pools with feature sets v.5000-
already available in OI 151a5 ) and the speed difference to OI is quite small

Ok. Then OI it will be :)
I'll install it in a VM with VMXNet adapters to see how this performs compared to Nexenta.

NOTE 1: does the zfs_vdev_max_pending = 10 -> 1 setting also apply to OI? (It needs to be set in Nexenta if you use SATA disks.)

NOTE 2: does the OI VM need my passed-through M1015 during installation, or can I add it later? (It's kinda tied to the Nexenta VM at the moment, which holds my firewall VM -> without firewall -> no napp-it :) )
 
Booting any solaris variant from USB is not recommended. Rather than share with an L2ARC SSD, can you throw it on a small, cheap HD? Or if you run OI as a VM that is not an issue.

Yes, I've been using an old HD to do just that during this testing phase. I've got ESXi booting from USB and wondered if it might likewise be possible to do that with OI. It'd be one less spindle. I'm just unaware of what sort of performance OI wants out of its OS volume.

That, and I generally prefer to keep the data on a server on separate spindles from the OS. This is to allow connecting the array to a different setup should it be necessary.

What's the real downside of sharing the L2ARC? Yes, in a high-use environment I understand where contention for it could be an issue.

But this is not that environment (small home office and home media server). Here I just want as-fast-as-possible GigE client access for office/dev work and media streaming. There's no significant database activity involved. The box is on a UPS and uses ECC memory.

Sorry if I'm rehashing anything here, it's a bit of an adventure getting up to speed on all of this.
 
It isn't so much contention as a hassle. You would have to manually partition & slice the drive, install OI to a separate drive, then add the SSD's slice as a mirror, resilver, install grub to the SSD, remove the HD from the rpool mirror, reboot to make sure it works, etc. It's a huge hassle for no real benefit; see the sketch below.
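For anyone curious, the sequence being described is roughly this (a sketch with made-up device names; s0 for the root slice, another slice left for L2ARC):

Code:
# attach the SSD slice to the existing root pool; resilver starts automatically
zpool attach rpool c0t0d0s0 c0t1d0s0

# put grub on the SSD so it can boot on its own
installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c0t1d0s0

# once the resilver finishes, drop the old HD from the mirror
zpool detach rpool c0t0d0s0

# and hand the remaining SSD slice to the data pool as L2ARC
zpool add tank cache c0t1d0s1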
 
I already have OI running on a separate drive, in a partition, so I'd be half-way there, and re-installing from scratch onto the right-sized partitions isn't that much of a hassle. The question isn't the setup hassle, it's the operational impact. Being able to properly control the drive for the L2ARC functions is something I've read about elsewhere, but then a lot has happened between the release of ZFS and the present. So it's a bit of a challenge isolating what can be done, what's been a past problem, and what's known to work today.
 
OI does little enough root pool I/O that contention should not be significant. I just think you're solving a non-existent problem, unless you flat out do not have enough SATA ports or whatever. Your call of course...
 
With
  • Xeon X3450 4c/8t, 2.66 / 3.1 GHz
  • Intel S3420GPV
  • 8 GB RAM: 4x 2 GB ECC HP 1066
  • Operating system on an OCZ Vertex Plus 60 GB SSD (latest JMicron firmware), partitioned at 30 GB (the rest left for over-provisioning)
  • IBM M1015 flashed to IT-mode 9211-8i
  • 5x 1 TB WD Black WD1001FALS 7200 rpm in an Icy Dock MB455SPF-B 5-in-3
  • 2x STEC Mach16 50 GB SSDs (ZIL / write cache, mirrored)
  • 1x OCZ Vertex 3 60 GB SSD (L2ARC), latest firmware
  • NIC: dual-port 10 GbE SFP+ SR
  • Sharkoon Rebel 12
  • Seasonic M2II 650W

Code:
  pool: black
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
	still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
	pool will no longer be accessible on older software versions.
  scan: none requested
config:

	NAME                       STATE     READ WRITE CKSUM     CAP            Product
	black                      ONLINE       0     0     0
	  raidz1-0                 ONLINE       0     0     0
	    c3t50014EE203B9A3E5d0  ONLINE       0     0     0     1000.20 GB     WDC WD1001FALS-0
	    c3t50014EE2AE6582FEd0  ONLINE       0     0     0     1000.20 GB     WDC WD1001FALS-0
	    c3t50014EE6000CBA31d0  ONLINE       0     0     0     1000.20 GB     WDC WD1001FALS-5
	    c3t50014EE655620F20d0  ONLINE       0     0     0     1000.20 GB     WDC WD1001FALS-5
	    c3t50014EE6AAAAB8DCd0  ONLINE       0     0     0     1000.20 GB     WDC WD1001FALS-7
	logs
	  mirror-1                 ONLINE       0     0     0
	    c3t5000A720300547DDd0  ONLINE       0     0     0     50.02 GB       STEC MACH16 M
	    c3t5000A720300547FFd0  ONLINE       0     0     0     50.02 GB       STEC MACH16 M
	cache
	  c3t5E83A97E6268ABEDd0    ONLINE       0     0     0     60.02 GB       OCZ-VERTEX3

errors: No known data errors

  pool: rpool
 state: ONLINE
  scan: none requested
config:

	NAME        STATE     READ WRITE CKSUM     CAP            Product
	rpool       ONLINE       0     0     0
	  c6t0d0s0  ONLINE       0     0     0     120.03 GB      OCZ VERTEX-PLUS

errors: No known data errors

OpenIndiana 151a5, 8GB of RAM, napp-it 0.8, pool version 28, RAID-Z, ashift=9
I benchmark with 20.48 GB:
Code:
write 20.48 GB via dd, please wait...
time dd if=/dev/zero of=/black/dd.tst bs=1024000 count=20000

20000+0 records in
20000+0 records out

real       57.1
user        0.0
sys         7.8

20.48 GB in 57.1s = 358.67 MB/s Write


read 20.48 GB via dd, please wait...
time dd if=/black/dd.tst of=/dev/null bs=1024000

20000+0 records in
20000+0 records out

real       54.2
user        0.0
sys         6.7

20.48 GB in 54.2s = 377.86 MB/s Read

Wait 40 seconds

Read Test
Code:
time /usr/gnu/bin/dd if=/black/ddtest of=/dev/null bs=1024000


Code:
                              capacity     operations    bandwidth
pool                       alloc   free   read  write   read  write
-------------------------  -----  -----  -----  -----  -----  -----
black                       765G  3,78T  2,97K      0   380M      0
  raidz1                    765G  3,78T  2,97K      0   380M      0
    c3t50014EE203B9A3E5d0      -      -    679      0  76,2M      0
    c3t50014EE2AE6582FEd0      -      -    683      0  76,2M      0
    c3t50014EE6000CBA31d0      -      -    680      0  77,2M      0
    c3t50014EE655620F20d0      -      -    673      0  76,8M      0
    c3t50014EE6AAAAB8DCd0      -      -    676      0  76,7M      0
logs                           -      -      -      -      -      -
  mirror                       0  46,5G      0      0      0      0
    c3t5000A720300547DDd0      -      -      0      0      0      0
    c3t5000A720300547FFd0      -      -      0      0      0      0
cache                          -      -      -      -      -      -
  c3t5E83A97E6268ABEDd0    55,9G      0     22    317   355K  39,6M
-------------------------  -----  -----  -----  -----  -----  -----

... that's fine ! ;)

Still through Samba, folder "video" mounted on disk drive (Y:\)

[screenshot: ATTO Disk Benchmark, ZFS over SMB/CIFS at 10 GbE, RAID-Z of 5x WD1001FALS]
[screenshot: CrystalDiskMark after reboot, ZFS over SMB/CIFS at 10 GbE, RAID-Z of 5x WD1001FALS]


Here is a BIG puzzle!!!
Why do these two benchmarks disagree with each other?
=> Write: ATTO = 354 MB/s | CrystalDiskMark = 233 MB/s
=> Read: ATTO = 290 MB/s | CrystalDiskMark = 169 MB/s

Cheers.

St3F
 
The last benchmark for me, with this configuration:
  • Xeon X3450 4c/8t, 2.66 / 3.1 GHz
  • Intel S3420GPV
  • 8 GB RAM: 4x 2 GB ECC HP 1066
  • Operating system on an OCZ Vertex Plus 60 GB SSD (latest JMicron firmware), partitioned at 30 GB (the rest left for over-provisioning)
  • IBM M1015 flashed to IT-mode 9211-8i
  • vdev #1: 5x 1 TB WD Black WD1001FALS 7200 rpm in an Icy Dock MB455SPF-B 5-in-3
  • vdev #2: 5x 2 TB Hitachi HDS72302
  • 2x STEC Mach16 50 GB SSDs (ZIL / write cache, mirrored)
  • 1x OCZ Vertex 3 60 GB SSD (L2ARC), latest firmware
  • NIC: dual-port 10 GbE SFP+ SR
  • Sharkoon Rebel 12
  • Seasonic M2II 650W

Code:
   pool: black
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
    still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
    pool will no longer be accessible on older software versions.
  scan: scrub repaired 0 in 0h31m with 0 errors on Sat Jul 28 23:44:19 2012
config:

    NAME                       STATE     READ WRITE CKSUM     CAP            Product
    black                      ONLINE       0     0     0
      raidz1-0                 ONLINE       0     0     0
        c3t50014EE203B9A3E5d0  ONLINE       0     0     0     1000.20 GB     WDC WD1001FALS-0
        c3t50014EE2AE6582FEd0  ONLINE       0     0     0     1000.20 GB     WDC WD1001FALS-0
        c3t50014EE6000CBA31d0  ONLINE       0     0     0     1000.20 GB     WDC WD1001FALS-5
        c3t50014EE655620F20d0  ONLINE       0     0     0     1000.20 GB     WDC WD1001FALS-5
        c3t50014EE6AAAAB8DCd0  ONLINE       0     0     0     1000.20 GB     WDC WD1001FALS-7
      raidz1-2                 ONLINE       0     0     0
        c3t5000CCA369C9168Fd0  ONLINE       0     0     0     2.00 TB        Hitachi HDS72302
        c3t5000CCA369CACE70d0  ONLINE       0     0     0     2.00 TB        Hitachi HDS72302
        c3t5000CCA369CADD07d0  ONLINE       0     0     0     2.00 TB        Hitachi HDS72302
        c3t5000CCA369CAE3F5d0  ONLINE       0     0     0     2.00 TB        Hitachi HDS72302
        c3t5000CCA369CBD585d0  ONLINE       0     0     0     2.00 TB        Hitachi HDS72302
    logs
      mirror-1                 ONLINE       0     0     0
        c3t5000A720300547DDd0  ONLINE       0     0     0     50.02 GB       STEC MACH16 M
        c3t5000A720300547FFd0  ONLINE       0     0     0     50.02 GB       STEC MACH16 M
    cache
      c3t5E83A97E6268ABEDd0    ONLINE       0     0     0     60.02 GB       OCZ-VERTEX3

errors: No known data errors

  pool: rpool
 state: ONLINE
  scan: none requested
config:

    NAME        STATE     READ WRITE CKSUM     CAP            Product
    rpool       ONLINE       0     0     0
      c6t0d0s0  ONLINE       0     0     0     120.03 GB      OCZ VERTEX-PLUS

errors: No known data errors
OpenIndiana 151a5, 8GB of RAM, napp-it 0.8, pool version 28, RAID-Z, ashift=9
I benchmark with 10.24 GB:
Code:
write 10.24 GB via dd, please wait...
time dd if=/dev/zero of=/black/dd.tst bs=1024000 count=10000

10000+0 records in
10000+0 records out

real       12.3
user        0.0
sys         5.0

10.24 GB in 12.3s = 832.52 MB/s Write

read 10.24 GB via dd, please wait...
time dd if=/black/dd.tst of=/dev/null bs=1024000

10000+0 records in
10000+0 records out

real       14.4
user        0.0
sys         3.5

10.24 GB in 14.4s = 711.11 MB/s Read
Still through Samba, folder "video" mounted on disk drive (Y:\)

[screenshot: ATTO Disk Benchmark, ZFS over SMB/CIFS at 10 GbE, RAID-Z with 2 vdevs (5x WD1001FALS + 5x Hitachi HDS72302)]
[screenshot: CrystalDiskMark, ZFS over SMB/CIFS at 10 GbE, RAID-Z with 2 vdevs (5x WD1001FALS + 5x Hitachi HDS72302)]


Here is a BIG puzzle!!!
Why do these two benchmarks disagree with each other?
=> Write: ATTO = 153 MB/s | CrystalDiskMark = 360 MB/s
=> Read: ATTO = 450 MB/s | CrystalDiskMark = 165 MB/s

Power use:
- 148 W at idle
- 216 W during the dd bench

Cheers.

St3F
 
OK, I know this is way off topic here. I need a little help and probably should have posted it in this thread, I suppose. Is anybody else having major AFP share login issues with napp-it on OI 151a5? I found a few people in this thread who have an issue similar to mine. I thought it would be better to post this in the main SSDs & Data Storage forum; OpenSolaris-derived systems really need their own category on HardForum. Here is the thread if anybody has ideas on what to do. Thanks.

http://hardforum.com/showthread.php?t=1707996
 
Ok I know this is way off topic here. I need little help and should have posted it in this thread I so pose. Is anybody else having major AFP share login issue with napp-it on OI 151a5? I found a few people in this thread that have a similar issue to what I'm having. Thought would be better to post this in main SSDs & Data Storage. OpenSolaris derived needs its own category in hard forum big time. Here is thread if anybody has some idea's on what to do. Thanks.

http://hardforum.com/showthread.php?t=1707996

new napp-it 0.8k

Due to problems with netatalk 2.2.3 and OI 151a5, I have
uploaded a napp-it 0.8k + netatalk 3 installer for OI 151a3.

http://napp-it.org/downloads/changelog_en.html
The older netatalk installers remain available.

netatalk 3 is not fully tested and currently comes without Bonjour support
(connect via Finder -> Go -> Connect to Server -> afp://ip).

Any insights about Bonjour or PAM support for AD are welcome.


ps
AFP is faster on Macs than SMB (due to a slow SMB implementation on Macs) and is needed for Time Machine.

The most annoying thing is that every minor update or change to
OS X, netatalk, the Oracle database, other third-party tools or OpenIndiana
can break functionality. For this reason, I personally avoid netatalk whenever possible.

But to be honest, netatalk 3 is a huge step forward and allows mixed use of a folder with netatalk and SMB,
because the AFP system files are no longer visible via SMB.
 
Here is a BIG puzzle!!!
Why do these two benchmarks disagree with each other?
=> Write: ATTO = 153 MB/s | CrystalDiskMark = 360 MB/s
=> Read: ATTO = 450 MB/s | CrystalDiskMark = 165 MB/s

St3F

I was wondering mostly about the bad read values.
Can you redo a test without the cache drive?

ps
It's not shared over Samba but via the kernel-based CIFS server.
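Removing the L2ARC for the test is non-destructive and can be done online, e.g. (device name taken from your zpool status above):

Code:
# take the Vertex 3 out of the pool as cache for the test
zpool remove black c3t5E83A97E6268ABEDd0

# put it back afterwards
zpool add black cache c3t5E83A97E6268ABEDd0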
 
I tried to update to 0.8k from 0.8h today and it seems to have killed napp-it; browsing to ip:81 now just gives:

Software error:

Can't locate UUID/Tiny.pm in @INC (@INC contains: /var/web-gui/data/napp-it/CGI /usr/perl5/site_perl/5.10.0/i86pc-solaris-64int /usr/perl5/site_perl/5.10.0 /usr/perl5/vendor_perl/5.10.0/i86pc-solaris-64int /usr/perl5/vendor_perl/5.10.0 /usr/perl5/vendor_perl /usr/perl5/5.10.0/lib/i86pc-solaris-64int /usr/perl5/5.10.0/lib .) at admin.pl line 713.
BEGIN failed--compilation aborted at admin.pl line 713.

For help, please send mail to this site's webmaster, giving this error message and the time and date of the error.

I'll reboot later on today once the VMs have finished doing their jobs... anyone else had issues updating?
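In case it really is just the missing Perl module, one generic way to pull it in would be via CPAN (just a guess on my part; the napp-it updater may be expected to ship its own copy under /var/web-gui, so this may only be a workaround):

Code:
# install the missing UUID::Tiny module into the system perl via CPAN
perl -MCPAN -e 'install UUID::Tiny'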
 
and the fun continues.....

This is now on a clean (new hard drive) install of OpenIndiana 151a5:

- install openindiana
- install napp-it
- passwd root (change root passwd for smb)
- reboot
- wget -O - www.napp-it.org/afp | perl

Code:
---------------------------------
option: delete old be netatatalk 3.0
Unable to destroy netatalk-3.0.
No such BE.



4.0 ready
 
 
######################################
 
     -thats it, AFP is installed, have fun
     -connect your browser to http://serverip:81
 
######################################

- reboot

Napp-it still shows "afp-server: netatalk not installed".

Looks like the install script for afp is broken even for brand new fresh installs.

Gea, please rescue us from this?
 
Gea, thanks a ton for your help. It's much appreciated.

I finally have gotten this right. For anyone else hitting this thread having issues, this is how I got it working.

1) Openindiana 151a5 clean install.
2) reboot
3) wget -O - www.napp-it.org/nappit | perl
4) reboot
5) upgrade nappit via web GUI from napp-it 0.8h to napp-it 0.8k
6) reboot
7) wget -O - www.napp-it.org/afp | perl
8) reboot

After this netatalk 3 is installed and working perfectly!

The speeds via AFP from the Macs are now pretty phenomenal!

I am getting 100 MB/s sustained from a Mac mini (single HDD) to the napp-it server via AFP.

I have tested and confirmed this is working from the following OSX Versions

- Snow Leopard
- Lion
- Mountain Lion

Gea thanks & thanks again !
 
@Gea,

I did some more testing on my new rig:
Core i5
32GB RAM
Esxi v5 with latest patches


- Created a VM (8GB RAM) with OI Live and installed it to a 16GB VMDK on a Local SSD Drive (samsung 830)
- Installed VMWare Tools for VMXNet3 drivers
- Installed Napp-IT
- Gave the OI VM 2 more HDs as 50GB VMDKs on the same SSD.
- Created a Pool in Mirror
- In this Pool, created a ZFS Folder and made it available through SMB
- Copied the same MKV from different machines but throughput is about the same at 50 MB/s
- Ran CrystalDiskMark.

(I found this blog with some ZFS tips but am unsure if they are of use with OI: http://icesquare.com/wordpress/how-to-improve-zfs-performance/)

Is it even possible to achieve higher speeds with OI/CIFS (when the pool is a 2-way mirror) when being written to from a Windows 2008 R2 machine?
I can't use my passed-through disks yet since they are still tied to my Nexenta VM.

[screenshot: file copy throughput to the OI SMB share]


[screenshot: CrystalDiskMark results on the OI SMB share]
 
hello nostradamus99

The tweaking info in your blog link is about FreeBSD.
There you need the tweaking, but not on Solaris-based systems;
they are optimized for ZFS out of the box.

But I suppose you will never get virtualized NAS/SAN performance near real-hardware
NAS/SAN performance unless you can give OpenIndiana real hardware access to SAS controllers
and disks via pass-through. Virtualized storage (storage provided by ESXi as a VMDK)
is always slower.
 
I agree, but these numbers on an SSD... I just can't believe VMFS would create such overhead. What I've learned about VMFS is that it only has a 2-3% performance penalty.
Maybe I just want too much from my SATA disks, but I've seen 85-95 MB/s over iSCSI, so I know it's possible.
Is there still no one with a simple 2-way mirror on OI who can say how much speed they get while copying large files to OI's SMB/CIFS share?
 
I was wondering mostly about the bad read values.
Can you redo a test without the cache drive?

ps
Its not sharing over SAMBA but with the kernel-based CIFS server
So, after these 2 benchmarks with 10 GbE over SMB/CIFS
=> http://hardforum.com/showpost.php?p=1038981342&postcount=3628
=> http://hardforum.com/showpost.php?p=1038983445&postcount=3629
... you ask me to redo the test without the read cache (L2ARC), still over SMB/CIFS with the 10 GbE NIC.

OpenIndiana 151a5, 8GB of RAM, napp-it 0.8h nightly May 03 2012, pool version 28, RAID-Z, ashift=9

Code:
write 10.24 GB via dd, please wait...
time dd if=/dev/zero of=/black/dd.tst bs=1024000 count=10000

10000+0 records in
10000+0 records out

real       12.3
user        0.0
sys         5.0

10.24 GB in 12.3s = 832.52 MB/s Write

read 10.24 GB via dd, please wait...
time dd if=/black/dd.tst of=/dev/null bs=1024000

10000+0 records in
10000+0 records out

real       14.6
user        0.0
sys         3.6


10.24 GB in 14.6s = 701.37 MB/s Read

[screenshot: ATTO Disk Benchmark, total length 256 MB, without cache]
[screenshot: ATTO Disk Benchmark, total length 2 GB, without cache]
[screenshot: CrystalDiskMark, without cache]
[screenshot: Blackmagic Disk Speed Test, without cache]


... CrystalDiskMark & ATTO Disk Benchmark still disagree with each other ... :eek:


Still without cache, Bonnie :
bonnie++ -u root -d /dev/zvol/rdsk/black/video -s 16384M -m ZFServer

Code:
Version 1.03c       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
ZFServer        16G 73193  99 655848  69 332747  51 58376  99 860896  47  2321   7
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 31544  99 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
ZFServer,16G,73193,99,655848,69,332747,51,58376,99,860896,47,2321.4,7,16,31544,99,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++

Cheers.

St3F
 
... CrystalDiskMark & ATTO Disk Benchmark still disagree with each other ... :eek:

I see from earlier that you have 8GB of memory. I recommend running tests with no less than 2x physical RAM, to ensure that at least half of the bytes actually get all the way to disk. In addition, if you see a difference between a test at 2x RAM and one at 3x RAM, then you probably need to go very large (30x RAM, say) to benchmark the actual disk system.

That said, you need to benchmark what you care about. CDM and ATTO are not the real applications you want to run; what do you want to do with this system? You need to profile your real use. If you watch movies from the array, that's a very different access pattern than compiling code for 50 developers on it (shudder) or running a database on it. Benchmarks only show array performance on the thing they do, not on what you want to do with it.
 