large storage with software RAID

stenrulz

Hello,

I am currently looking at a few options for the software RAID.

- Nexenta *not an option due to limits on community version
- Napp-It (Open Indiana) *not an option due to the older ZFS
- Napp-It (OmniOS)
- NAS4Free
- FreeNAS
- Unraid *not an option due to no striping

The main functions used will be iSCSI and SMB 3.0, but if the system does not support SMB 3.0 I can just run a Windows 2012 virtual machine. I would most likely be looking at redundancy similar to RAID 6.

Preferably I would like quick rebuild times like Windows 2012 R2 Storage Spaces. I would have looked at Windows Storage Spaces were it not for the really poor RAID 5 performance. As well, I would like to be able to replace drives and expand the storage, i.e. if there were 4 drives I could add another 4 to the same RAID array, or replace small hard drives that are part of the array with larger ones, such as 2TB to 4TB, etc.
http://technet.microsoft.com/en-us/library/dn387076.aspx

What are your recommendations and why?

Thanks in advance.
 


This is mainly a question of BSD vs Solarish systems.
First, compare SAMBA vs. the Solaris CIFS server:

- The Solaris CIFS server (SMB1) is very easy to handle and is integrated into ZFS;
it is very fast and supports Windows AD, Windows SIDs and Windows ACLs (unlike SAMBA)

- SAMBA 4 (BSD and Solaris) is on the way to supporting SMB2/3


iSCSI:
This seems to be a weak point in BSD setups:
nothing comparable in features and performance to Comstar on Solaris & Co.

On Solarish systems, I would prefer either Solaris 11 (stable, non-free) or OmniOS
(stable, free, but with a paid support option). Solaris 11 is currently buggy with iSCSI.
There seems to be a fix available, but only with an Oracle support contract.
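
For anyone curious what "integrated in ZFS" means in practice, here is a minimal sketch on OmniOS/OpenIndiana; the pool name "tank", share name "data" and domain "example.local" are placeholders:

    # enable the kernel CIFS/SMB service
    svcadm enable -r smb/server

    # create a filesystem and share it directly via a ZFS property (no smb.conf)
    zfs create -o casesensitivity=mixed -o nbmand=on tank/data
    zfs set sharesmb=name=data tank/data

    # optionally join an Active Directory domain
    smbadm join -u Administrator example.local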
 
Why not consider ZFS on Linux or old school reliable MDADM raid with the SCST iSCSI target? Alternatively, you could use KVM and map a LUN directly into a 2012 VM.
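
For the mdadm side of that suggestion, a minimal sketch of a RAID 6 style array (device names are placeholders; the SCST target configuration on top of it is a separate exercise):

    # create an 8-disk RAID 6 array
    mdadm --create /dev/md0 --level=6 --raid-devices=8 /dev/sd[b-i]

    # watch the initial sync / any later rebuild
    cat /proc/mdstat

    # grow the same array onto an additional disk later
    mdadm --add /dev/md0 /dev/sdj
    mdadm --grow /dev/md0 --raid-devices=9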
 
Why would you want to combine iSCSI and ZFS?

iSCSI is about raw device sharing.
 

Why not? You still get all the checksumming, caching, and RAIDZ advantages with ZFS LUNs that you do with ZFS files. Compression and dedupe work just as well. So do snapshots.
 

Snapshotting a raw volume below a live filesystem will always leave you with an inconsistent view from the client, just like Linux' insane volume manager snapshots would. For this to work right you would have to umount or downgrade to readonly on the client side during the time you take the snapshot which of course everybody ignores.

I am normally a FreeBSD person, but for the specific use of iSCSI volumes I would use Linux md (this assumes I don't want to deal with the snapshotting-below-a-filesystem issue and do other forms of rollback).

I assume the major use of iSCSI here is booting windows installs?
 

The snapshot would give a crash consistent view of things, as if the power cord were yanked from the box with the volume mounted. The desirability of such a thing depends on the running apps.

Why would iSCSI be limited to Windows boot? It could be used for storing VMs for vSphere or Hyper-V or for bulk storage. I've used ZFS for all, albeit with SCST's FC target and it has worked well.
 

I was just asking. I much prefer to have diskless stuff (virtual or not) with a network filesystem view rather than sharing raw devices, because that way I can mess with a diskless client by changing things on the server (even when the client isn't up). The major thing that can't be done that way is Windows boot volumes, and then there's various PXE brokenness that spews into other OSes but still only concerns boot volumes.
 
...As well, I would like to be able to replace drives and expand the storage, i.e. if there were 4 drives I could add another 4 to the same RAID array, or replace small hard drives that are part of the array with larger ones, such as 2TB to 4TB, etc.

Here's my experience between hardware and ZFS.

In regards to array expansion.

With Hardware you can expand the array, but you retain the original RAID style.
For example if you have 3x4TB RAID5, you can add more 4TB drives, but it will stay one single RAID5 array and you will only have 1 parity drive. Some <expensive> controllers may get around this, but I don't know.

With ZFS, the "pool" is made of any number of arrays (vdevs). You cannot alter the arrays, but you can add to the pool. So if you start a pool with 3x4TB in RAID5 (raidz1), you cannot add any more drives to that RAID5, but you can add 2x4TB in RAID1 (mirror) or another 3x4TB in RAID5, and the extra space is added to the pool overall. You can also use arrays of different disk sizes; you can add 2x1TB if you want. ZFS creates a virtual JBOD out of all the arrays attached to the pool. You can also have multiple pools.
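
A minimal sketch of that expansion model, with placeholder pool and disk names (in ZFS terms the "arrays" are vdevs, raidz1 is the RAID5-like layout, mirror the RAID1-like one):

    # initial pool: one 3-disk raidz1 vdev
    zpool create tank raidz1 c0t0d0 c0t1d0 c0t2d0

    # grow the pool later by adding a second vdev; the existing vdev is untouched
    zpool add tank raidz1 c0t3d0 c0t4d0 c0t5d0

    # or add a mirrored pair of different-sized disks instead
    zpool add tank mirror c0t6d0 c0t7d0

    zpool status tank          # shows each vdev and its member disks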

In regards to replacing drives.

Both Hardware and ZFS will allow you to replace one drive at a time with drives of bigger sizes. Once the last drive is replaced the array will become the larger size.
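
On the ZFS side that procedure looks roughly like this (pool and disk names are placeholders):

    # let the vdev grow automatically once every member disk has been swapped
    zpool set autoexpand=on tank

    # replace a 2TB disk with a 4TB disk, wait for the resilver to finish,
    # then repeat for the remaining disks in the vdev
    zpool replace tank c0t0d0 c0t8d0
    zpool status tank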

Other thing to consider

ZFS is a universal file system, so any OS that can run ZFS should be able to recognize a ZFS pool. That means if you ever replace hardware or change operating systems, your arrays should transfer with no problem. Hardware RAID, on the other hand, is specific to the vendor and possibly to the specific card, which can really limit your options should you expand or should something happen to your hardware.

My opinion

If your goal is to have one computer and it runs Windows, then hardware RAID is the way to go. If you can move your server to Unix, Linux, or OS X, then ZFS is the best choice. If you are OK with having two computers, one that is a pure file server that then talks directly to a Windows server, then ZFS is still a good idea.
 
ZFS is a universal file system, so any OS that can run ZFS should be able to recognize a ZFS pool. That means if you ever replace hardware or change operating systems, your arrays should transfer with no problem. Hardware RAID, on the other hand, is specific to the vendor and possibly to the specific card, which can really limit your options should you expand or should something happen to your hardware.

Hmmm...

I cannot even read my own ZFS filesystems when I boot the machine into a FreeBSD that's a bit older than what was running there before.

Have you actually seen a FreeBSD ZFS (any age) being accessed by, say, Linux?
 
_Gea, I am thinking of running VMware ESXi and passing through the disks/PCI slots to the virtual machines for storage. I would use NFS or iSCSI for some other VMs; the main issue would be how to get an SMB 1.0 to SMB 3.0 gateway. I was looking at FreeNAS using iSCSI and ZFS, but that setup would involve creating a very large virtual disk for the iSCSI emulation to work, as I would most likely be looking at 40TB+.

Why isn't FlexRAID an option?
Currently looking into FlexRAID, but I have a few questions about it.
1) If a drive fails, how long does it take to rebuild compared to hardware RAID? One of my dislikes with hardware RAID is that it takes a very long time to rebuild.
2) What is the process to upgrade a disk, say from 2TB to 4TB?
3) How does it support different-sized disks?
4) Has anyone tested FlexRAID with a large amount of data?
5) Could I shrink and expand the array? For example, if only 40% is currently used, can I shrink the array to that 40%, remove the now-unused drives, install larger ones, and expand it?
Why not consider ZFS on Linux or old school reliable MDADM raid with the SCST iSCSI target? Alternatively, you could use KVM and map a LUN directly into a 2012 VM.

ZFS on Linux is very unstable at the current point in time. Also, please read my note above about using VMware.

Why would you want to combine iSCSI and ZFS?

iSCSI is about raw device sharing.

Basically I am thinking of running VMware ESXi and passing through all the hard drives. Running vdisks for most of the VMs via NFS or iSCSI is not an issue, but for the Windows storage server (SMB 3.0), how do I do something similar to pass-through?

Snapshotting a raw volume below a live filesystem will always leave you with an inconsistent view from the client, just like Linux' insane volume manager snapshots would. For this to work right you would have to umount or downgrade to readonly on the client side during the time you take the snapshot which of course everybody ignores.

I am normally a FreeBSD person, but for the specific use of iSCSI volumes I would use Linux md (this assumes I don't want to deal with the snapshotting-below-a-filesystem issue and do other forms of rollback).

I assume the major use of iSCSI here is booting windows installs?

VMware ESXi running a few VMs, and also a VM that is doing the SMB 3.0 sharing with a large amount of data.

I was just asking. I much prefer to have much diskless stuff (virtual or not) with a network filesystem view than sharing raw devices, because that way I can mess with a diskless client by changing things on the server (even when the client isn't up). The major thing that can't be done that way is windows boot volumes, and then there's various PXE brokenness that spews into other OSes but still only concerns boot volumes.
The question is: with NFS, how do you pass it through into a Windows VM for SMB 3.0 support?

If your goal is to have one computer and it runs Windows, then hardware RAID is the way to go. If you can move your server to Unix, Linux, or OS X, then ZFS is the best choice. If you are OK with having two computers, one that is a pure file server that then talks directly to a Windows server, then ZFS is still a good idea.

I do have hardware RAID, but I have a few issues with it, including that the RAID cannot expand across multiple RAID cards, slow rebuild times, a very slow upgrade process (take one drive out, install the new one, wait for the rebuild, and repeat for all the others), and slowish performance due to cheap RAID cards, etc.
 
Currently looking into FlexRAID, but I have a few questions about it.
1) If a drive fails, how long does it take to rebuild compared to hardware RAID? One of my dislikes with hardware RAID is that it takes a very long time to rebuild.
2) What is the process to upgrade a disk, say from 2TB to 4TB?
3) How does it support different-sized disks?
4) Has anyone tested FlexRAID with a large amount of data?
5) Could I shrink and expand the array? For example, if only 40% is currently used, can I shrink the array to that 40%, remove the now-unused drives, install larger ones, and expand it?

1) I'd bet it takes the same amount of time as any other software RAID if you compare it to hardware RAID.
2) Take the 2TB out and put a 4TB back in? If you have enough free space in your array, I think you can remove a drive from the system and it will spread those files across the other drives.
3) It supports all kinds of disk sizes; it's not picky software.
4) Yes, I know of one over 50TiB in size without that many problems.
5) Yes, I think so :)
6) It doesn't support striping, at least not for now.
 
There are a lot of large FlexRAID arrays on the AVS forums; I think there is a 100TB build on this forum by someone who used FlexRAID. A lot of people use it for media, and a big advantage is that you can have all the drives spin down except for the one you're reading from.
 
_Gea, I am thinking of running VMware ESXi and passing through the disks/PCI slots to the virtual machines for storage. I would use NFS or iSCSI for some other VMs; the main issue would be how to get an SMB 1.0 to SMB 3.0 gateway. I was looking at FreeNAS using iSCSI and ZFS, but that setup would involve creating a very large virtual disk for the iSCSI emulation to work, as I would most likely be looking at 40TB+.

If you really need SMB3 features, you may have to wait until Samba 4 or Solaris CIFS supports them one day in a stable release, or you must use Windows 2012 for sharing.

With Windows 2012, you can use either ReFS (a slow and not feature-complete ZFS-clone effort) or you can use iSCSI to keep the data on ZFS. On OmniOS you can use the very fast Comstar framework for iSCSI, where you can create iSCSI LUNs either as files or as ZFS volumes (block devices).

Size is not a problem either. Volume-based LUNs are usually a little faster. From Windows' point of view, they act like one huge local disk. There is only one caveat with iSCSI compared to SMB over ZFS:

You can snapshot your whole iSCSI disk, but you cannot access single files on this snapshot; you must mount a clone to access single files from it. You can use Windows shadow copies on your iSCSI LUN instead, but ZFS snapshots are more powerful.

You must weigh the W2012 SMB3 features against the simplicity and trouble-free operation of a Solaris CIFS fileserver.
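
For reference, a rough sketch of the volume-based variant on OmniOS; pool/volume names and the size are placeholders, and the LU GUID is whatever stmfadm prints:

    # create a (sparse) ZFS volume to back the LUN
    zfs create -s -V 10T tank/lun0

    # bring up the COMSTAR iSCSI target service and expose the volume
    svcadm enable -r svc:/network/iscsi/target:default
    stmfadm create-lu /dev/zvol/rdsk/tank/lun0
    stmfadm add-view <GUID-printed-by-create-lu>
    itadm create-target

    # snapshot the whole LUN; to get at single files, clone the snapshot
    zfs snapshot tank/lun0@before-update
    zfs clone tank/lun0@before-update tank/lun0-restore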
 
I use FlexRAID and would definitely encourage you to at least try the demo. I have a 10TB array (10TB reported by Windows) consisting of (4) 3TB Seagates and (1) 2TB WD drive.


2) What is the process to upgrade a disk, say from 2TB to 4TB?

This is easy. Plug in the 4TB drive. Stop the storage pool (this makes the drives no longer appear as one pooled drive, and you get access to the individual lettered drives). Copy the data from the 2TB drive to the 4TB drive, remove the old drive, and make a new storage pool with the new drive. It may have to recreate parity, but on my system with 6/10TB used this takes about 5 hours.
3) How does it support different-sized disks?

Because there is no striping, files are spread evenly across the drives, so FlexRAID gives no speed boost over a single drive. This is the downside compared to RAID 5 or 6, but for media and big-data storage, speed isn't usually an issue. I get between 70-120MB/s reads/writes.

5) Could I shrink and expand the array? For example, if only 40% is currently used, can I shrink the array to that 40%, remove the now-unused drives, install larger ones, and expand it?

This is an interesting question, but I don't know the best way to do it, although I can think of some options. The FlexRAID forums are pretty good; if you ask this over there, someone will know.
 
Greetings

Hmmm...

I cannot even read my own ZFS filesystems when I boot the machine into a FreeBSD that's a bit older than what was running there before.

Have you actually seen a FreeBSD ZFS (any age) being accessed by, say, Linux?

This is not a problem with ZFS if you created the pool with a lower version number from the start. I am using Solaris 11.1, where the ZFS pool version is 34, and when I create a pool I can set it to any level from 1 to 34. Obviously you wouldn't start at too low a level, as you would miss out on major functionality: e.g. triple-parity RAID was introduced in version 17, and another milestone is version 19 (I think), where removal or failure of a log device only loses you the data in the ZIL, whereas all prior versions would completely trash your entire pool. At a minimum I would use anything from version 20 onwards.

If, for example, your older FreeBSD used say version 23 and your newer FreeBSD used version 26, I could create my Solaris 11.1 pool at version 22 and both of your BSDs could access it; if I created it at version 25, then only your later FreeBSD could do so.

You can also use the upgrade command at any time. Again, if I upgraded my Solaris 11.1 version-22 pool with no arguments it would upgrade to the latest version, 34; alternatively I could upgrade to any version between 22 and 34 I desire, say version 27, and do subsequent upgrades to another higher number (28+) and so on.

You only have to remember the cardinal rule: whatever system you have can only access a ZFS pool up to and including the version that OS supports (but not higher). For maximum portability, version 28 is the most recommended. When setting the pool version (1-34), don't forget to also set the matching filesystem version (1-6); see here and here for more info. Migrating down is not possible, so create the pool from the very start with the right version and you will save yourself a lot of grief.
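
As a concrete sketch (pool and disk names are placeholders):

    # create the pool at the portable version from the start
    zpool create -o version=28 tank raidz2 c0t0d0 c0t1d0 c0t2d0 c0t3d0

    # check what the pool and its filesystems are currently at
    zpool get version tank
    zfs get version tank

    # later: upgrade all the way, or only to a chosen intermediate version
    zpool upgrade tank
    zpool upgrade -V 30 tank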

OpenZFS introduces a lot of new non-standard features, and to prevent older ZFS versions from accidentally opening these pools sometime in the future, it sets the pool version to a very large number (5000), so there is no way standard ZFS systems will ever open them accidentally.

Now that you know what you have to do, you can avoid this problem ever occurring again.

Cheers
 

Good to know.

Can you definitely confirm that, as long as I keep the version number low enough, I can use e.g. a FreeBSD created ZFS in Linux, read-write, snapshots and all?

ETA: for all Linux ZFSes :)
 
Why not? You still get all the checksumming, caching, and RAIDZ advantages with ZFS LUNs that you do with ZFS files. Compression and dedupe work just as well. So do snapshots.

I tried iSCSI at first, but at least on FreeNAS I found performance degradations with iSCSI as compared to SAMBA and NFS.

After experimenting with a bunch of setups, NFS seems to be by far the most efficient protocol, though it lacks some of the security features we take for granted with SMB. If you can live without username and password authentication, instead doing security based on IP restrictions, NFS is by far the best option. Nothing gets higher speeds for me.
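
For what it's worth, on a Solarish box that IP-restriction style is a single ZFS property (dataset and subnet are placeholders); on FreeBSD/FreeNAS the same restriction goes into the NFS export options instead:

    # share read-write, but only to one subnet
    zfs set sharenfs=rw=@192.168.1.0/24 tank/media
    zfs get sharenfs tank/media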
 

At least on OmniOS/OI/Solaris, NFS is faster than SMB/CIFS, which is faster than SMB/SAMBA, and the fastest of all is iSCSI (via Comstar).
 

Interesting.

I should probably qualify this: the iSCSI vs. NFS test I did was over an internal virtual VMware 10-gig connection on my ESXi server (VMXNET3).

I'm trying to remember, but I think what happened was that at those high network speeds iSCSI had a notable CPU load, which started competing with my ZFS RAIDZ2 array and slowing it down.
 
Good to know.

Can you definitely confirm that, as long as I keep the version number low enough, I can use e.g. a FreeBSD created ZFS in Linux, read-write, snapshots and all?

ETA: for all Linux ZFSes :)

I have changed some systems from FreeBSD to Linux and had no problems using the ZFS pools.

Actually, the system I'm working on right now used to be a FreeBSD install.

storage/share 35T 30T 4.7T 87% /storage/share
storage2 15T 128K 15T 1% /storage2

No problems here, but obviously YMMV.
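
The move itself is just an export on one OS and an import on the other; a sketch using the pool name from the output above:

    # on the old FreeBSD install: cleanly export the pool
    zpool export storage

    # on the new ZFS-on-Linux install: see what's on the disks, then import
    zpool import
    zpool import storage
    zpool status storage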
 
Zarathustra[H] said:
Interesting.

I should probably qualify this: the iSCSI vs. NFS test I did was over an internal virtual VMware 10-gig connection on my ESXi server (VMXNET3).

I'm trying to remember, but I think what happened was that at those high network speeds iSCSI had a notable CPU load, which started competing with my ZFS RAIDZ2 array and slowing it down.

Benchmarking to compare a raw-device-sharing service like iSCSI and a networked filesystem is very challenging. The former has full access to the local filesystem buffer cache; the latter has a complicated system going on that is highly nondeterministic depending on what the server and other clients do.

I suspect that your high CPU load was the result of an extensive number of transactions that didn't actually hit the wire.
 
Nice. Which flavor of Linux ZFS are you using there, what kernel etc?

I tried to standardize on Debian. That was actually one of the main reasons why I switched away from the various OpenIndiana/FreeBSD/OmniOS installs I had; I wanted a more homogeneous line of servers, and all the non-Linux options ran into various hardware incompatibilities on several of the servers. I ended up having to use Ubuntu Server on one of the machines because of some issues with the Debian installer and that particular machine, but overall I spend less time fixing problems not related to ZFS, partly because the solutions can be applied to all the servers.
 
Thanks for all your input. I think I will use FreeNAS 9.1 under VMware with NFS and the older version of SMB which is supported on FreeNAS. Unless anyone thinks some other OS would be better? Are any features above ZFS v28 worthwhile?

Also, I would like to use native ZFS on Linux, but it is still a bit new to me.
 
Is the rebuild time roughly the same between FlexRAID, unRAID, ZFS and hardware RAID?

Also, for the people who have large storage (30TB+): what is your setup and how do you deal with rebuild times?
 
Also, for the people who have large storage (30TB+): what is your setup and how do you deal with rebuild times?

I have a variety of hardware, mostly motherboards/CPUs that have been retired from my main PC, coupled with Intel 10GbE NICs and LSI/HPT/... controllers in Norco cases.

Regarding rebuild times, I wait some hours, even a day or two :D

I have backups so I don't sit there biting my fingernails while this is going on.
 
Thanks for all your input. I think I will use FreeNAS 9.1 under VMware with NFS and the older version of SMB which is supported on FreeNAS. Unless anyone thinks some other OS would be better? Are any features above ZFS v28 worthwhile?

Also, I would like to use native ZFS on Linux, but it is still a bit new to me.

This is what I do.

Just keep in mind that the folks over at the FreeNAS forums are rather opposed to running it under VMWare, and will give you a lecture about it every time you ask a question. Most of their reasons are bogus, or based on ignorance.
 
Good to know.

Can you definitely confirm that, as long as I keep the version number low enough, I can use e.g. a FreeBSD created ZFS in Linux, read-write, snapshots and all?

ETA: for all Linux ZFSes :)

Yes. The only gotcha you might run into is actually around the disk partition layout (EFI, GPT, etc.) and some of the older versions of Solaris-based distros, which couldn't understand the disks if they came from the FreeBSD world. That and the version of the pool are the only known gotchas I'm aware of for migrating between OSes.

If you throw version 28 on a pool, at present, it should be importable on any of the latest versions (as of 1/19/2013) of:

NexentaStor 3.1.x
Any illumos-based distro (NexentaStor 4.x when released, OmniOS, OpenIndiana, SmartOS, etc)
FreeBSD 9+
ZFS On Linux
Development version of MacZFS (ZFS-OSX)*

* Mac OS X support is back in active development, but isn't yet 'stable'. It is being ported from the current ZFS On Linux code, thus should have support for at least v28 pools.

As these operating systems and their various ZFS implementations move to fully support and base themselves off the OpenZFS project (open-zfs.org), which to my knowledge they /ALL/ presently plan to do, you'll find you won't have to worry about sticking to a less-than-newest "version" (itself a bit of a misnomer, given the move away from pool versions to feature flags), just so long as you keep up to date on whatever two systems you're moving the pool between.
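
A quick way to see what any given platform understands before moving a pool (pool name is a placeholder):

    # list every pool version (or feature) this ZFS implementation supports
    zpool upgrade -v

    # and what the pool itself is currently at
    zpool get version tank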
 
At least on OmniOS/OI/Solaris, NFS is faster than SMB/CIFS which is faster than SMB/SAMBA and the fastest at all is iSCSI (via Comstar)

Common misconception. On OmniOS/OI/Solaris/etc, iSCSI via COMSTAR is not faster. It is "faster, due to risky behavior".

The most common benchmark (take a new box, export an iSCSI zvol and an NFS share, and run [insert benchmark here] against both) will routinely favor iSCSI. This is not because iSCSI was faster; it is because it was completely ignoring ZIL mechanics and making no use of the log, because the default mode for COMSTAR is writeback cache enabled, and the default mode for nearly every iSCSI client is not to send sync writes (and in fact, some hypervisor layers may by default /eat/ a sync request before passing it back down to the storage). NFS, on the other hand, defaults to sync on the mount option on most clients. NFS wouldn't lose your data out of the box in a power-loss event; COMSTAR most assuredly will.

After disabling the write cache on the LU, many use cases will find iSCSI and NFS performance on par, or with NFS taking a slight to major lead. It depends. This gets into the world of default block sizes and what kind of workload you have, and starts to encompass a lot more pieces of ZFS (ARC & L2ARC efficiency, compression efficiency, long-term fragmentation issues; the list goes on and on).
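
A sketch of the two usual knobs, assuming a COMSTAR LU backed by a zvol called tank/lun0 (the GUID is whichever one stmfadm reports for your LU):

    # disable the writeback cache on the existing LU (wcd = write cache disabled)
    stmfadm modify-lu -p wcd=true <LU-GUID>

    # or force synchronous semantics at the ZFS layer on the backing volume
    zfs set sync=always tank/lun0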

However, unless you're 100% certain you should be using a small block size, especially in a large (many-TB) deployment, I'd recommend you stick to NFS and don't muck with its defaults, because in general ZFS has dozens of areas where small block sizes harm your performance and efficiency, immediately and over time, as compared to larger block sizes.

Use a file-level protocol like NFS unless you absolutely can't, and be prepared and understand the inefficiencies and risks of a block-level protocol on ZFS before using it. That should be your rule of thumb.

(edit: In fact, at work, we not only steer customers very strongly towards all-file [NFS] deployments, we've found all-zvol deployments to generally carry a much higher support burden. That's vendor speak for more problems - which means we don't like it from a cost perspective, but which also means YOU shouldn't like it because it means we've identified it as something that's going to cause you more heartache and late nights at the office and tears and blood)

(edit 2: I feel I need to add that this doesn't mean COMSTAR is just flat-out shit, or that there aren't some workloads that pretty much demand it. You can make it work. You can make it work fine. But there's more to it, it's less forgiving, it's more prone to issues, and it is not the "easy button" that NFS on ZFS filesystems is.)
 

I agree with you if you are using ESXi, where I also prefer NFS.
In Mac and Windows environments you often have no choice, either because of performance needs (like too-slow SMB) or because you need a native filesystem, which is only possible over iSCSI.

Especially compared to SMB (sync disabled), iSCSI (writeback enabled) is much faster on Windows and dramatically faster on Macs. For use cases like video editing I would prefer large block sizes as well.
 

The problem with suggesting iSCSI is 'faster' is that it glosses over the myriad of current deficiencies in COMSTAR, and the myriad of current deficiencies in zvols, that just aren't present when you opt for filesystems and SMB/NFS. I guess what I'm trying to say is: IF you're going that route, /be aware/ of what you're giving up and what additional pain points you're adding. If you're aware of them and still choose to do it, more power to you.

I've found that a significant minority of users who seemed stuck on a decision to go with iSCSI due to the perceived or actual performance benefits, and who seemed to imply that performance was their most important benchmark, either went iSCSI and later regretted it (and even went through the sometimes significantly costly process of getting OFF of it), or, once all the gotchas were explained, mulled it over and surprised me by agreeing to go file-level instead, sometimes in direct opposition to their prior comments that 'performance was king'. For SMB versus iSCSI, yeah, iSCSI is going to win a shootout: age, version support on SMB, and performance all play a part in that, for now (but watch for early to mid next year, and I'll say no more). But the question is, does it win the shootout by enough to override the other problems that using zvols entails?

There's just a ton of semantics available to the ZFS side of the house when you use a file-level protocol that just aren't there with a zvol and a block-level client. Something as simple as having a real inkling of drive space usage is the sort of thing newcomers to ZFS don't realize they're giving up by going iSCSI. Or being able to list files on the ZFS side at all. Both of those seem obvious to us, but you'd be surprised how many users pick iSCSI over a file-level protocol so early in the process, and so new to storage in general, that they don't even realize they just lost both of those. Not to mention the inefficiencies in the aforementioned areas of ZFS, mostly stemming from the block-size differences pretty much inherent in zvols versus filesystems (I rarely if ever see 64-128K zvols being used).

If per-client performance is truly king amongst your requirements, iSCSI may be your choice. I just wish when making that decision people were familiar with all they're giving up, and all the inefficiencies they're incurring, and that their aggregate performance out of that ZFS box will be /significantly lower/, possibly multiple times lower, if they go all-iSCSI at 4-8K average block size as opposed to going all-NFS/CIFS at 32-128K.
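
To make the block-size point concrete, a sketch with placeholder dataset names; note that volblocksize is fixed at creation time, whereas a filesystem's recordsize can be changed later:

    # a zvol with a larger block size than the 8K default
    zfs create -s -V 10T -o volblocksize=64K tank/vmstore

    # the filesystem equivalent, tunable after the fact
    zfs create -o recordsize=128K tank/share
    zfs get volblocksize,recordsize tank/vmstore tank/share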

Also, while I've not yet had time to seriously play with it, I've heard from a number of corners that the NFS client and functionality on Windows 2012 isn't half bad, and is supposedly stable. So perhaps being stuck on SMB is no longer true, for Windows clients. I need to set up a lab and really hammer that, to prove the assertions others have made to me about it. :)
 
.. For SMB versus iSCSI, yeah, iSCSI is going to win a shootout: age, version support on SMB, and performance all play a part in that, for now (but watch for early to mid next year, and I'll say no more)...

Also, while I've not yet had time to seriously play with it, I've heard from a number of corners that the NFS client and functionality on Windows 2012 isn't half bad, and is supposedly stable. So perhaps being stuck on SMB is no longer true, for Windows clients. I need to set up a lab and really hammer that, to prove the assertions others have made to me about it. :)

As Oracle is out of the game, I truly hope that Nexenta is the one that can keep the Solaris CIFS server ahead of, or at least on par with (as it is now), SAMBA.
Speed improvements mean SMB > 1, and I expect Nexenta to be the first there (and I hope for backports to Illumos).

Regarding NFS and other protocols like AFP or SAMBA SMB:
You always have the problem (with Windows users) that Solaris CIFS is the only one that offers 100% compatibility with Windows ACLs and Windows SIDs. I do not expect Solaris NFS to compete in this respect.
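
As an illustration of what that ACL/SID integration looks like on the Solaris CIFS side (path, user and domain are placeholders):

    # show a file's ACL as Windows-style ACEs
    /usr/bin/ls -V /tank/data

    # grant full control with inheritance, NTFS-style
    /usr/bin/chmod A=everyone@:full_set:fd:allow /tank/data

    # map a Windows account onto a local Unix user
    idmap add winuser:joe@example.local unixuser:joe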
 
This always means the native port these days, right?

How did they solve the licensing problem?

ZFS on Linux is not part of the Linux kernel. You have to add the module code separately (or the distro does).
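
In practice that just means the module has to be present before the pool can come up; a minimal sketch once the DKMS (or distro) packages are installed (pool name is a placeholder):

    # load the out-of-tree module, then look for existing pools on the disks
    modprobe zfs
    zpool import
    zpool import tank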
 
Also, any recommendations for FreeNAS and a 16-24 port SAS HBA card? If I am going to use ZFS RAIDZ, then I think I should not be using hardware RAID, as those cards do not pass everything through to the software layer, such as drive IDs, SMART info, etc.
 