Couple "noob" questions on ZFS.

SirMaster

I'm about to create a ZFS server and apparently in this day and age with multi-TB disks you need to use ashift=12 to properly support modern 4K drives.

I have been reading that some people have problems with wasted space with ashift=12.

First off, I will always follow the power-of-two rule.
I will be doing a 6x2TB RAIDZ2 vdev and a 6x3TB RAIDZ2 vdev in my zpool.

But since my server is a media server and practically ALL my files are larger than 1MB each, will I have much wasted space due to ashift=12? My understanding is that I won't, and that it only really matters when you have thousands of small files under 4K each, but I wanted to hear it from you guys who understand this better.

I also only have about 100,000 files in my 12TB of data (which I believe is pretty low for that much data), so I would expect the metadata to be small as well (about 400MB if each file needs a 4KB block worth of metadata).
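Just to sanity-check my own numbers, here's the back-of-the-envelope math I'm going on (plain Python arithmetic, nothing ZFS-specific; the one-record-per-file slack bound and the 4KB-per-file metadata figure are my own assumptions, not measured values):

Code:
# Rough numbers for my situation: ~100,000 mostly large files in ~12TB.
# Assumptions (mine, not measured): per-file slack is bounded by roughly one
# 128KiB record, and metadata is roughly one 4KiB block per file.
KiB, MiB, GiB, TiB = 2**10, 2**20, 2**30, 2**40

n_files    = 100_000
data_size  = 12 * TiB
recordsize = 128 * KiB
sector     = 4 * KiB          # 2**ashift with ashift=12

metadata = n_files * sector
print(f"metadata estimate: {metadata / MiB:.0f} MiB")        # ~391 MiB

tail_waste = n_files * recordsize
print(f"worst-case per-file slack: {tail_waste / GiB:.1f} GiB "
      f"({100 * tail_waste / data_size:.2f}% of the data)")  # ~12.2 GiB, ~0.10%

Even the pessimistic bound works out to about a tenth of a percent of the data, which is why I'm not too worried about the large-file case.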


My second question is about compression. Does this help offset some of the loss due to ashift=12?

From what I understand, it is a good idea to use in most cases.

I really only care about getting as close as I can to saturating my gigabit NIC with my ZFS performance; I never need anything faster than that. As long as I can get a solid ~75MB/s or so, I'll be happy.

My hardware is quite fast: a 4-core, 8-thread 3.3GHz Haswell Xeon and 16GB of RAM. I do not mind the increased CPU overhead of compression at all if it saves me space.

So do you think I should use compression given my situation?

If yes, which algorithm do I use?

It looks like LZJB is more widely supported, but I see ZoL also supports LZ4, which is newer and much better. Are there other options beyond those two?

Thanks
 
1. If you follow the power of two rule, you will not have wasted space due to padding. If you have a lot of small files (smaller than the stripe size for RAIDZ/2/3) you will waste space, but that is true for most filesystems.
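To make the padding point concrete, here is a small sketch of the RAIDZ allocation rule, modeled on my reading of ZFS's vdev_raidz_asize(): data sectors, plus one parity sector per (ndisks - nparity) data sectors in each row, with the total rounded up to a multiple of (nparity + 1) sectors. The helper below is my own illustration, not code lifted from the ZFS tree:

Code:
# Sketch of RAIDZ on-disk allocation for a single block, based on my reading
# of vdev_raidz_asize(): data sectors + parity sectors, then the total is
# rounded up to a multiple of (nparity + 1) so no unusable gaps are left.
import math

def raidz_alloc_sectors(psize_bytes, ndisks, nparity, ashift=12):
    sector = 1 << ashift
    data = math.ceil(psize_bytes / sector)                    # data sectors
    parity = nparity * math.ceil(data / (ndisks - nparity))   # parity sectors
    total = data + parity
    return math.ceil(total / (nparity + 1)) * (nparity + 1)   # padding roundup

KiB = 1024
# Full 128KiB record, 6-disk RAIDZ2 (4 data disks, a power of two):
print(raidz_alloc_sectors(128 * KiB, 6, 2) * 4)   # 192 KiB = 128 data + 64 parity, 0 padding
# Same record on a 5-disk RAIDZ2 (3 data disks, not a power of two):
print(raidz_alloc_sectors(128 * KiB, 5, 2) * 4)   # 216 KiB for the same 128 KiB of data
# A single 4KiB block (typical for metadata or a tiny file):
print(raidz_alloc_sectors(4 * KiB, 6, 2) * 4)     # 12 KiB for 4 KiB of data

With 4 data disks a full 128 KiB record splits into 8 sectors per disk with zero padding, which is exactly why the power-of-two rule works out; the last line shows why very small blocks (and metadata) still pay roughly 3x on RAIDZ2 with ashift=12.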

2. If you store mainly media files, you will not benefit much from compression. I see compressratios of only 1.00-1.02x on my media dataset. I would always use LZ4, as it has almost no performance impact on decompression.
 
I also do not plan on using ZVOLs or anything, so I don't need to worry about what you guys were discussing before with block sizes and wasted space.

I do have a few more things than media. I guess I would say it's 75% incompressible media, so I think LZ4 should be fine to use in general. I heard that it's smart enough to forgo compression on files or blocks that it can't compress by at least 12% or something? Maybe that was a bogus claim.
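For what it's worth, the figure I've seen quoted is 12.5% rather than 12%: as I understand it, a block's compressed version is only kept if it is at least one-eighth smaller than the logical block, otherwise the block is written out uncompressed. A tiny sketch of that rule (my own illustration, not ZFS code):

Code:
# The "only keep it if it compressed enough" rule as I understand it: keep the
# compressed block only if it is at least 12.5% (1/8) smaller than the logical
# block size; otherwise write the block uncompressed.
def store_compressed(logical_size: int, compressed_size: int) -> bool:
    return compressed_size <= logical_size - (logical_size >> 3)

print(store_compressed(128 * 1024, 120 * 1024))   # False: only ~6% smaller
print(store_compressed(128 * 1024, 104 * 1024))   # True: ~19% smaller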

Thanks again! I might have more questions in the future. I know where to come to ask haha.
 
You have to consider that ZFS compresses at the block (record) level. Each 128 KiB record (the largest possible with the default recordsize) is compressed on its own, independent of the rest of the file. As far as I understand it, the compressed record is then stored in a single on-disk allocation rounded up to the sector size (2^ashift, so 4 KiB at ashift=12) rather than to the next power-of-two block size, and it is only kept compressed if it shrinks enough to clear the threshold mentioned above; otherwise the record is written uncompressed.

Please note that some of this is from memory; I have not had the time to dig deeper into the compression code yet, and what I wrote is derived from what I have read. But one thing is certain: you will not get the same compression ratio as you would by compressing the whole file directly with the same algorithm, since each record is compressed on its own.
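A rough sketch of what that means per record at ashift=12, putting the 12.5% threshold and the sector rounding together (again my own illustration of the behaviour as I understand it, not actual ZFS code):

Code:
# Per-record on-disk size with compression, as described above: compress each
# 128KiB record on its own, keep the result only if it clears the ~12.5%
# threshold, and round the stored size up to whole 4KiB sectors (ashift=12).
RECORD = 128 * 1024
SECTOR = 4 * 1024   # 1 << 12

def stored_size(compressed: int, logical: int = RECORD, sector: int = SECTOR) -> int:
    if compressed > logical - (logical >> 3):    # did not compress by >= 12.5%
        return logical                           # record is written uncompressed
    return -(-compressed // sector) * sector     # round up to a sector multiple

print(stored_size(100 * 1024))   # 102400: kept compressed, 25 sectors on disk
print(stored_size(126 * 1024))   # 131072: barely compressed, stored as a full record

On a media dataset the second case is the common one: the record barely compresses, fails the threshold, and is stored as-is, which is why the compressratio stays around 1.00.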
 