Couple "noob" questions on ZFS.

SirMaster

I'm about to create a ZFS server and apparently in this day and age with multi-TB disks you need to use ashift=12 to properly support modern 4K drives.

I have been reading that some people have problems with wasted space with ashift=12.

First off, I will always follow the power-of-two rule.
I will be doing a 6x2TB RAIDZ2 vdev and a 6x3TB RAIDZ2 vdev in my zpool.

But since my server is a media server and practically ALL my files are larger than 1MB each, will I have much wasted space due to ashift=12? My understanding is that I won't, and that it only really matters when you have thousands of small files under 4K each, but I wanted to hear it from you guys who understand this better.

I also only have about 100,000 files in my 12TB of data (which I believe is pretty low for that much data), so I would expect the metadata to be small as well (about 400MB if each file needs a 4KB block worth of metadata).
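Just to sanity-check my own numbers, here's the back-of-the-envelope math I'm going on (plain Python arithmetic, nothing ZFS-specific; the one-record-per-file slack bound and the 4KB-per-file metadata figure are my own assumptions, not measured values):

Code:
# Rough numbers for my situation: ~100,000 mostly large files in ~12TB.
# Assumptions (mine, not measured): per-file slack is bounded by roughly one
# 128KiB record, and metadata is roughly one 4KiB block per file.
KiB, MiB, GiB, TiB = 2**10, 2**20, 2**30, 2**40

n_files    = 100_000
data_size  = 12 * TiB
recordsize = 128 * KiB
sector     = 4 * KiB          # 2**ashift with ashift=12

metadata = n_files * sector
print(f"metadata estimate: {metadata / MiB:.0f} MiB")        # ~391 MiB

tail_waste = n_files * recordsize
print(f"worst-case per-file slack: {tail_waste / GiB:.1f} GiB "
      f"({100 * tail_waste / data_size:.2f}% of the data)")  # ~12.2 GiB, ~0.10%

Even the pessimistic bound works out to about a tenth of a percent of the data, which is why I'm not too worried about the large-file case.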


My second question is about compression. Does this help offset some of the loss due to ashift=12?

From what I understand, it is a good idea to use in most cases.

I really only care about getting as close as I can to saturating my gigabit NIC with my ZFS performance; I never need anything faster than that. As long as I can get a solid ~75MB/s or so, I'll be happy.

My hardware is quite fast: a 4-core, 8-thread 3.3GHz Haswell Xeon and 16GB of RAM. I do not mind the increased CPU overhead of compression at all if it saves me space.

So do you think I should use compression given my situation?

If yes, which algorithm do I use?

It looks like LZJB is more widely supported, but I see ZoL also supports LZ4, which is newer and much better. Are there other options beyond those two?

Thanks
 
1. If you follow the power of two rule, you will not have wasted space due to padding. If you have a lot of small files (smaller than the stripe size for RAIDZ/2/3) you will waste space, but that is true for most filesystems.
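To make the padding point concrete, here is a small sketch of the RAIDZ allocation rule, modeled on my reading of ZFS's vdev_raidz_asize(): data sectors, plus one parity sector per (ndisks - nparity) data sectors in each row, with the total rounded up to a multiple of (nparity + 1) sectors. The helper below is my own illustration, not code lifted from the ZFS tree:

Code:
# Sketch of RAIDZ on-disk allocation for a single block, based on my reading
# of vdev_raidz_asize(): data sectors + parity sectors, then the total is
# rounded up to a multiple of (nparity + 1) so no unusable gaps are left.
import math

def raidz_alloc_sectors(psize_bytes, ndisks, nparity, ashift=12):
    sector = 1 << ashift
    data = math.ceil(psize_bytes / sector)                    # data sectors
    parity = nparity * math.ceil(data / (ndisks - nparity))   # parity sectors
    total = data + parity
    return math.ceil(total / (nparity + 1)) * (nparity + 1)   # padding roundup

KiB = 1024
# Full 128KiB record, 6-disk RAIDZ2 (4 data disks, a power of two):
print(raidz_alloc_sectors(128 * KiB, 6, 2) * 4)   # 192 KiB = 128 data + 64 parity, 0 padding
# Same record on a 5-disk RAIDZ2 (3 data disks, not a power of two):
print(raidz_alloc_sectors(128 * KiB, 5, 2) * 4)   # 216 KiB for the same 128 KiB of data
# A single 4KiB block (typical for metadata or a tiny file):
print(raidz_alloc_sectors(4 * KiB, 6, 2) * 4)     # 12 KiB for 4 KiB of data

With 4 data disks a full 128 KiB record splits into 8 sectors per disk with zero padding, which is exactly why the power-of-two rule works out; the last line shows why very small blocks (and metadata) still pay roughly 3x on RAIDZ2 with ashift=12.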

2. If you store mainly media files, you will not benefit much from compression. I see compressratios of only 1.00-1.02x on my media dataset. I would always use LZ4, as it has almost no performance impact on decompression.
 
I also do not plan on using ZVOLs or anything, so I don't need to worry about what you guys were discussing before with block sizes and wasted space.

I do have a few more things than media. I guess I would say it's 75% incompressible media, so I think LZ4 should be fine to use in general. I heard that it's smart enough to forgo compression on files or blocks that it can't compress by at least 12% or something? Maybe that was a bogus claim.
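For what it's worth, the figure I've seen quoted is 12.5% rather than 12%: as I understand it, a block's compressed version is only kept if it is at least one-eighth smaller than the logical block, otherwise the block is written out uncompressed. A tiny sketch of that rule (my own illustration, not ZFS code):

Code:
# The "only keep it if it compressed enough" rule as I understand it: keep the
# compressed block only if it is at least 12.5% (1/8) smaller than the logical
# block size; otherwise write the block uncompressed.
def store_compressed(logical_size: int, compressed_size: int) -> bool:
    return compressed_size <= logical_size - (logical_size >> 3)

print(store_compressed(128 * 1024, 120 * 1024))   # False: only ~6% smaller
print(store_compressed(128 * 1024, 104 * 1024))   # True: ~19% smaller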

Thanks again! I might have more questions in the future. I know where to come to ask haha.
 
You have to consider that ZFS compresses at the block (record) level. Each 128 KiB record (the largest possible with the default recordsize) is compressed on its own, independent of the rest of the file. As far as I understand it, the compressed record is then stored in a single on-disk allocation rounded up to the sector size (2^ashift, so 4 KiB at ashift=12) rather than to the next power-of-two block size, and it is only kept compressed if it shrinks enough to clear the threshold mentioned above; otherwise the record is written uncompressed.

Please note that some of this is from memory; I have not had the time to dig deeper into the compression code yet, and what I wrote is derived from what I have read. But one thing is certain: you will not get the same compression ratio as you would by compressing the whole file directly with the same algorithm, since each record is compressed on its own.
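A rough sketch of what that means per record at ashift=12, putting the 12.5% threshold and the sector rounding together (again my own illustration of the behaviour as I understand it, not actual ZFS code):

Code:
# Per-record on-disk size with compression, as described above: compress each
# 128KiB record on its own, keep the result only if it clears the ~12.5%
# threshold, and round the stored size up to whole 4KiB sectors (ashift=12).
RECORD = 128 * 1024
SECTOR = 4 * 1024   # 1 << 12

def stored_size(compressed: int, logical: int = RECORD, sector: int = SECTOR) -> int:
    if compressed > logical - (logical >> 3):    # did not compress by >= 12.5%
        return logical                           # record is written uncompressed
    return -(-compressed // sector) * sector     # round up to a sector multiple

print(stored_size(100 * 1024))   # 102400: kept compressed, 25 sectors on disk
print(stored_size(126 * 1024))   # 131072: barely compressed, stored as a full record

On a media dataset the second case is the common one: the record barely compresses, fails the threshold, and is stored as-is, which is why the compressratio stays around 1.00.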
 