Couple "noob" questions on ZFS.

Discussion in 'SSDs & Data Storage' started by SirMaster, Oct 9, 2013.

  1. SirMaster

    SirMaster 2[H]4U

    Messages:
    2,113
    Joined:
    Nov 8, 2010
    I'm about to create a ZFS server and apparently in this day and age with multi-TB disks you need to use ashift=12 to properly support modern 4K drives.

    I have been reading that some people have problems with wasted space with ashift=12.

    First off, I will always follow the power of 2 rule.
    I will be doing a 6x2TB RAIDZ2 vdev and a 6x3TB RAIDZ2 vdev in my zpool.
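
    Roughly what I have in mind for the pool creation, just so we're on the same page (pool name and device names are placeholders; I'd use /dev/disk/by-id paths for the real thing):

        # create both RAIDZ2 vdevs up front, forcing 4K alignment with ashift=12
        zpool create -o ashift=12 tank \
            raidz2 disk1 disk2 disk3 disk4 disk5 disk6 \
            raidz2 disk7 disk8 disk9 disk10 disk11 disk12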

    But since my server is a media server and practically ALL my files are larger than 1MB each, will I have much wasted space due to ashift=12? My understanding is that I won't, and that it only really affects thousands of small files that are under 4K each, but I wanted to hear from you guys who understand this better.

    I also only have about 100,000 files in my 12TB of data (which I believe is pretty low for that much data), so I would expect the metadata to be small as well (roughly 400MB if each file has a 4KiB block worth of metadata).


    My second question is about compression. Does this help offset some of the loss due to ashift=12?

    From what I understand, it is a good idea to use in most cases.

    For performance, I really only care about getting as close as I can to saturating my gigabit NIC; I never need faster than that. As long as I can get a solid ~75MB/s or so I would be happy.

    My hardware is quite fast: a 4-core/8-thread 3.3GHz Haswell Xeon and 16GB of RAM. I do not mind the increased CPU overhead of compression at all if it saves me space.

    So do you think I should use compression given my situation?

    If yes, which algorithm do I use?

    It looks like LZJB is more widely supported, but I see ZoL also supports LZ4, which is newer and much better. Are there other options beyond these?
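
    If LZ4 is the answer, I assume it's just a matter of enabling the feature flag and setting the property on the dataset, something like this (dataset name is just an example):

        zpool set feature@lz4_compress=enabled tank
        zfs set compression=lz4 tank/media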

    Thanks
     
    Last edited: Oct 9, 2013
  2. omniscence

    omniscence [H]ard|Gawd

    Messages:
    1,311
    Joined:
    Jun 27, 2010
    1. If you follow the power of two rule, you will not have wasted space due to padding. If you have a lot of small files (smaller than the stripe size for RAIDZ/2/3) you will waste space, but that is true for most filesystems.

    2. If you store mainly media files, you will not benefit much from compression. I see compressratios of only 1.00x-1.02x on my media dataset. I would always use LZ4, as it has almost no performance impact on decompression.
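
    If you want to see what compression is actually buying you, just check the compressratio property on the dataset once some data is on it (dataset name is just an example):

        zfs get compression,compressratio tank/media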
     
  3. SirMaster

    SirMaster 2[H]4U

    Messages:
    2,113
    Joined:
    Nov 8, 2010
    I also do not plan on using ZVOLs or anything, so I don't need to worry about what you guys were discussing before with block sizes and wasted space.

    I do have a few more things than media. I guess I would say it's 75% incompressible media, so I think LZ4 should be fine to use in general. I heard that it's smart enough to forgo compression on blocks that it can't shrink by at least 12% or so? Maybe that was a bogus claim.
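
    I'll probably just try it on a throwaway dataset first and see what the ratio looks like before committing, something along these lines (paths and names made up):

        zfs create -o compression=lz4 tank/lz4test
        cp -r /some/sample/data /tank/lz4test/
        zfs get compressratio tank/lz4test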

    Thanks again! I might have more questions in the future. I know where to come to ask haha.
     
  4. omniscence

    omniscence [H]ard|Gawd

    Messages:
    1,311
    Joined:
    Jun 27, 2010
    You have to consider that ZFS compresses at the block level. So a 128 KiB block (the largest possible) will only be stored compressed if it fits into a smaller-sized block, and blocks have 2^n granularity. I'm not sure at the moment whether ZFS will split a compressed block across more than one allocation (e.g. a 128 KiB block into a 64 KiB and a 32 KiB block), but I guess it does not. So a 128 KiB block has to compress down to at least a 64 KiB block to gain anything at all from it.

    Please note that I made some guesses here; I have not had the time to dig deeper into the compression scheme yet, and what I wrote is derived from what I have read. But it is certain that you will not get the same compression ratio as you would by directly compressing the whole file with the same algorithm.
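
    A rough way to see the effect outside of ZFS is to compare compressing a file whole against compressing it in independent 128 KiB pieces, which is closer to what ZFS sees per record. Something like this (gzip is just a stand-in here, file name made up):

        # whole file in one pass
        gzip -c bigfile | wc -c

        # the same file in 128 KiB chunks, each compressed on its own
        split -b 128K bigfile chunk_
        for f in chunk_*; do gzip -c "$f"; done | wc -c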