When will ZFS use RAM?

Discussion in 'SSDs & Data Storage' started by tobiasl, Feb 18, 2011.

  1. tobiasl

    tobiasl [H]Lite

    Messages:
    100
    Joined:
    Nov 19, 2007
    When I was building my ZFS server I read that ZFS needs a lot of RAM, but my ZFS server uses at most 600 MB. I have 12 GB in it, with room to upgrade to 24 GB. Is the RAM only used if you enable dedup, or will it be used once I add more RAID-Z2 vdevs? Right now I have one RAID-Z2 with 6 x 2TB drives.
     
    Last edited: Feb 18, 2011
  2. sub.mesa

    sub.mesa 2[H]4U

    Messages:
    2,508
    Joined:
    Feb 16, 2010
    What OS are you running? How did you measure ZFS memory consumption?
     
  3. Gigas-VII

    Gigas-VII Limp Gawd

    Messages:
    240
    Joined:
    May 2, 2005
    EDIT: A note about ZFS in general:
    Keep in mind that ZFS was designed to be the ideal filesystem for the enterprise storage server, where cost is no object, performance can be scaled by adding additional hardware (regardless of cost), and the number 1 concern is the safety of the data itself. ZFS does not trust your hardware, and double-checks each operation using checksums, parity, and logging. All these aspects factor into the requirements for running ZFS. Also keep in mind that unless you are putting Enterprise-Grade loads on your storage server, you may not notice an immediate benefit to adding additional hardware.

    ZFS uses RAM in a few ways:

    1) Read cache. This function is known as ARC. Estimate this at about 1 GB per TB of zpool.
    2) Temporary Write cache. Incoming writes queue up here before being added to the ZFS Intent Log (ZIL), and then ultimately to the live filesystem. This is generally only 200-500 MB, but estimate at least 1 GB.
    3) Checksums: ZFS performs checksumming on all blocks of data stored within the zpool, and those checksums have to be calculated and verified for each read and write. These operations use considerable CPU and RAM. Difficult to estimate.
    4) Parity-Data: If your drives are in a RAID-Z configuration, then ZFS uses RAM during the calculation of Parity Data which allows the fault-tolerance offered by RAID-Z. Calculating parity data uses considerable CPU and RAM, and must be done in addition to the standard checksum operations. Difficult to estimate.

    Optional Functions (added in EDIT1):

    5) Deduplication Table: Logging a deduplication table increases the amount of calculation work that needs to be performed, which increases the amount of RAM utilization for those calculations. Additionally, ZFS will try to store the dedup-table in RAM as much as possible, and keep a cache of it on the disk as well. Deduplication hurts throughput considerably, but allows a capacity gain if many potentially redundant instances of the same data exist in the zpool: OS-filesystem backups, media collection backups, etc. Databases may or may not be impacted by deduplication.
    6) Compression: Like deduplication, compression also increases the calculation load, and thereby the CPU and RAM utilization. Once files are compressed and stored to the zpool, they consume no additional memory (unlike the dedup-table), until they are loaded and then must be decompressed, at which point the calculation load again increases, files inflate in memory, and RAM usage skyrockets. The default LZJB compression algorithm is quick and relatively painless, but if the GZip compression algorithm is used instead, the CPU and RAM utilization are significantly heavier.

    Additionally, the OS itself requires some RAM (I'd estimate 1 GB). ZFS will leave 1GB of RAM free for the OS automatically, but if additional services running on the OS require more RAM, they will steal it from ZFS, causing ZFS to swap the contents to disk.

    So if you have 6x2TB, and are using them in a single RAID-Z2 vdev, then your zpool is 12TB total (minus 4TB for parity data, leaving 8TB usable), meaning it would be ideal to have at least 12 GB of RAM for the ARC. You did not mention using a dedicated ZIL device, so the ZIL will be interleaved with the rest of the storage, which puts some additional strain on the zpool, so I'd allocate at least 1 GB of RAM for write cache, maybe more. You also need some for the checksumming, parity, and OS...

    ZFS leaves 1GB of RAM available to the OS automatically. That means ZFS is assuming control of 11GB of your RAM right now, which isn't enough. You would stand to benefit from increasing your RAM at least a bit. 24GB wouldn't hurt you at all. ZFS will use all the RAM you give it, and manage that RAM allocation manually.
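    To make the arithmetic above concrete, here is a minimal sketch of the rule-of-thumb estimate. It only encodes the rough figures from this post (1 GB of ARC per TB of zpool, 1 GB of write cache, 1 GB for the OS); the function and constant names are made up for illustration, and checksum/parity overhead is left out because it is hard to estimate.

```python
# Rule-of-thumb RAM estimate, using only the rough guidelines from this post.
# The names below are illustrative, not part of any ZFS tool.
ARC_GB_PER_TB = 1    # read cache: about 1 GB of RAM per TB of zpool
WRITE_CACHE_GB = 1   # temporary write cache: budget at least 1 GB
OS_RESERVE_GB = 1    # RAM left free for the OS


def estimate_ram_gb(disks, disk_tb):
    """Return (raw pool size in TB, suggested RAM in GB)."""
    pool_tb = disks * disk_tb
    ram_gb = pool_tb * ARC_GB_PER_TB + WRITE_CACHE_GB + OS_RESERVE_GB
    return pool_tb, ram_gb


pool_tb, ram_gb = estimate_ram_gb(disks=6, disk_tb=2)
print(f"{pool_tb} TB pool: about {ram_gb} GB RAM, plus checksum/parity overhead")
```

    For the 6x2TB pool this comes out a little above the 12 GB currently installed, which is why a modest upgrade would help under heavy load.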

    EDIT2: A note on performance:
    The Write-IOPS that a single vdev can produce is limited to the IOPS of the single slowest drive in the vdev. This is because each member of a vdev must sign off on any changes made to any files on that vdev, and the operation is performed individually. If you want more performance, add additional vdevs to the zpool, rather than making a single larger vdev.

    For example: you have 6x2TB drives. In a single RAID-Z2 vdev, you have a 12TB zpool with 8TB of usable space. You could instead create 2 RAID-Z1 vdevs of 3x2TB each, leaving you the same 8TB of usable space out of your 12TB zpool, with one parity disk per vdev. However, in this configuration, you would have 2x the IOPS, as IOPS are the aggregate of the single slowest drive from each vdev. If you wanted to retain RAID-Z2, you would need to add 2 additional 2TB drives, bringing you to 8x2TB, a 16TB pool, and 8TB of space: 4x2TB per vdev, 2 parity instances per vdev, and 2x the IOPS of your original configuration. To maximize IOPS, you could skip RAID-Z entirely, and instead mirror your disks: 2x2TB mirrored per vdev, 3 vdevs, so 6TB and 3x the IOPS. No parity per se, but still fault tolerant per vdev. The RAM requirement of this configuration would be significantly less as no parity data would be calculated. More importantly, you can always add additional vdevs to the zpool, but no vdev can be expanded. This means you could add additional storage space to your zpool by simply adding pairs of disks. During a data-reconstruction and resilvering, only the contents of the mirrored drive would have to be rewritten, rather than the entire zpool. This means a quicker recovery.
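    The layouts compared above can be tabulated with a small sketch. It assumes the simplified model from this post: usable space is the data disks times the disk size, and write IOPS scale with the number of vdevs (one slowest drive's worth per vdev). The `layout` helper is hypothetical and ignores metadata overhead.

```python
def layout(vdevs, disks_per_vdev, redundant_per_vdev, disk_tb=2):
    """Raw size, usable size and relative write IOPS for a pool layout.

    Simplified model: each vdev contributes the IOPS of one drive.
    """
    raw_tb = vdevs * disks_per_vdev * disk_tb
    usable_tb = vdevs * (disks_per_vdev - redundant_per_vdev) * disk_tb
    return {"raw_tb": raw_tb, "usable_tb": usable_tb, "relative_write_iops": vdevs}


layouts = {
    "1 x RAID-Z2 (6 disks)": layout(1, 6, 2),
    "2 x RAID-Z1 (3 disks)": layout(2, 3, 1),
    "2 x RAID-Z2 (4 disks)": layout(2, 4, 2),
    "3 x mirror  (2 disks)": layout(3, 2, 1),
}

for name, info in layouts.items():
    print(f"{name}: {info['raw_tb']} TB raw, {info['usable_tb']} TB usable, "
          f"{info['relative_write_iops']}x write IOPS")
```

    The trade-off shows up directly: the mirror layout gives up 2TB of usable space compared to RAID-Z but triples the relative write IOPS.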
     
    Last edited: Feb 18, 2011
  4. hotzen

    hotzen Limp Gawd

    Messages:
    349
    Joined:
    Jan 29, 2011
    awesome post, thank you.
     
  5. tobiasl

    tobiasl [H]Lite

    Solaris 11 Express, and the RAM consumption figure is from the performance monitor.


    And Gigas-VII,
    thanks, that's good reading for someone new to ZFS like me :D
    edit: I'm getting another 6 x 2TB RAID-Z2 on Monday to add as a vdev.
     
    Last edited: Feb 18, 2011
  6. Gigas-VII

    Gigas-VII Limp Gawd

    From what I understand, Solaris 11 Express currently has the best implementation of ZFS, followed by OpenSolaris/OpenIndiana, then FreeBSD. The benchmarks I've seen show BSD-ZFS at somewhere between 10% and 50% of the performance of Solaris 11 Express. The biggest reason for this is: ZFS does not get along with UFS, period. If possible, make sure your OS is running from a dedicated zpool (probably just a simple mirror vdev all by itself in a zpool). If you use UFS and ZFS, they will fight over resources and slow things to a crawl. It has to be one or the other. Currently, most BSD distributions use UFS for OS installation. Avoid this, and find an implementation (either as a guide or a special build) that installs BSD to a ZFS zpool. Then, once the OS is running properly, you can create your data zpool and populate it with your vdevs.

    Also, I'm happy to help :)
     
  7. tobiasl

    tobiasl [H]Lite

    The system is just on a single 500GB HDD now, no mirror. I'm moving my server to a new 4U case on Monday. If I get another 500GB drive, can I just create a mirror, or do I need to set it up that way from the start?
     
  8. sub.mesa

    sub.mesa 2[H]4U

    @Gigas-VII: awesome post! But you should mention that most of that information applies to Solaris, not to BSD. ZFS memory allocation on BSD works differently and, IMO, less elegantly. The benchmarks comparing BSD against Solaris that I've seen were not really good comparisons, since BSD was either not tuned at all or badly tuned, while Solaris by default has much better memory allocation than BSD. And comparing alpha-quality ZFS v6 against the Solaris v22 implementation, like some benchmarks did, also does not portray a fair picture.

    Your comment about mixing ZFS and UFS is spot-on, though. The issue is that UFS claims memory and does not give it back to ZFS. In the top output you would see this under "Inact" while ZFS consumes most memory under "Wired".
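    As a rough illustration of reading those two numbers, here is a small sketch that pulls the "Inact" and "Wired" figures out of the "Mem:" line of FreeBSD top. The sample line is made up for the example (it is not from a real system), and `parse_top_mem` is a hypothetical helper, not part of any tool.

```python
import re

# Convert top's K/M/G suffixes to megabytes.
UNITS_TO_MB = {"K": 1 / 1024, "M": 1, "G": 1024}


def parse_top_mem(line):
    """Turn a FreeBSD top 'Mem:' line into {label: megabytes}."""
    fields = re.findall(r"(\d+)([KMG]) (\w+)", line)
    return {label: int(num) * UNITS_TO_MB[unit] for num, unit, label in fields}


# Hypothetical sample line, invented for illustration only.
sample = "Mem: 120M Active, 2600M Inact, 3000M Wired, 40M Buf, 200M Free"

mem = parse_top_mem(sample)
print(f"Inact (where UFS-held memory shows up): {mem['Inact']} MB")
print(f"Wired (where ZFS memory shows up):      {mem['Wired']} MB")
```

    A large "Inact" figure next to a small "Wired" figure on a mixed UFS/ZFS box would be consistent with UFS holding on to memory that ZFS could otherwise use.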

    So if you go BSD you would want a 100% ZFS implementation. Both MfsBSD and ZFSguru allow for a Root-on-ZFS without UFS.
     
  9. DocSilly

    DocSilly n00b

    Messages:
    18
    Joined:
    Feb 18, 2011
    Here's another way to see how much RAM your ZFS uses on SE11 ... my performance monitor also shows only 650MB in use (4%) ;)

    docsilly@nas:~# nice -10 echo "::memstat"|mdb -k

    Page Summary                Pages                MB  %Tot
    ------------     ----------------  ----------------  ----
    Kernel                     556990              2175   13%
    ZFS File Data             2971153             11606   72%
    Anon                       127070               496    3%
    Exec and libs                4278                16    0%
    Page cache                  21697                84    1%
    Free (cachelist)             6859                26    0%
    Free (freelist)            438205              1711   11%

    Total                     4126252             16118
    Physical                  4126251             16118
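    If you want to pull the ZFS number out of that output in a script, a minimal sketch could look like the following. It parses the text pasted above (copied into a string here) rather than running mdb itself; the `parse_memstat` helper is hypothetical, and the exact column layout may differ between Solaris releases.

```python
import re

# The ::memstat output from this post, pasted in as text.
MEMSTAT = """\
Page Summary Pages MB %Tot
------------ ---------------- ---------------- ----
Kernel 556990 2175 13%
ZFS File Data 2971153 11606 72%
Anon 127070 496 3%
Exec and libs 4278 16 0%
Page cache 21697 84 1%
Free (cachelist) 6859 26 0%
Free (freelist) 438205 1711 11%

Total 4126252 16118
Physical 4126251 16118
"""

# A data row is: name, page count, megabytes, percentage.
ROW = re.compile(r"^(.+?)\s+(\d+)\s+(\d+)\s+(\d+)%$")


def parse_memstat(text):
    """Return {category: (pages, mb, percent)} for each data row."""
    usage = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            name, pages, mb, pct = match.groups()
            usage[name] = (int(pages), int(mb), int(pct))
    return usage


usage = parse_memstat(MEMSTAT)
pages, mb, pct = usage["ZFS File Data"]
print(f"ZFS File Data: {mb} MB ({pct}% of physical memory)")
```

    On this box that reports about 11.6 GB held by ZFS, which the performance monitor does not count as "in use".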
     
  10. Flintstone

    Flintstone Limp Gawd

    Messages:
    130
    Joined:
    Nov 12, 2010
    IMHO, don't get too caught up in the RAM requirements. RAM has a LOT of impact on FreeBSD, but it's no problem getting great performance even with 4GB of RAM on Solaris.

    Of course, this all depends on usage, but in my case, where sequential transfers over gigabit Ethernet are the normal scenario and my array does ~500MB/s sequential read/write, it hasn't been a big issue, to say the least.
     
  11. moose517

    moose517 Gawd

    Messages:
    640
    Joined:
    Feb 28, 2009
    So if I use Solaris 11 Express with 8GB of RAM I will be fine? I read what Gigas-VII said above, and if I follow his advice I'm going to need 24GB minimum. That's just insane.
     
  12. sub.mesa

    sub.mesa 2[H]4U

    24GiB for 200 euro is not that insane. ECC is more expensive, though. But honestly, now is the time to buy A LOT of memory! Prices are at a 5-year low, and you should buy as much as you can right now because in a while you'll pay double or triple for the same amount. At least, that is what usually happens with DRAM pricing, and right now it's available very cheap.
     
  13. olavgg

    olavgg Limp Gawd

    Messages:
    232
    Joined:
    Oct 27, 2010
    I use FreeBSD with 4GB of ECC memory. I have no performance issues, even with all ZFS drives encrypted with GELI. The OS files are stored on UFS. More memory would probably be nice if I had over 10 iSCSI clients hammering the zpool volume.
     
  14. Red Falcon

    Red Falcon [H]ardForum Junkie

    Messages:
    9,777
    Joined:
    May 7, 2007
    I've had this happen to me and never understood what the problem was until your post, which makes perfect sense. Thanks for posting this, I'm going to try something new now! :D
     
  15. Gigas-VII

    Gigas-VII Limp Gawd

    Just keep in mind that the recommendations I made are for the intended purpose: Enterprise-Grade Storage. If you're doing this at home, where you'll be hitting the zpool from only 1-5 clients at a time, hosting few (if any) databases, and mostly streaming large blocks of content, then you won't need anywhere near the equipment I suggested. I was explaining the recommended guidelines to optimize performance without completely breaking the bank.