I'm experiencing what I'm going to call ZFS rot. By this I do NOT mean bit rot, but rather a gradual deterioration of read and write performance across a pool that has been around for many years and holds lots of data and lots of snapshots.
When I first built this array, I was getting 800MB/s write and nearly 1000MB/s sequential read from it. Now, three years later, I'm lucky to get 100MB/s read or write.
System is:
* Supermicro X8DTL
* Norco 4020 enclosure
* 9 x Hitachi 2TB 3Gbps
* HP SAS Expander
* LSI 9200-8e
* OI 151a4
For the longest time I was sure that I was having some kind of hardware problem (CPU, power management, SAS, disks, etc.). However, when I do dd tests directly against the /dev/rdsk device for each physical disk, I get 130MB/s per drive and 800MB/s aggregate when running multiple dds in parallel.
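For reference, the raw-device tests looked roughly like this (a sketch; the c5t#d0 device names are placeholders for my actual disks):

    # sequential read from one raw disk, bypassing ZFS entirely
    dd if=/dev/rdsk/c5t0d0s0 of=/dev/null bs=1024k count=4096

    # all nine disks in parallel to measure aggregate throughput
    for d in c5t0d0 c5t1d0 c5t2d0 c5t3d0 c5t4d0 c5t5d0 c5t6d0 c5t7d0 c5t8d0; do
        dd if=/dev/rdsk/${d}s0 of=/dev/null bs=1024k count=4096 &
    done
    wait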
The real eye-opener came when I added 6 Hitachi 4TB drives to the enclosure, connected them to the exact same SAS expander, and created a brand new pool. I ran the exact same Filebench sequential read test (single threaded, even), and now I get 800MB/s while the test file is being created and 900MB/s on the read!
To me this validates that the hardware is fine and that my existing ZFS pool has somehow degraded.
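Roughly what I did to set up the comparison (pool name, device names, and the vdev layout are placeholders here, and while the real test was Filebench, a plain dd against a file in the pool shows the same numbers):

    # new pool on the six 4TB drives, behind the same expander
    zpool create testpool raidz2 c6t0d0 c6t1d0 c6t2d0 c6t3d0 c6t4d0 c6t5d0

    # crude sequential write then read; the file must be larger than RAM
    # (or the pool exported/imported) so the read isn't served from ARC
    dd if=/dev/zero of=/testpool/bigfile bs=1024k count=32768
    dd if=/testpool/bigfile of=/dev/null bs=1024k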
I even did a zfs send of one of the smaller volumes from the existing pool to the new test pool and then ran the Filebench test against the transferred copy; it runs perfectly.
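That migration was just the usual snapshot-and-send (dataset names here are placeholders):

    zfs snapshot tank/smallvol@migrate
    zfs send tank/smallvol@migrate | zfs receive testpool/smallvol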
Now, the slow pool does have 12TB of its 14TB used, and it holds 96 ZFS volumes and 705 snapshots.
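Those numbers come straight from the usual commands ('tank' standing in for my real pool name), and I can post the full output if it helps:

    zpool list tank                       # capacity: 12T of 14T allocated
    zfs list -r tank | wc -l              # dataset count
    zfs list -t snapshot -r tank | wc -l  # snapshot count
    zfs list -o space -r tank             # space breakdown incl. USEDSNAP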
I need to get this pool back up to a more acceptable level of performance, but at this point I'm not exactly sure which attributes of the pool are causing such an extreme degradation.
For example, I have a 5TB chunk of data I could move to the new pool. Should I expect that to magically make the original pool perform fine again? Somehow I think not.
It was my understanding that having dozens of volumes within a pool, and even thousands of snapshots, was not a big deal. Am I wrong? Perhaps I need to pare these down?
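If paring is the answer, I assume it's just a matter of something like the following (hypothetical names; I'd script it across all 705 once I'm sure of the list):

    # drop a single old snapshot
    zfs destroy tank/somevol@2010-daily-001

    # or walk every snapshot on one volume, dry-run first
    for s in $(zfs list -H -t snapshot -o name -r tank/somevol); do
        echo zfs destroy "$s"    # remove 'echo' once the list looks right
    done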
Or is the pool somehow 'fragmented' and in need of being recreated from scratch?
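If a rebuild really is the only cure, I'd guess the least painful path is a recursive replicated send to the new pool and back, something like this (pool names again placeholders):

    # snapshot everything recursively, then replicate the whole tree
    zfs snapshot -r tank@evacuate
    zfs send -R tank@evacuate | zfs receive testpool/tank-copy

    # after verifying the copy: destroy tank, recreate it, reverse the send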
I've been battling performance on this server for about 8 months now, and I see now why I was so frustrated: I had been assuming that ZFS could deal with whatever I threw at it 'automagically'. I still think ZFS is fantastic, but I'm pretty sure there are some long-term best practices you need to follow in order to keep your pools 'healthy'; unfortunately, the details of what those are seem to be undocumented.
Thoughts?