ZFS Very slow scrub performance

Mowge (n00b, joined Jan 18, 2013, 39 messages)
I'm having some really shitty scrub performance with a new server I just finished setting up.
Earlier, when there was less data on the pool, I saw scrub speeds well over 450MB/s.

Currently there is about 5.5TB on it and the scrub is running at around 4MB/s. It's been running for about 7.5h now and has only gotten through 110GB. Is this normal? I have all kinds of stuff on there: VDIs, movies, ISOs, pictures, etc.

Code:
pool: tank
 state: ONLINE
  scan: scrub in progress since Mon Oct  7 02:00:02 2013
    109G scanned out of 5.45T at 4.08M/s, 381h43m to go
    0 repaired, 1.95% done
config:

        NAME                       STATE     READ WRITE CKSUM
        tank                       ONLINE       0     0     0
          raidz1-0                 ONLINE       0     0     0
            c3t50014EE208D2383Dd0  ONLINE       0     0     0
            c3t50014EE25E276286d0  ONLINE       0     0     0
            c3t50014EE25E277333d0  ONLINE       0     0     0
            c3t50014EE25E2773DAd0  ONLINE       0     0     0
            c3t50014EE2B37CFCD3d0  ONLINE       0     0     0
            c3t50014EE2B37D059Cd0  ONLINE       0     0     0
          raidz1-1                 ONLINE       0     0     0
            c3t50014EE003865735d0  ONLINE       0     0     0
            c3t50014EE058CF9F78d0  ONLINE       0     0     0
            c3t50014EE058CFCE8Fd0  ONLINE       0     0     0
            c3t50014EE0AE30F841d0  ONLINE       0     0     0
            c3t50014EE2B37CF274d0  ONLINE       0     0     0
            c3t50014EE2B37D057Bd0  ONLINE       0     0     0

errors: No known data errors
 
Do you use BitTorrent and write the torrents directly to the pool? Any application with a random, block-based write pattern will heavily fragment a ZFS filesystem, leading to very slow scrub speeds (and slow sequential reads of those files in general). Frequent snapshotting amplifies this, because the filesystem cannot free the old blocks. About the only remedy is to rewrite the affected files (copy each file and delete the original), but this disconnects them from their snapshots and doubles their space usage for as long as you keep the snapshots. Another option is to send the dataset to another pool and back (or simply duplicate it, not clone it, on the same pool), but if you also transfer the snapshots in that process the files end up fragmented again, though not as badly.
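If you go the duplicate-on-the-same-pool route, it boils down to roughly this (dataset and snapshot names are only examples, adjust them to your layout):

Code:
# duplicate the dataset without its old snapshots, so the copy gets written out sequentially
zfs snapshot tank/torrents@rewrite
zfs send tank/torrents@rewrite | zfs receive tank/torrents_new
# after verifying the copy: drop the fragmented original and swap the names
zfs destroy -r tank/torrents
zfs rename tank/torrents_new tank/torrents
zfs destroy tank/torrents@rewrite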

What is the fill level of your pool?
 
Slow scrub speeds aren't caused by random writes.

Scrub speed is affected by block size: if you have lots of small blocks written, scrubs will be slower than if you had used large blocks, especially on your raidz.

I have 300MB of files that take days to scrub, because their average file size is 1KB.

The other 4TB scrubs in only a matter of hours, but the average block size there is around 56KB.
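If you want to see how your data is actually laid out, something like this gives a rough picture (the zdb walk reads all the metadata, so it can take quite a while on a big pool):

Code:
# recordsize in effect for each dataset
zfs get -r recordsize tank
# block statistics for the whole pool, with a per-type breakdown and average block sizes
zdb -bb tank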
 
Do you use BitTorrent and write the torrents directly to the pool? [...] What is the fill level of your pool?

I did download a few GB of torrents (maybe 60 or so). Could that really be enough to fragment the pool this badly? There are also a few snapshots, but I'm pretty sure none of them cover the torrents.

Pool is about 30% filled.

I will try to move the torrents off the pool and delete the snapshots, thanks :)
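For the cleanup I'll probably go with something like this (names are just placeholders):

Code:
# see what snapshots exist and how much space they hold
zfs list -t snapshot -r -o name,used,referenced tank
# then destroy the ones I don't need
zfs destroy tank/somedataset@somesnap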

Also, would a ZIL or L2ARC speed up scrubs (in general)?
 
Slow scrub speeds aren't caused by random writes. [...]

Hmm, alright. I'll wait and see if speeds pick up.
 
If this were my pool, I would seriously take a look at each drive's latency. It sounds like one or more drives may be going bad and dragging the whole pool down. All it takes is one drive with seek errors to cripple the performance of a RAID.
 
Just curious, what's the command to do this?
 
I like "iostat -exn" for that. Look for high wsvc_t and asvc_t values, or any drive that looks markedly different from the others.

Example here from my NAS, on which a drive took a dump the other day. It had a wsvc_t of 272 as well, but that went away when I rebooted and removed that pool.


Code:
                            extended device statistics       ---- errors ---
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
    0.1    7.7    1.5   64.1  0.0  0.0    1.4    0.3   0   0   0   0   0   0 c5t0d0
    0.0    0.0    0.0    0.5  0.0  0.0    0.0    2.9   0   0   0   0   0   0 c2t5001517BB2762653d0
   12.6    4.0  510.7   65.9  0.0  0.1    0.0    5.7   0   2   0   0   0   0 c2t5000C5000DCDAC10d0
    0.0    0.1    4.0   12.0  0.0  0.0    0.0    2.6   0   0   0   0   0   0 c3t1d0
    0.0    0.0    0.0    0.1  0.0  0.0    0.0  172.0   0   0   0  10   1  11 c2t50014EE25B7D1C86d0
    0.0    0.0    0.0    0.1  0.0  0.0    0.0    5.4   0   0   0   0   1   1 c2t50014EE2B0D4A6D5d0
   12.6    4.0  510.7   65.9  0.0  0.1    0.0    5.8   0   2   0   0   0   0 c2t5000C5000E2A0CFDd0
    0.0    0.0    0.0    0.1  0.0  0.0    0.0    5.3   0   0   0   0   1   1 c2t50014EE2B0C754E9d0
   12.8    4.0  510.5   65.9  0.0  0.1    0.0    5.5   0   2   0   0   0   0 c2t5000C5000DBCD280d0
    0.1   15.2    9.5  136.0  0.0  0.0    0.0    2.0   0   1   0   0   0   0 c2t50014EE000DB6C3Fd0
    0.0    0.0    0.0    0.1  0.0  0.0    0.0    5.2   0   0   0   0   1   1 c2t50014EE206274347d0
   12.2    4.0  461.7   65.9  0.0  0.1    0.0    5.4   0   2   0   0   0   0 c2t5000C5000DB060D5d0
    0.1   15.2    9.3  136.0  0.0  0.0    0.0    2.0   0   1   0   0   0   0 c2t50014EE000D35503d0
    0.0    0.0    0.0    0.1  0.0  0.0    0.0    5.0   0   0   0   0   1   1 c2t50014EE2B0D21B17d0
    0.1   15.3    9.5  133.9  0.0  0.0    0.0    2.1   0   1   0   0   0   0 c2t50014EE056293C88d0
    8.6    4.5  506.3   65.9  0.0  0.1    0.0    9.6   0   2   0   0   0   0 c2t5000C5000274187Bd0
    0.0    0.0    0.0    0.1  0.0  0.0    0.0    4.4   0   0   0   0   1   1 c2t50014EE2B0DD0CC2d0
    8.3    4.5  476.1   65.9  0.0  0.1    0.0    9.2   0   2   0   0   0   0 c2t5000C50002741881d0
    0.1   15.3    9.6  133.9  0.0  0.0    0.0    2.1   0   1   0   0   0   0 c2t50014EE05630BCD9d0
    0.0    0.0    0.0    0.1  0.0  0.0    0.0    5.1   0   0   0   0   1   1 c2t50014EE2B0D490C9d0
    0.0    0.0    0.0    0.1  0.0  0.0    0.0    4.4   0   0   0   0   1   1 c2t50014EE25B7EDFA6d0
    0.1    7.7    1.6   64.1  0.0  0.0    1.4    0.3   0   0   0   0   0   0 c5t1d0
    0.0   23.9    0.0 2897.9  0.0  0.2    0.2    7.5   0   2   0   0   0   0 c5t5d0
    0.0    4.1    0.0   22.1  0.0  0.0    0.0    0.1   0   0   0   0   0   0 c2t50015179591FD855d0
    0.0    4.1    0.0   22.1  0.0  0.0    0.0    0.1   0   0   0   0   0   0 c2t5001517959193BE0d0
   13.7    4.5  506.7   65.9  0.0  0.1    0.0    6.0   0   2   0   0   0   0 c2t50014EE2AE1FA98Cd0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c2t5000C5004FD25A4Dd0
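Run it with an interval too, so you get current numbers instead of the averages since boot:

Code:
# sample every 5 seconds; the first report is the since-boot summary, ignore it
iostat -exn 5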
 
If this were my pool, I would seriously take a look at each drive's latency. [...]

Looks alright, eh?

Code:
                            extended device statistics       ---- errors ---
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 fd0
    1.0   12.9   70.7   69.3  0.0  0.0    0.6    0.5   0   0   0   0   0   0 rpool
    1.1   16.0   70.8   69.3  0.0  0.0    0.0    0.4   0   0   0   0   0   0 c2t0d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   5   0   5 c1t0d0
   16.6   43.9  523.9 1168.8  0.0  0.3    0.0    4.7   0  10   2   0   0   2 c3t50014EE25E277333d0
   16.7   43.9  524.4 1168.8  0.0  0.3    0.0    4.8   0  10   2   0   0   2 c3t50014EE2B37CFCD3d0
   16.7   45.0  525.5 1162.5  0.0  0.3    0.0    5.1   0  11   2   0   0   2 c3t50014EE2B37CF274d0
   16.5   43.2  499.8 1138.8  0.0  0.3    0.0    4.8   0  10   2   0   0   2 c3t50014EE25E276286d0
   16.5   43.1  499.4 1138.8  0.0  0.3    0.0    4.7   0  10   2   0   0   2 c3t50014EE25E2773DAd0
   16.6   44.3  500.8 1130.1  0.0  0.3    0.0    5.2   0  11   2   0   0   2 c3t50014EE2B37D057Bd0
   16.6   43.2  500.0 1138.8  0.0  0.3    0.0    4.7   0  10   3   0   0   3 c3t50014EE2B37D059Cd0
   16.7   43.9  524.4 1168.8  0.0  0.3    0.0    4.7   0  10   2   0   0   2 c3t50014EE208D2383Dd0
   16.5   44.3  500.4 1130.0  0.0  0.3    0.0    5.1   0  11   2   0   0   2 c3t50014EE0AE30F841d0
   16.5   44.3  500.6 1130.0  0.0  0.3    0.0    5.1   0  11   2   0   0   2 c3t50014EE058CF9F78d0
   16.6   45.0  525.0 1162.5  0.0  0.3    0.0    5.1   0  11   2   0   0   2 c3t50014EE058CFCE8Fd0
   16.6   45.0  525.3 1162.5  0.0  0.3    0.0    5.1   0  11   2   0   0   2 c3t50014EE003865735d0
  193.4  396.6 6128.6 13798.7 83.7  3.6  141.9    6.1   5  28   0   0   0   0 tank
 
If I were the OP I'd set up smartmontools. MANY of my slowness problems have been solved by replacing marginal drives.
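smartctl gives you the reallocated/pending sector counts and lets you kick off self-tests. Something like this per drive (the device path is just an example from the pool above, and depending on the HBA you may need a -d option):

Code:
# full SMART report for one disk
smartctl -a /dev/rdsk/c3t50014EE208D2383Dd0
# start a long self-test in the background
smartctl -t long /dev/rdsk/c3t50014EE208D2383Dd0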
 
You can adjust the scrub and resilver priorities. I forget the exact commands for scrub... sorry.
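If I remember right, on Solaris/illumos they are kernel tunables set in /etc/system (names from memory, so double-check them for your release; they take effect after a reboot):

Code:
* let scrub/resilver I/O run at full speed
set zfs:zfs_scrub_delay = 0
set zfs:zfs_resilver_delay = 0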
 
I have a striped mirror of 4 Toshiba 3TB 7200rpm drives. When I moved my stuff (backup pics/docs/etc.) to it, everything was OK. Now I have pretty busy torrents running on it as well, and scrub speed went from 300+MB/s to 25MB/s. The downloaded files aren't small either, anywhere from 15MB to a couple of gigs. So I still think it's because of the heavy random writes. What's the best practice for using ZFS for torrents?
 
I have a separate drive for torrents that are still downloading and move the files to the main pool once they finish. If you don't use snapshots you can just duplicate the files and delete the originals. The first option is still the better one, as the main pool does not get fragmented in the first place.
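Rewriting a file boils down to something like this (paths are only examples; it only helps for files that aren't held by a snapshot):

Code:
# the copy gets laid out contiguously, then replaces the fragmented original
cp -p /tank/media/movie.mkv /tank/media/movie.mkv.tmp
mv /tank/media/movie.mkv.tmp /tank/media/movie.mkv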
 
If possible, I'd run scrubs during periods when you can shut down the torrents. I think this is largely unavoidable: the random reads and writes get in each other's way.
 
The writes from the torrents themselves aren't going to cause slowness, unless this is on a zvol.

The slowness comes from using the pool while the scrub is running, as the scrub slows itself down to give the torrents more I/O capacity.

The fragmentation from torrents is a separate issue, though, and one you probably want to solve.
 
Not sure I totally buy this. If you have a constant stream of writes to the pool from N torrents, it does have to commit those txgs every few seconds, and those seeks will collide with the reads the scrub is doing, no?
 
I don't know all the technical ins and outs of ZFS, but simple logic (which obviously doesn't always apply to computer things) says that any regular drive activity is going to make the scrub go slower due to head-seek latency.
 
A write batch every 5 seconds isn't going to cause a noticeable speed reduction. The reads happening all the time will, though.

The default behaviour is something like: if there have been no other transactions in the last 50ms, do scrub work. Since writes are batched, they have limited impact; the constant random reads, on the other hand, add up to a huge effect over time.
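You can peek at what your kernel is actually using with mdb (variable names as in the illumos ZFS code, so verify them on your build):

Code:
# print the current throttle values in decimal
echo "zfs_scan_idle/D" | mdb -k
echo "zfs_scrub_delay/D" | mdb -k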
 
I have found what "caused the problem"! It was the scrub priority setting. After setting it to...

Code:
set zfs:zfs_scrub_delay = 0

...it runs fast as heck! It completed 1TB in just 30min.


And while I'm at it, I have a question regarding allocated space on my pool. It shows 5.4TB allocated, but if I run...

Code:
zfs list -r tank

...it shows 4.3TB used. Can someone explain why? :)
 
'zpool list' reports the raw amount of space used/free across all disks, without accounting for parity. 'zfs list', by contrast, shows the usable space after parity.
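You can see the difference side by side:

Code:
# raw pool space, parity included
zpool list tank
# usable space after raidz parity
zfs list tank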
 