ZFS Checksum errors

ldoodle · Dec 14, 2011

Hiya,

Noticed that the last couple of weekly scrubs on my data pool (3x 5-drive raid-z2) have been taking a lot longer than usual: normally around 45 minutes for 1.5 TB of stored data, but now half a day!

Then the most recent one (it runs every Monday morning) has shown this:

*****************************************************************************************
  pool: dpool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
  scan: scrub repaired 5.12M in 21h3m with 0 errors on Mon Dec 12 23:03:26 2011
config:

        NAME          STATE     READ WRITE CKSUM
        dpool         ONLINE       0     0     0
          raidz2-0    ONLINE       0     0     0
            c9t7d0    ONLINE       0     0     0
            c9t8d0    ONLINE       0     0     0
            c9t9d0    ONLINE       0     0     0
            c9t10d0   ONLINE       0     0     0
            c9t11d0   ONLINE       0     0     0
          raidz2-1    ONLINE       0     0     0
            c9t12d0   ONLINE       0     0     0
            c9t13d0   ONLINE       0     0     0
            c9t14d0   ONLINE       0     0     0
            c10t8d0   ONLINE       0     0     0
            c10t9d0   ONLINE       0     0    15
          raidz2-2    ONLINE       0     0     0
            c10t10d0  ONLINE       0     0     0
            c10t11d0  ONLINE       0     0     0
            c10t12d0  ONLINE       0     0     0
            c10t13d0  ONLINE       0     0     0
            c10t14d0  ONLINE       0     0     0

errors: No known data errors
*****************************************************************************************
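
For anyone who wants to spot the affected device without reading the whole table, here is a minimal Python sketch that scans pasted `zpool status` text for rows with a nonzero READ, WRITE or CKSUM counter. The function name `devices_with_errors` is made up for this example, and it assumes the counters are plain integers; rows with abbreviated counts are simply skipped.

```python
def devices_with_errors(status_text):
    """Return (name, read, write, cksum) for every config row with a nonzero counter."""
    flagged = []
    for line in status_text.splitlines():
        fields = line.split()
        # Config rows have five fields: NAME STATE READ WRITE CKSUM
        if len(fields) != 5:
            continue
        name, _state, *counters = fields
        if not all(c.isdigit() for c in counters):
            continue  # skips the header row and any non-integer counts
        read, write, cksum = (int(c) for c in counters)
        if read or write or cksum:
            flagged.append((name, read, write, cksum))
    return flagged


# A few rows copied from the status output above.
sample = """\
NAME STATE READ WRITE CKSUM
dpool ONLINE 0 0 0
raidz2-1 ONLINE 0 0 0
c10t8d0 ONLINE 0 0 0
c10t9d0 ONLINE 0 0 15
"""

print(devices_with_errors(sample))
```

On the sample rows this reports only c10t9d0, the one device with checksum errors.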

Does this actually mean hardware failure? If so, the drives are only 6 months old, so they would be replaced under warranty.

Thanks

danswartz · Dec 14, 2011

I would try clearing the error and doing a rescrub. If the problem comes back, replace the drive. Alternatively, you could pull the drive and plug it into another system and try doing a low-level scan?

ldoodle · Dec 14, 2011

Weird. Done just that and it ran much quicker with zero checksum errors.

mwroobel · Dec 14, 2011

Well, they are checksum errors and not hard read or write errors. What do you get from an iostat? Is this the first error you have received?
iostat -exmn
iostat -xhmN

bexamous · Dec 14, 2011

I'd toss that drive. I've run into this a few times at work and at home on Linux systems with md arrays... basically some array will end up being slow for no reason but not getting any errors. When looking into it further it usually ends up being a single drive that is taking forever to respond to requests. What I do to track it down is run 'dd if=/dev/md0 of=/dev/null bs=1M' to just do some sequential reads... I then run 'iostat -x 5' and look at await (the average time, in milliseconds, that requests to the device take to be served, including time spent in the queue). Anyway, on the messed up arrays I'll usually see all but one drive averaging 10-20ms wait, and then a single drive sitting at 200+ms avg wait.

Every time this has happened I've not seen any actual errors, just one drive being slow for no reason. I just replace the drives because something must be messed up.
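
The "one drive is far slower than its peers" check can be sketched in a few lines of Python. This is only an illustration: the `slow_devices` function, the device names, the sample numbers and the 5x threshold are all assumptions made up for the example, not values from a real array.

```python
from statistics import median


def slow_devices(await_ms, factor=5.0):
    """Return devices whose average wait exceeds `factor` times the median wait."""
    if not await_ms:
        return []
    typical = median(await_ms.values())
    return sorted(dev for dev, ms in await_ms.items() if ms > factor * typical)


# Hypothetical await values (ms), in the same ballpark as the figures quoted above.
sample = {"sda": 12.0, "sdb": 15.0, "sdc": 210.0, "sdd": 18.0}

print(slow_devices(sample))
```

Comparing against the median rather than the mean keeps the one slow drive from dragging the baseline up with it.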

ldoodle · Dec 15, 2011

mwroobel said:
What do you get from an iostat? Is this the first error you have received?
iostat -exmn

                            extended device statistics       ---- errors ---
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
    0.7    0.0   37.8    0.3  0.0  0.0   11.3    1.6   0   0   0   0   0   0 c8t0d0
    0.7    0.0   37.8    0.3  0.0  0.0   11.6    1.7   0   0   0   0   0   0 c8t1d0
    4.9    0.0  344.2    0.1  0.0  0.0    0.0    1.0   0   0   0   0   0   0 c10t8d0
    4.7    0.0  344.2    0.1  0.0  0.4    0.3   78.2   0   4   0  81 238 319 c10t9d0
    4.9    0.0  344.2    0.1  0.0  0.0    0.0    1.0   0   0   0   0   0   0 c10t10d0
    4.9    0.0  344.2    0.1  0.0  0.0    0.0    1.0   0   0   0   0   0   0 c10t11d0
    4.9    0.0  344.2    0.1  0.0  0.0    0.0    1.0   0   0   0   0   0   0 c10t12d0
    4.6    0.0  344.2    0.1  0.0  0.0    0.0    5.6   0   0   0   0   0   0 c10t13d0
    4.9    0.0  344.2    0.1  0.0  0.0    0.0    1.0   0   0   0   0   0   0 c10t14d0
    4.9    0.0  344.2    0.1  0.0  0.0    0.0    1.0   0   0   0   0   0   0 c9t7d0
    4.9    0.0  344.2    0.1  0.0  0.0    0.0    1.0   0   0   0   0   0   0 c9t8d0
    4.9    0.0  344.2    0.1  0.0  0.0    0.0    1.0   0   0   0   0   0   0 c9t9d0
    4.9    0.0  344.2    0.1  0.0  0.0    0.0    1.0   0   0   0   0   0   0 c9t10d0
    4.9    0.0  344.2    0.1  0.0  0.0    0.0    1.0   0   0   0   0   0   0 c9t11d0
    4.9    0.0  344.2    0.1  0.0  0.0    0.0    1.0   0   0   0   0   0   0 c9t12d0
    4.9    0.0  344.2    0.1  0.0  0.0    0.0    1.0   0   0   0   0   0   0 c9t13d0
    4.9    0.0  344.2    0.1  0.0  0.0    0.0    1.0   0   0   0   0   0   0 c9t14d0

Yes, first time I've received errors.

Is it possible to view a complete history of scrubs? I ask because I'm thinking that my original statement that scrubs on this pool were only taking 40 minutes might not be right, so I'd like to check.

ldoodle · Dec 15, 2011

iostat -exmn during a scrub shows this for the problem drive:

                            extended device statistics       ---- errors ---
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
    2.8    0.0  239.9    0.0  0.0 10.0    0.0 3571.3   0 100   0  81 238 319 c10t9d0

mwroobel · Dec 15, 2011

ldoodle said:
iostat -exmn during a scrub shows this for the problem drive:

                            extended device statistics       ---- errors ---
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
    2.8    0.0  239.9    0.0  0.0 10.0    0.0 3571.3   0 100   0  81 238 319 c10t9d0

Dump that drive. If it is under warranty, RMA it; otherwise, get it out of the pool. The 81 is the hard error count (h/w), as opposed to soft errors (s/w), which the system can recover from by re-reading the sector. A hard error is a disk sector that continues to fail its CRC after being reread several times (usually 15) and cannot be read, period. The 238 is the transport error count (trn), which is reported by the SATA bus. The 319 is just the total of the errors (tot).
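
As a rough illustration of the column layout described above, here is a minimal Python sketch that pulls the error counters out of one `iostat -exmn` row. The `DiskErrors` type and `parse_iostat_line` function are names invented for this example, and it assumes the 15-field row layout shown in this thread.

```python
from typing import NamedTuple


class DiskErrors(NamedTuple):
    device: str
    asvc_t: float   # average service time of active requests, in ms
    soft: int       # s/w column
    hard: int       # h/w column
    transport: int  # trn column
    total: int      # tot column


def parse_iostat_line(line: str) -> DiskErrors:
    """Parse one data row of 'iostat -exmn' output (15 whitespace-separated fields)."""
    fields = line.split()
    if len(fields) != 15:
        raise ValueError(f"expected 15 fields, got {len(fields)}")
    return DiskErrors(
        device=fields[14],
        asvc_t=float(fields[7]),
        soft=int(fields[10]),
        hard=int(fields[11]),
        transport=int(fields[12]),
        total=int(fields[13]),
    )


# The row posted for the problem drive during a scrub.
sample = "2.8 0.0 239.9 0.0 0.0 10.0 0.0 3571.3 0 100 0 81 238 319 c10t9d0"
row = parse_iostat_line(sample)

print(row)
# The total column should equal soft + hard + transport errors.
print(row.soft + row.hard + row.transport == row.total)
```

For the row above, 0 + 81 + 238 adds up to the 319 shown in the tot column.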

ldoodle · Dec 15, 2011

Thanks mwroobel. Any tips as to what I should tell Hitachi to speed up the process?

Can you remind me of the process to replace the disk as well? Is it simply detach the dead one and plug in the new one (I have hot-swap caddies)? As I don't have any spare slots, I can't plug in a new one, use the replace command and then remove the old one.

mwroobel · Dec 15, 2011

If you do the online RMA form they shouldn't ask you anything. If for whatever reason they pick out your request for "further study", just tell them the drive doesn't power up and save yourself the time of jumping through hoops with diagnostics. (At least the 3 TB drives don't work with their DFT, so they can't make you do that for an RMA.)

brutalizer · Dec 16, 2011

mwroobel said:
Dump that drive. If it is under warranty, RMA it; otherwise, get it out of the pool. The 81 is the hard error count (h/w), as opposed to soft errors (s/w), which the system can recover from by re-reading the sector. A hard error is a disk sector that continues to fail its CRC after being reread several times (usually 15) and cannot be read, period. The 238 is the transport error count (trn), which is reported by the SATA bus. The 319 is just the total of the errors (tot).

Wow, how could you interpret those numbers? Do you have a table? Where can I find that information?

@ldoodle,
What does SMART say? Did SMART detect those problems? ZFS detected these problems, obviously, but what about SMART?

I am trying to find real life cases where ZFS detects problems, and SMART does not. And vice versa.

mwroobel · Dec 16, 2011

brutalizer said:
Wow, how could you interpret those numbers? Do you have a table? Where can I find that information?

@ldoodle,
What does SMART say? Did SMART detect those problems? ZFS detected these problems, obviously, but what about SMART?

I am trying to find real life cases where ZFS detects problems, and SMART does not. And vice versa.

brutal-

SMART is just a list of values and thresholds. You can have a disk error that manifests itself and for whatever reason is still within the threshold of what the drive allows as "normal". As to interpreting the numbers, just do a google search for iostat and you can read through all the options. I just know them because I use the tool a lot.

ldoodle · Dec 18, 2011

Just an update on this. After another scrub, another drive seemingly has errors as well;

                            extended device statistics       ---- errors ---
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
    0.7    0.0   35.0    0.3  0.0  0.0   11.3    1.6   0   0   0   0   0   0 c8t0d0
    0.7    0.0   35.0    0.3  0.0  0.0   11.6    1.7   0   0   0   0   0   0 c8t1d0
    5.2    0.0  364.0    0.1  0.0  0.0    0.0    1.0   0   0   0   0   0   0 c10t8d0
    4.9    0.0  364.0    0.1  0.0  0.4    0.3   79.4   0   4   0  81 238 319 c10t9d0
    5.2    0.0  364.0    0.1  0.0  0.0    0.0    1.0   0   0   0   0   0   0 c10t10d0
    5.2    0.0  364.0    0.1  0.0  0.0    0.0    1.0   0   0   0   0   0   0 c10t11d0
    5.2    0.0  364.0    0.1  0.0  0.0    0.0    1.0   0   0   0   0   0   0 c10t12d0
    4.9    0.0  364.0    0.1  0.0  0.0    0.0    6.5   0   1   0   1   4   5 c10t13d0
    5.2    0.0  364.0    0.1  0.0  0.0    0.0    1.0   0   0   0   0   0   0 c10t14d0
    5.2    0.0  364.0    0.1  0.0  0.0    0.0    1.0   0   0   0   0   0   0 c9t7d0
    5.2    0.0  364.0    0.1  0.0  0.0    0.0    1.0   0   0   0   0   0   0 c9t8d0
    5.2    0.0  364.0    0.1  0.0  0.0    0.0    1.0   0   0   0   0   0   0 c9t9d0
    5.2    0.0  364.0    0.1  0.0  0.0    0.0    1.0   0   0   0   0   0   0 c9t10d0
    5.2    0.0  364.0    0.1  0.0  0.0    0.0    1.0   0   0   0   0   0   0 c9t11d0
    5.2    0.0  364.0    0.1  0.0  0.0    0.0    1.0   0   0   0   0   0   0 c9t12d0
    5.2    0.0  364.0    0.1  0.0  0.0    0.0    1.0   0   0   0   0   0   0 c9t13d0
    5.2    0.0  364.0    0.1  0.0  0.0    0.0    1.0   0   0   0   0   0   0 c9t14d0

Thankfully they're not in the same raid-z2 set, but is this a sign of something other than the drives? Or have I just been unlucky with them, as I guess 2 failures in 15 consumer drives isn't that bad?

brutalizer · Dec 18, 2011

A guy had a flaky power supply giving the same symptoms. Google this to learn more:

ZFS saves the day

mwroobel · Dec 18, 2011

ldoodle said:
Just an update on this. After another scrub, another drive seemingly has errors as well;

                            extended device statistics       ---- errors ---
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
    0.7    0.0   35.0    0.3  0.0  0.0   11.3    1.6   0   0   0   0   0   0 c8t0d0
    0.7    0.0   35.0    0.3  0.0  0.0   11.6    1.7   0   0   0   0   0   0 c8t1d0
    5.2    0.0  364.0    0.1  0.0  0.0    0.0    1.0   0   0   0   0   0   0 c10t8d0
    4.9    0.0  364.0    0.1  0.0  0.4    0.3   79.4   0   4   0  81 238 319 c10t9d0
    5.2    0.0  364.0    0.1  0.0  0.0    0.0    1.0   0   0   0   0   0   0 c10t10d0
    5.2    0.0  364.0    0.1  0.0  0.0    0.0    1.0   0   0   0   0   0   0 c10t11d0
    5.2    0.0  364.0    0.1  0.0  0.0    0.0    1.0   0   0   0   0   0   0 c10t12d0
    4.9    0.0  364.0    0.1  0.0  0.0    0.0    6.5   0   1   0   1   4   5 c10t13d0
    5.2    0.0  364.0    0.1  0.0  0.0    0.0    1.0   0   0   0   0   0   0 c10t14d0
    5.2    0.0  364.0    0.1  0.0  0.0    0.0    1.0   0   0   0   0   0   0 c9t7d0
    5.2    0.0  364.0    0.1  0.0  0.0    0.0    1.0   0   0   0   0   0   0 c9t8d0
    5.2    0.0  364.0    0.1  0.0  0.0    0.0    1.0   0   0   0   0   0   0 c9t9d0
    5.2    0.0  364.0    0.1  0.0  0.0    0.0    1.0   0   0   0   0   0   0 c9t10d0
    5.2    0.0  364.0    0.1  0.0  0.0    0.0    1.0   0   0   0   0   0   0 c9t11d0
    5.2    0.0  364.0    0.1  0.0  0.0    0.0    1.0   0   0   0   0   0   0 c9t12d0
    5.2    0.0  364.0    0.1  0.0  0.0    0.0    1.0   0   0   0   0   0   0 c9t13d0
    5.2    0.0  364.0    0.1  0.0  0.0    0.0    1.0   0   0   0   0   0   0 c9t14d0

Thankfully they're not in the same raid-z2 set, but is this a sign of something other than the drives? Or have I just been unlucky with them, as I guess 2 failures in 15 consumer drives isn't that bad?

Well, it is always possible you have something else going on here, such as bad power or bad cables, but it is just as possible that it is 2 bad drives. Are the drives the same brand/model and were they purchased about the same time? If so, how close are the serial numbers? As always, I would make sure I have good backups in place JUST IN CASE. I would also be interested in seeing the SMART numbers on both drives, most specifically Pending Sector Count, Unrecoverable Error Count, Spin Retry Count and Seek Error Rate.

ldoodle · Dec 18, 2011

brutalizer said:
A guy had a flaky power supply giving the same symptoms. Google this to learn more:

Surely then it would affect all the drives? Or at least all the drives running off the same 2 molex plugs (I have 3x 5 drive caddies which use 2 molex's each).

ldoodle · Dec 18, 2011

mwroobel said:
Are the drives the same brand/model and were they purchased about the same time? If so, how close are the serial numbers? As always, I would make sure I have good backups in place JUST IN CASE. I would also be interested in seeing the SMART numbers on both drives, most specifically Pending Sector Count, Unrecoverable Error Count, Spin Retry Count and Seek Error Rate.

All drives exactly the same and bought all at the same time (Hitachi 5K3000s).

I am going to do a backup to USB tomorrow overnight, and may even drop to 2x 6 drive raid-z2s (12 drives total) rather than 3x 5 drive raid-z2s (total 15 drives) so I can use the leftovers as spares (hot or cold).

How do I get SMART values? Also, I have these plugged into LSI SAS1068 controllers and for some reason it doesn't show the serial numbers in ZFS (but it does for the 2 system drives plugged directly into the motherboard).

Thanks mwroobel, really appreciate your help.

mwroobel · Dec 18, 2011

ldoodle said:
All drives exactly the same and bought all at the same time (Hitachi 5K3000s).

I am going to do a backup to USB tomorrow overnight, and may even drop to 2x 6 drive raid-z2s (12 drives total) rather than 3x 5 drive raid-z2s (total 15 drives) so I can use the leftovers as spares (hot or cold).

How do I get SMART values? Also, I have these plugged into LSI SAS1068 controllers and for some reason it doesn't show the serial numbers in ZFS (but it does for the 2 system drives plugged directly into the motherboard).

Thanks mwroobel, really appreciate your help.

SMART and serial number info may be passed through or blocked, depending on the controller type. For example, in Windows, only HD Sentinel shows SMART info passed from many hardware RAID cards, where most other utils won't see it. Do a smartctl -a device or smartctl -x device (for you probably smartctl -x /dev/dsk/c10t9d0, for example). For whatever reason, over time I have seen drives from the same production run die in groups, with close serial number sequences. When we order a lot of drives, we will usually order 6-9 at a time from different vendors so we have a better chance of getting a wider variety, and will usually move them to different arrays so we have less chance of this particular problem.

mwroobel · Dec 18, 2011

ldoodle said:
Surely then it would affect all the drives? Or at least all the drives running off the same 2 molex plugs (I have 3x 5 drive caddies which use 2 molex's each).

Not necessarily. Depending on the particular model power supply, there may be as few as 1 or as many as 4 different 12V power rails, and I have seen power supplies where just 1 of the rails has gone bad. As to the molex plugs on a particular wire chain, it is very easy for one (or more) of the pins to migrate out of the connector enough to lose contact on just one and not another.

ldoodle · Dec 19, 2011

mwroobel said:
When we order a lot of drives, we will usually order 6-9 at a time from different vendors so we have a better chance of getting a wider variety, and will usually move them to different arrays so we have less chance of this particular problem.

Yeah so do I for enterprise systems, but as this is a home file server I wasn't overly worried about the bulk order, especially the prices I got them at for getting so many!

I don't have smartmontools installed, so can you help me get them?

Oh, my weekly scrub has shown more errors on that drive (risen to 171 h/w errors!). I am going to back up and destroy everything, then re-install with full Solaris 11 and 2x 6-drive raid-z2s with 1 spare for each set. Then I'll keep the 3rd leftover drive as a single disk outside the pool to copy my critical data to (anything I can't easily re-create).

brutalizer · Dec 19, 2011

ldoodle said:
Surely then it would affect all the drives? Or at least all the drives running off the same 2 molex plugs (I have 3x 5 drive caddies which use 2 molex's each).

All disks connected to the power supply are affected. Here is the link:
http://blogs.oracle.com/elowe/entry/zfs_saves_the_day_ta

ldoodle · Dec 19, 2011

brutalizer said:
All disks connected to the power supply are affected. Here is the link:
http://blogs.oracle.com/elowe/entry/zfs_saves_the_day_ta

I guess it's not PSU related for me then, as the 2 drives (so far!) are on separate 12V rails (each 5-drive caddy is connected to its own 12V rail and these drives are in different caddies).

mwroobel · Dec 19, 2011

ldoodle said:
I guess it's not PSU related for me then, as the 2 drives (so far!) are on separate 12V rails (each 5-drive caddy is connected to its own 12V rail and these drives are in different caddies).

Just in case I didn't give enough info: depending on the power supply, a rail doesn't (necessarily) mean a particular cable coming off the power supply. For example, the 24-pin could be one rail, the PCIe video cables (up to 4) could be a separate rail, all the 4-pin molex cables could be a third, and all the SATA a fourth. Or any combination of the above. It depends on the particular model (or, in a few cases, the specific revision of that model) of power supply.

brutalizer · Dec 19, 2011

If the power supply is buggy, then everything connected to it will have problems, yes? Disks, graphics card, etc etc

ldoodle · Dec 19, 2011

mwroobel said:
Just in case I didn't give enough info: depending on the power supply, a rail doesn't (necessarily) mean a particular cable coming off the power supply. For example, the 24-pin could be one rail, the PCIe video cables (up to 4) could be a separate rail, all the 4-pin molex cables could be a third, and all the SATA a fourth. Or any combination of the above. It depends on the particular model (or, in a few cases, the specific revision of that model) of power supply.

Duh, I knew that! Was thinking for a moment there that each output was a rail. It's a Corsair HX 520W PSU.

ldoodle · Dec 19, 2011

Any chance of getting help installing smartmontools so I can give values?

I've seen this http://www.opencsw.org/packages/smartmontools/, but don't know what to do.

mwroobel · Dec 19, 2011

ldoodle said:
Any chance of getting help installing smartmontools so I can give values?

I've seen this http://www.opencsw.org/packages/smartmontools/, but don't know what to do.

Well, you should just need to download and extract those files to a directory, and then do:

./configure
make
pfexec make install (or run make install as root)

Have you ever installed packages or compiled from source before? If not, you might not have the proper gcc-dev packages installed. In that case, you need to do this first:

pfexec pkg install gcc-dev

ldoodle · Dec 20, 2011

Weird one here. I've copied all my photos, music, documents etc just fine. Trying to do my movie rips (VIDEO_TS folders with VOB files), but it's failing on every VOB file. I can copy smaller files from the same folder as the VOB files just fine.

Might sound stupid, but does ZFS do anything to certain file types? I excluded VOB files from my local virus scanner with no difference. To rule out the network, can I plug the USB drive into the Solaris box and do a copy/paste that way?

jonnyjl · Dec 20, 2011

ldoodle said:
Weird one here. I've copied all my photos, music, documents etc just fine. Trying to do my movie rips (VIDEO_TS folders with VOB files), but it's failing on every VOB file. I can copy smaller files from the same folder as the VOB files just fine.

Might sound stupid, but does ZFS do anything to certain file types? I excluded VOB files from my local virus scanner with no difference. To rule out the network, can I plug the USB drive into the Solaris box and do a copy/paste that way?

Any errors showing up in dmesg?

Are you copying to a CIFS/SMB Share?

You're using Solaris 11?

ldoodle · Dec 21, 2011

It's the flagging disks. I offlined both of them from their pools and things are better now.

What a great system ZFS really is. You'd be stuffed in this position with hardware RAID.
