ZFS scrub results question

Jim G

Limp Gawd
Joined
Jun 2, 2011
Messages
221
Hey all;

I've searched on this but can't find anything specific to this question. I finished converting our home/office server over to ZFS this week and am very happy so far; I finished the first scrub last night and got the following:

pool: tycho
state: ONLINE
scan: scrub repaired 992K in 4h39m with 0 errors on Wed Dec 14 01:56:19 2011
config:

NAME        STATE   READ WRITE CKSUM
tycho       ONLINE     0     0     0
  raidz2-0  ONLINE     0     0     0
    c0t0d0  ONLINE     0     0     0
    c0t1d0  ONLINE     0     0     0
    c0t2d0  ONLINE     0     0     0
    c0t3d0  ONLINE     0     0     0
    c0t4d0  ONLINE     0     0     0
    c0t5d0  ONLINE     0     0     0

errors: No known data errors


The bit I'm wondering about is the 992K repaired - what exactly is this referring to? Is this a cause for concern?
 

Jim G

Limp Gawd
Joined
Jun 2, 2011
Messages
221
There may have been checksum errors? Or did you insert/replace any drives?

If there were checksum errors would they not appear in the listing below? Or am I misreading the output?

No drives were added or removed to the pool since it was created. What output would you expect to see if one had been added/removed?
 

danswartz

2[H]4U
Joined
Feb 25, 2011
Messages
3,703
Yeah, good point. Errors should have been caught. No idea offhand. Can't find definitive statements, but hints that the read/write/cksum are bumped during normal operations, whereas scrub can find errors (bit rot, sectors gone bad?) that do not show there. Be nice to know for sure. Does anything show in SMART?
 

Jim G

Limp Gawd
Joined
Jun 2, 2011
Messages
221
SMART all appears A-OK - I'm at a bit of a loss here. I can't find any hard information on how to interpret the results - none of the manual pages or wikis I can find lists anything that's particularly relevant :(
 

Jim G

Limp Gawd
Joined
Jun 2, 2011
Messages
221
Hm, re-ran the scrub and keeping an eye on it; I can see that it's repairing something already after only a few minutes:


pool: tycho
state: ONLINE
scan: scrub in progress since Wed Dec 14 19:12:54 2011
195G scanned out of 4.21T at 243M/s, 4h48m to go
256K repaired, 4.51% done
config:

NAME        STATE   READ WRITE CKSUM
tycho       ONLINE     0     0     0
  raidz2-0  ONLINE     0     0     0
    c0t0d0  ONLINE     0     0     0  (repairing)
    c0t1d0  ONLINE     0     0     0
    c0t2d0  ONLINE     0     0     0
    c0t3d0  ONLINE     0     0     0
    c0t4d0  ONLINE     0     0     0
    c0t5d0  ONLINE     0     0     0

errors: No known data errors
 

Argentum

Weaksauce
Joined
Nov 14, 2006
Messages
89
I honestly don't know for sure either, but I see the same thing from time to time. I have one pool that's a uniform collection of Hitachi disks off an internal LSI2008-based controller that never has errors, and a pool that's a mishmash of WD (some 512, some 4K) & Seagate 2TB drives in an external case attached to a SAS expander by way of an LSI1068-based controller with external cabling, and go figure, it almost always has some small quantity of data that gets repaired with no device-level errors (running this scrub right now):

:~# zpool status cygnus
pool: cygnus
state: ONLINE
scan: scrub in progress since Tue Dec 13 16:52:14 2011
4.82T scanned out of 14.6T at 210M/s, 13h29m to go
30K repaired, 33.13% done
config:

NAME         STATE   READ WRITE CKSUM
cygnus       ONLINE     0     0     0
  raidz2-0   ONLINE     0     0     0
    c5t45d0  ONLINE     0     0     0  (repairing)
    c5t44d0  ONLINE     0     0     0  (repairing)
    c5t43d0  ONLINE     0     0     0  (repairing)
    c5t47d0  ONLINE     0     0     0  (repairing)
    c5t41d0  ONLINE     0     0     0
    c5t38d0  ONLINE     0     0     0  (repairing)
    c5t40d0  ONLINE     0     0     0  (repairing)
    c5t39d0  ONLINE     0     0     0  (repairing)
    c5t42d0  ONLINE     0     0     0
    c5t46d0  ONLINE     0     0     0  (repairing)


It's my belief that the repaired-data statistic indicates "pool level" data inconsistencies that require correction, but that no checksum or other errors are indicated because there is no specific hard drive that has returned a failed block (or whatever unit is applicable).

Remember, data can be consistent when it leaves the HDD platter but inconsistent due to memory, cabling, etc. (See http://blogs.oracle.com/bonwick/entry/zfs_end_to_end_data)

If the problems can't be isolated to a specific device, ZFS can't log the error because it can't truly diagnose it, if you know what I mean... i.e. it only knows there was an inconsistency and replaced whatever inconsistency it found using a consistent replica and then adds that to the repaired figure.

It's kinda the same thing you can see with ZFS mirrors during an actual HDD failure: even if one drive of the mirrored pair starts spewing up checksum errors that show up in the CKSUM count column, assuming the other drive in the mirrored pair is working 100%, you'll see a zero indicated for the number of checksum errors for the mirror itself as well as the pool the mirror belongs to, because the errors you do see are considered localized to the device and not to the pool.

So, again, I think the data repaired statistic is specific to the pool, but doesn't reflect issues on the devices.

Again, this is all just my educated guess. Should you be concerned? It's a long story so I won't go into it here, but from one incident I had my answer is "maybe". It could indicate faulty power, weak cabling, a faulty controller... or it could just be transient. I'd recommend scrubbing the pool regularly (e.g. weekly) and seeing if you consistently get non-zero results and consider doing some low-cost basic replacements if you can (e.g. SATA cables).
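
If you want to track that figure from scrub to scrub, here is a minimal sketch. It assumes the scan section looks like the two zpool status outputs posted earlier in this thread; other ZFS releases may word it differently. The function name scrub_repaired is made up for illustration, and it only reads text on stdin.

```shell
#!/bin/sh
# scrub_repaired: hypothetical helper (not part of ZFS).
# Reads `zpool status` output on stdin and prints the amount of data
# the last finished scrub, or the scrub in progress, has repaired.
scrub_repaired() {
    awk '
        # Finished scrub: "scan: scrub repaired 992K in 4h39m with 0 errors on ..."
        /scrub repaired/ {
            for (i = 1; i < NF; i++)
                if ($i == "repaired") { print $(i + 1); exit }
        }
        # Scrub in progress: "256K repaired, 4.51% done"
        /repaired,/ {
            for (i = 2; i <= NF; i++)
                if ($i == "repaired,") { print $(i - 1); exit }
        }
    '
}

# Usage (assumes a pool named tycho, as in this thread):
#   zpool status tycho | scrub_repaired
```

Logging that value after each scrub makes it easier to tell a one-off repair from something that keeps coming back.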
 

Jim G

Limp Gawd
Joined
Jun 2, 2011
Messages
221
Alright, just been doing some more reading - so far I've found descriptions of the columns:

The READ and WRITE columns provide a count of I/O errors that occurred on the device, while the CKSUM column provides a count of uncorrectable checksum errors that occurred on the device. Both error counts indicate a potential device failure, and some corrective action is needed. If non-zero errors are reported for a top-level virtual device, portions of your data might have become inaccessible.

The errors: field identifies any known data errors.

from: http://docs.oracle.com/cd/E19963-01/html/821-1448/gaynp.html

Also of interest:

These errors are divided into three categories:

READ – I/O errors that occurred while issuing a read request

WRITE – I/O errors that occurred while issuing a write request

CKSUM – Checksum errors, meaning that the device returned corrupted data as the result of a read request.

These errors can be used to determine if the damage is permanent. A small number of I/O errors might indicate a temporary outage, while a large number might indicate a permanent problem with the device. These errors do not necessarily correspond to data corruption as interpreted by applications. If the device is in a redundant configuration, the devices might show uncorrectable errors, while no errors appear at the mirror or RAID-Z device level. In such cases, ZFS successfully retrieved the good data and attempted to heal the damaged data from existing replicas.

from: http://docs.oracle.com/cd/E19963-01/html/821-1448/gbbuw.html

Doesn't really answer my question, though, but it might be helpful to someone else looking for more info on all of this.
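
For anyone who wants to check those three columns without eyeballing the listing, a rough sketch follows. It assumes the config section has the NAME STATE READ WRITE CKSUM layout shown in this thread; the function name vdev_errors is hypothetical, and it only reads text on stdin.

```shell
#!/bin/sh
# vdev_errors: hypothetical helper (not part of ZFS).
# Reads `zpool status` output on stdin and prints every pool, vdev or
# device row whose READ, WRITE or CKSUM counter is anything other than 0.
vdev_errors() {
    awk '
        $1 == "config:" { in_config = 1; next }   # rows start after "config:"
        $1 == "errors:" { in_config = 0 }         # and stop at "errors:"
        in_config && NF >= 5 && $1 != "NAME" &&
            ($3 != "0" || $4 != "0" || $5 != "0") {
            printf "%s read=%s write=%s cksum=%s\n", $1, $3, $4, $5
        }
    '
}

# Usage:
#   zpool status tycho | vdev_errors
```

On the output posted so far it prints nothing, since every counter is 0 even though data was repaired.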
 

Jim G

Limp Gawd
Joined
Jun 2, 2011
Messages
221
Thanks very much for your input here - it's much appreciated. I might do more than weekly scrubs on it in the meantime just to keep an eye on it since it's a reasonably new setup - and next time I take it down to add in another RAID card and some more drives I'll swap out all of the cabling for a fresh set and see if that doesn't change anything. Running ECC memory which passes memtest etc. so hopefully the problem doesn't lie there :/
 

brutalizer

[H]ard|Gawd
Joined
Oct 23, 2010
Messages
1,600
Good that you use ZFS. If you had used another solution, on Windows or Linux, it would not have reported anything at all, because they have no similar detection. No warnings. Now you know that something might be troublesome and you can monitor it closely. Windows would never warn about SATA cables not correctly plugged in, for instance.
 

jonnyjl

Limp Gawd
Joined
Apr 12, 2009
Messages
195
This.

The zpool read/write/cksum error list will probably (or should...) correspond (not in a 1:1 relationship) to iostat. As you found, these are errors reported by the disk, not reported by the zpool. If the disk thinks everything is fine and dandy, which disks do, then your result seems logical. ZFS is the safeguard against that type ("hardware") of data corruption.

You may want to run iostat -exmn (or -Exmn) and see if there are any errors reported there. Then maybe watch it through iostat -exmn 1... You might see one drive with an out-of-whack asvc_t or %b.
 

Jim G

Limp Gawd
Joined
Jun 2, 2011
Messages
221
OK, I ran iostat -exmn and got:

iostat -exmn
              extended device statistics               ---- errors ---
  r/s  w/s   kr/s   kw/s wait actv wsvc_t asvc_t %w %b s/w h/w trn tot device
  0.0  0.0    0.0    0.0  0.0  0.0    0.0    0.0  0  0   0   0   0   0 fd0
  0.7  0.6   21.1    5.0  0.0  0.0    0.0   21.5  0  1   0   0   0   0 c3t0d0
144.4 44.2 6223.5 3751.0  0.3  0.4    1.7    2.1  4 12   0 194   0 194 c0t0d0
145.9 44.1 6219.2 3750.2  0.3  0.4    1.6    1.9  4 11   0   0   0   0 c0t1d0
142.4 44.0 6225.1 3750.3  0.3  0.4    1.6    2.1  4 12   0   0   0   0 c0t2d0
141.6 44.0 6224.3 3750.9  0.3  0.4    1.6    2.2  4 12   0   0   0   0 c0t3d0
141.2 44.0 6219.9 3750.3  0.3  0.4    1.6    2.1  4 12   0   0   0   0 c0t4d0
141.7 44.0 6224.5 3750.5  0.3  0.4    1.6    2.1  4 12   0   0   0   0 c0t5d0

I've had a look at the Solaris iostat man page but it doesn't list what h/w is - I assume hardware errors? I assume this points to something flaky relating to that specific disk?
 

jonnyjl

Limp Gawd
Joined
Apr 12, 2009
Messages
195
OK, I ran iostat -exmn and got:



I've had a look at the solaris iostat man page but it doesn't list what h/w is - I assume hardware errors? I assume this points to something flaky relating to that specific disk?
Aye. If you run iostat -En you'll see the full name of the fields, and it is hard error.

IIRC, Transport errors would be more disk controller or cable related, hard is an unrecoverable error, and soft are recoverable errors.
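
To pick out just the drives with a non-zero error total, something like the sketch below might help. It assumes the last five columns are s/w, h/w, trn, tot and device, as in the iostat -exmn output posted above; the function name iostat_errors is made up, and it only reads text on stdin.

```shell
#!/bin/sh
# iostat_errors: hypothetical helper (not part of Solaris).
# Reads `iostat -exmn` style output on stdin and prints the devices whose
# total error count (the "tot" column) is greater than zero.
iostat_errors() {
    awk '
        # Header lines are skipped because their "tot" field is not a number.
        NF >= 5 && $(NF - 1) ~ /^[0-9]+$/ && ($(NF - 1) + 0) > 0 {
            printf "%s soft=%s hard=%s transport=%s\n",
                   $NF, $(NF - 4), $(NF - 3), $(NF - 2)
        }
    '
}

# Usage:
#   iostat -exmn | iostat_errors
```

Counting columns from the right keeps it working whether or not the extended throughput columns are present.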
 

Jim G

Limp Gawd
Joined
Jun 2, 2011
Messages
221
Aye. If you run iostat -Exmn you'll see the full name of the fields, and it is hardware error.

It could still be anything in the subsystem, but if your other disks are on the same controller/cable (SAS?, SATA?) then you can make a pretty safe assumption it is just that disk.

Aha! That's a really good command to know, thanks.

Output for that drive is:

c0t0d0 Soft Errors: 0 Hard Errors: 241 Transport Errors: 0
Vendor: ATA Product: WDC WD20EARX-00P Revision: AB51 Serial No: WD-WMAZA5568241
Size: 2000.40GB <2000398934016 bytes>
Media Error: 122 Device Not Ready: 0 No Device: 0 Recoverable: 0

Same controller, different cable - using the onboard SATA ports for these six drives. I'll swap out the cables and see if that changes anything.

Edit: Just saw your edit - well, looks like it might be time to swap out the HDD then if that's the case. Time to go see if WD will replace it under warranty...
 

jonnyjl

Limp Gawd
Joined
Apr 12, 2009
Messages
195
Edit: Just saw your edit - well, looks like it might be time to swap out the HDD then if that's the case. Time to go see if WD will replace it under warranty...
Lol... yeah I was brain farting all over the place.

Try WD online. I've done it pretty hassle-free: I've done RE4-GP drives and Blue drives (notebook), and I've been able to do it as an Advanced RMA with no issues (and no cost). I also pay for the return label via their site and it's usually cheaper than doing it yourself.

Unless WD changes that policy (post back on what you do), I'm buying WD from now on. It's stressful to wait a week for an RMA drive while your array is hanging on for dear life.

PS One of my RE4-GPs has a few hard/transport/timeout errors. Usually a good indication it will fail at some point. Nothing more than count 4, but I just keep watch on it. I have to get more Hitachis RMA'd so I can get my hotspare count back up. Sigh, and it's Christmas time, I'm sure freight shipping will be fantastic.
 

Jim G

Limp Gawd
Joined
Jun 2, 2011
Messages
221
Sadly I'm living with the kangaroos and WD don't offer Advanced RMA in this country :/ Which is a darn shame, as I'd very much like to have access to that.

What with the shortage that's happening at the moment I have a suspicion that I'll send it away and it'll be weeks before a replacement arrives - someone I know locally got a drive RMA'd not long ago and was informed that they'd have to wait some weeks before a replacement could be sourced. I have adequate backups in place but not so many that I could afford to sacrifice one for a temporary internal drive... may have to bite the bullet and fork out for a single 2TB at today's prices to get the peace of mind of not running a degraded array for any length of time.

Glad to hear that you've managed to RMA so many drives successfully, though - this is one of the first failures we've had on a drive that's actually within warranty, we usually seem to have OK luck with HDDs... so far. Good to know that the process is relatively painless :/
 

jonnyjl

Limp Gawd
Joined
Apr 12, 2009
Messages
195
Oh that sucks... geez, weeks for a replacement is harsh, but it's good you have backups, which I don't... too much data and it's all personal use stuff so meh.

I've done a couple of Hitachis recently and the turnaround was still decent despite the issues going on with the storage industry's supply chain. The shipping is what's a crap shoot. One of them, I had a notification that it shipped and it was over a week (not counting weekends). Another time, it came in two days.

This is, of course, not counting the time to send the drive to them.
 

brutalizer

[H]ard|Gawd
Joined
Oct 23, 2010
Messages
1,600
What does SMART say? We see here that ZFS detects problems at once, but does SMART have any warnings for you?
 

xenorg

n00b
Joined
Jul 1, 2005
Messages
3
Do you have fault management running?

Run fmdump and have a look what shows up in there.
 

Jim G

Limp Gawd
Joined
Jun 2, 2011
Messages
221
What does SMART say? We see here that ZFS detects problems at once, but does SMART have any warnings for you?

Doing sudo smartctl -a -d scsi /dev/rdsk/c0t0d0s0 shows:

Serial number: WD-WMAZA5568241
Device type: disk
Local Time is: Fri Dec 16 14:31:16 2011 EST
Device supports SMART and is Enabled
Temperature Warning Disabled or Not Supported
SMART Health Status: OK

Current Drive Temperature: 34 C
scsiGetStartStopData Failed [scsi response fails sanity test]

Error Counter logging not supported
No self-tests have been logged

The sanity test fail comes back about 50% of the time, otherwise returning zero values - am I using that command incorrectly?

Do you have fault management running?

Run fmdump and have a look what shows up in there.

fmdump shows:

Dec 12 13:01:11.0822 5b32b916-613f-c405-a206-df21a1982981 SMF-8000-YX Diagnosed
Dec 12 13:01:11.1858 b3fbeac4-ebc1-4cc5-a17d-ed9f012208c6 ZFS-8000-D3 Diagnosed
Dec 12 13:01:11.3350 cae50110-0b5e-4f7f-bd49-f80c9dbc199a ZFS-8000-D3 Diagnosed
Dec 12 13:01:11.5195 bd963b44-77fc-e3c7-f4d6-88afb1a46695 ZFS-8000-D3 Diagnosed
Dec 12 13:01:11.7106 5482656d-3060-c27e-8e9c-8cb1ad41fcb6 ZFS-8000-D3 Diagnosed
Dec 12 13:01:11.8983 189832bd-25e9-4b77-fcc5-9b53b480b5b3 ZFS-8000-D3 Diagnosed
Dec 12 13:01:12.1077 18ed51f3-e320-eece-cc73-bf47f5c10089 ZFS-8000-D3 Diagnosed
Dec 12 13:01:12.4089 4b701b98-53a9-6465-9ba2-f86f86e7aa33 ZFS-8000-CS Diagnosed
Dec 12 13:11:40.4639 5b32b916-613f-c405-a206-df21a1982981 FMD-8000-4M Repaired
Dec 12 13:11:40.4718 5b32b916-613f-c405-a206-df21a1982981 FMD-8000-6U Resolved

What should I look for in that output?
 

Jim G

Limp Gawd
Joined
Jun 2, 2011
Messages
221
run "fmdump -vu <UUID>" and, it'll give more info for the event with the UUID you supply.

Really daft question - how on earth do I list the UUIDs in Solaris? I'm familiar with how they work on Ubuntu but it doesn't seem that Solaris uses them in the same way and Google isn't helping out much here...

Edit: Brain spasm - just pulled one from that list I quoted above (though I don't know which device that relates to):

$ fmdump -vu 5b32b916-613f-c405-a206-df21a1982981
TIME UUID SUNW-MSG-ID EVENT
Dec 12 13:01:11.0822 5b32b916-613f-c405-a206-df21a1982981 SMF-8000-YX Diagnosed
100% defect.sunos.smf.svc.maintenance

Problem in: svc:///network/physical:default
Affects: svc:///network/physical:default
FRU: -
Location: -

Dec 12 13:11:40.4639 5b32b916-613f-c405-a206-df21a1982981 FMD-8000-4M Repaired
100% defect.sunos.smf.svc.maintenance Repair Attempted

Problem in: svc:///network/physical:default
Affects: svc:///network/physical:default
FRU: -
Location: -

Dec 12 13:11:40.4718 5b32b916-613f-c405-a206-df21a1982981 FMD-8000-6U Resolved
100% defect.sunos.smf.svc.maintenance Repair Attempted

Problem in: svc:///network/physical:default
Affects: svc:///network/physical:default
FRU: -
Location: -
 

xenorg

n00b
Joined
Jul 1, 2005
Messages
3
Those are for non zfs related events (SMF-8000-YX, FMD-8000-4M, etc) Grab the uuid for the zfs events (ZFS-8000-D3, ZFS-8000-CS).
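
A quick way to pull just those UUIDs out of the listing, as a sketch: it assumes the plain fmdump layout quoted in this thread (date, time, UUID, message ID, event), and the function name zfs_fault_uuids is hypothetical.

```shell
#!/bin/sh
# zfs_fault_uuids: hypothetical helper (not part of the fault manager).
# Reads plain `fmdump` output on stdin and prints "UUID MSG-ID" for every
# event whose message ID starts with ZFS- (e.g. ZFS-8000-D3, ZFS-8000-CS).
zfs_fault_uuids() {
    # Fields: $1 month, $2 day, $3 time, $4 UUID, $5 SUNW-MSG-ID
    awk '$5 ~ /^ZFS-/ { print $4, $5 }'
}

# Usage: show the verbose record for each ZFS event in turn.
#   fmdump | zfs_fault_uuids | while read -r uuid msgid; do
#       fmdump -vu "$uuid"
#   done
```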

 

Jim G

Limp Gawd
Joined
Jun 2, 2011
Messages
221
Those are for non zfs related events (SMF-8000-YX, FMD-8000-4M, etc) Grab the uuid for the zfs events (ZFS-8000-D3, ZFS-8000-CS).

Got it - output:

$ fmdump -vu b3fbeac4-ebc1-4cc5-a17d-ed9f012208c6
TIME UUID SUNW-MSG-ID EVENT
Dec 12 13:01:11.1858 b3fbeac4-ebc1-4cc5-a17d-ed9f012208c6 ZFS-8000-D3 Diagnosed
100% fault.fs.zfs.device

Problem in: zfs://pool=9cc2051e6f22d399/vdev=e12a29bfbd4c6528
Affects: zfs://pool=9cc2051e6f22d399/vdev=e12a29bfbd4c6528
FRU: -
Location: -

$ fmdump -vu 4b701b98-53a9-6465-9ba2-f86f86e7aa33
TIME UUID SUNW-MSG-ID EVENT
Dec 12 13:01:12.4089 4b701b98-53a9-6465-9ba2-f86f86e7aa33 ZFS-8000-CS Diagnosed
100% fault.fs.zfs.pool

Problem in: zfs://pool=9cc2051e6f22d399
Affects: zfs://pool=9cc2051e6f22d399
FRU: -
Location: -


All of the other errors look mostly the same as the first quote - though all refer to different addresses. What should I be looking for in this?
 