ZFS pool in degraded state. Seeking advice.

Shockey

I seem to have made a mess of my ZFS pool.

I noticed I had a failed disk with the message "too many errors", so I tried making the pool rebuild, without much luck. I ordered a new disk and inserted it into the same slot, but then I noticed the disk still appears to be reporting errors (see below).

admin@SAN:~$ zpool status
  pool: data
 state: DEGRADED
  scan: scrub repaired 0 in 1h58m with 0 errors on Wed Jan 30 02:43:31 2013
config:

        NAME                             STATE     READ WRITE CKSUM
        data                             DEGRADED     0     0     0
          raidz2-0                       DEGRADED     0     0     0
            c3t5000C5004E334744d0        ONLINE       0     0     0
            c3t5000C5004E44CC0Ed0        ONLINE       0     0     0
            c3t5000C5004E48FC47d0        ONLINE       0     0     0
            c3t5000C5004E55FDCEd0        ONLINE       0     0     0
            replacing-4                  UNAVAIL      0     0     0  insufficient replicas
              c3t5000C5004E5606B4d0/old  OFFLINE      0     0     0
              c3t5000C5004E5606B4d0      FAULTED      0     0     0  too many errors
            c3t5000C5004E7664F5d0        ONLINE       0     0     0
            c3t5000C5004E771B9Dd0        ONLINE       0     0     0
            c3t5000C500537895ABd0        ONLINE       0     0     0
        cache
          c3t5001517387E8CC5Ed0          ONLINE       0     0     0

errors: No known data errors

  pool: rpool
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          c5t0d0s0  ONLINE       0     0     0

errors: No known data errors
So before I go about running more commands, I figure I'd better get some advice from more experienced ZFS users. Anyone care to chime in on what I need to do to bring my pool back to good health?

The new drive is inserted into the old slot of the drive that failed.
 
What exactly have you done so far, when did you do it, and what commands have you used?
 
To me it looks like:
1. c3t5000C5004E5606B4d0 failed
2. you tried to rebuild with the same faulted disk

Needed action:
replace c3t5000C5004E5606B4d0 with the new disk
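That replace step could look like the sketch below, using the device names from this thread. The `run=echo` prefix is a dry-run guard that only prints each command; drop it to actually execute on the real system.

```shell
# Sketch of the replace step (device names taken from this thread).
# run=echo prints the commands instead of executing them; drop it for real use.
run=echo
$run zpool replace data c3t5000C5004E5606B4d0 c3t50014EE003819259d0
$run zpool status data    # resilver progress will appear under "scan:"
```
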
 
Yes, I did try rebuilding with the same faulted disk, as I thought it was just errors. Didn't think the drive would have failed already...

I attempted to execute the command "zpool replace data c3t5000C5004E5606B4d0 c3t50014EE003819259d0"

I got this response: "cannot open '/dev/dsk/c3t50014EE003819259d0s0': I/O error"

I'm not understanding what an I/O error on the new disk has to do with replacing c3t5000C5004E5606B4d0.

I inserted the new drive into the slot of the failed one. The ID from napp-it for the new drive is c3t50014EE003819259d0.
 
Are you absolutely sure that is the disk? If you run:

iostat -xn

do you see said disk listed at the bottom of the list?

You can also run:

iostat -xn 5

to see activity on all disks every 5 seconds. Start reading/copying some files, and look for the disk that isn't showing any activity.
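Spotting the quiet disk can also be done by filtering the `iostat -xn` output. A minimal sketch, with a two-line sample heredoc standing in for live output (device names taken from this thread):

```shell
# Pick out devices reporting zero reads and zero writes per second
# from iostat -xn style lines; the heredoc is sample data, not live output.
idle=$(awk '$1 == "0.0" && $2 == "0.0" { print $NF }' <<'EOF'
   17.8   18.0 1516.3 1868.2  0.0  0.2  0.1  5.1   0   3 c3t5000C5004E55FDCEd0
    0.0    0.0    0.1    0.0  0.0  0.0  1.4  3.0   0   0 c3t50014EE003819259d0
EOF
)
echo "$idle"
```

On a live system you would pipe `iostat -xn` into the same awk filter instead of the heredoc.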
 
It might be a hardware problem; check the cable/backplane. Try swapping disks around.
 
This indicates that you are trying to replace a faulted/missing disk with a damaged disk.
Check cables, and try another slot or another disk.

PS:
WWN-based device names like c3t50014EE003819259d0 are unique to the disk, not tied to the slot.
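As an illustration of that point, the WWN is embedded in the device name itself, so it can be pulled out with plain shell string trimming (a sketch, assuming the `c3t...d0` form of the new drive's name from this thread):

```shell
# The cXt<WWN>d0 name carries the drive's World Wide Name, which
# travels with the disk no matter which slot it sits in.
dev=c3t50014EE003819259d0
wwn=${dev#c3t}       # strip the controller/target prefix (assumes the c3t form)
wwn=${wwn%d0}        # strip the d0 suffix
echo "$wwn"
```
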
 
The iostat -xn results are as follows:

iostat -xn
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 fd0
    0.3    1.5    6.0   17.2  0.0  0.0    0.0    0.6   0   0 c5t0d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c4t0d0
   17.8   18.0 1516.3 1868.2  0.0  0.2    0.1    5.1   0   3 c3t5000C5004E55FDCEd0
   18.7   18.0 1624.2 1868.2  0.0  0.2    0.1    5.0   0   3 c3t5000C5004E48FC47d0
   17.6   18.0 1514.5 1868.2  0.0  0.2    0.1    5.0   0   3 c3t5000C5004E7664F5d0
   18.6   18.0 1623.6 1868.2  0.0  0.2    0.1    5.0   0   3 c3t5000C5004E44CC0Ed0
   18.7   18.0 1623.5 1868.2  0.0  0.2    0.1    4.4   0   3 c3t5000C5004E334744d0
   18.5   18.0 1623.4 1868.2  0.0  0.2    0.1    4.9   0   3 c3t5000C500537895ABd0
   18.5   18.0 1623.4 1868.2  0.0  0.2    0.1    5.0   0   3 c3t5000C5004E771B9Dd0
    1.2   38.7   26.0 4904.3  0.0  0.3    0.0    7.4   0   3 c3t5001517387E8CC5Ed0
    0.0    0.0    0.1    0.0  0.0  0.0    1.4    3.0   0   0 c3t50014EE003819259d0

I was able to physically identify the new drive with iostat -xn 5. I swapped it over to a new slot in the Norco 2224 case to rule out a bad backplane.

I executed the "zpool replace data c3t5000C5004E5606B4d0 c3t50014EE003819259d0" command after switching the disk to a new slot, and it's resilvering now. :)
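Once the resilver finishes, a reasonable follow-up is to check the pool state and, assuming everything came back clean, reset the stale error counters. A sketch with the same `run=echo` dry-run guard, which only prints the commands (drop it to run them for real):

```shell
# Post-resilver checks; run=echo prints the commands instead of executing them.
run=echo
$run zpool status data   # the "scan:" line shows resilver progress/completion
$run zpool clear data    # after a clean resilver, reset the old error counters
$run zpool scrub data    # optional: verify the rebuilt pool end to end
```
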
 