ZFS pool in degraded state. Seeking advice.

Shockey

2[H]4U
Joined
Nov 24, 2008
Messages
2,203
I seem to have made a muckery of my ZFS pool.

I noticed I had a failed disk with the message "too many corrupted error", so I tried making the pool rebuild, without much luck. I ordered a new disk and inserted it into the same slot, but I noticed another disk appears to be reporting errors (see below).

admin@SAN:~$ zpool status
  pool: data
 state: DEGRADED
  scan: scrub repaired 0 in 1h58m with 0 errors on Wed Jan 30 02:43:31 2013
config:

        NAME                             STATE     READ WRITE CKSUM
        data                             DEGRADED     0     0     0
          raidz2-0                       DEGRADED     0     0     0
            c3t5000C5004E334744d0        ONLINE       0     0     0
            c3t5000C5004E44CC0Ed0        ONLINE       0     0     0
            c3t5000C5004E48FC47d0        ONLINE       0     0     0
            c3t5000C5004E55FDCEd0        ONLINE       0     0     0
            replacing-4                  UNAVAIL      0     0     0  insufficient replicas
              c3t5000C5004E5606B4d0/old  OFFLINE      0     0     0
              c3t5000C5004E5606B4d0      FAULTED      0     0     0  too many errors
            c3t5000C5004E7664F5d0        ONLINE       0     0     0
            c3t5000C5004E771B9Dd0        ONLINE       0     0     0
            c3t5000C500537895ABd0        ONLINE       0     0     0
        cache
          c3t5001517387E8CC5Ed0          ONLINE       0     0     0

errors: No known data errors

  pool: rpool
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        rpool       ONLINE       0     0     0
          c5t0d0s0  ONLINE       0     0     0

errors: No known data errors

So before I go about running more commands, I figured I had better get some advice from more experienced ZFS users. Would anyone care to chime in on what I need to do to bring my pool back to good health?

The new drive is inserted into the old slot of the drive that failed.
 

Billy_nnn

Limp Gawd
Joined
Feb 8, 2012
Messages
242
What exactly have you done so far, when did you do it, and what commands have you used?
 

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
4,040
To me it seems that:
1. c3t5000C5004E5606B4d0 failed
2. you tried to rebuild with the same faulted disk

Needed action:
replace c3t5000C5004E5606B4d0 with the new disk
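
The diagnosis above comes from reading the rows of `zpool status` that are not ONLINE. A minimal sketch of pulling those rows out automatically is below. The helper name `not_online` is hypothetical (it is not a ZFS command), and it assumes the usual layout where the device table starts at the `NAME STATE ...` header and ends at the `errors:` line.

```shell
# Print every device row in `zpool status` output whose STATE is not ONLINE.
# Reads from stdin, so it can be tried on saved output without touching a pool.
# not_online is a hypothetical helper name, not part of ZFS.
not_online() {
    awk '
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }  # header row opens the device table
        $1 == "errors:"               { in_config = 0 }        # the table ends at the errors line
        in_config && NF >= 2 && $2 != "ONLINE" { print $1, $2 }
    '
}

# Usage on a live system (assumes a pool named "data"):
#   zpool status data | not_online
```

Run against the output posted at the top of the thread, this would list the pool, the raidz2 vdev, and the three entries around replacing-4, which is the same picture as the diagnosis above.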
 

Shockey

2[H]4U
Joined
Nov 24, 2008
Messages
2,203
Yes. I did try rebuilding with the same faulted disk, as I thought it was just errors. I didn't think the drive would have failed already...

I attempted to execute the command "zpool replace data c3t5000C5004E5606B4d0 c3t50014EE003819259d0"

I got this response: "cannot open '/dev/dsk/c3t50014EE003819259d0s0': I/O error"

I don't understand what this disk has to do with replacing c3t5000C5004E5606B4d0.

I inserted the new drive into the slot of the failed one. The ID napp-it shows for the new drive is c3t50014EE003819259d0.
 
Joined
Sep 14, 2008
Messages
1,622
Are you absolutely sure that is the disk? If you run:

iostat -xn

do you see said disk listed at the bottom of the list?

You can also run:

iostat -xn 5

to see activity on all disks every 5 seconds. Start reading or copying some files and look for the disk that shows no activity.
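
The "look for the disk with no activity" step can be sketched as a small filter over the `iostat -xn` report. This is only an illustration: `idle_disks` is a hypothetical helper name, and it assumes the 11-column layout (`r/s` first, `device` last) that `iostat -xn` prints. When the input contains several reports, only the last one is used, since the first report is normally the average since boot rather than current activity.

```shell
# List devices whose read and write rates are both zero in `iostat -xn` output.
# idle_disks is a hypothetical helper name; it reads the report(s) from stdin.
idle_disks() {
    awk '
        $1 == "r/s" { n = 0; in_table = 1; next }   # each header row starts a fresh report
        in_table && NF == 11 && $1 + 0 == 0 && $2 + 0 == 0 { idle[++n] = $11 }
        END { for (i = 1; i <= n; i++) print idle[i] }  # only the last report survives
    '
}

# Usage on a live system: take two reports 5 seconds apart, keep the second.
#   iostat -xn 5 2 | idle_disks
```

A pool member that stays in this list while files are being read or copied is the one that is not taking part in the pool.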
 

staticlag

[H]ard|Gawd
Joined
Mar 26, 2010
Messages
1,679
It might be a hardware problem; check the cable/backplane. Try swapping disks around.
 

_Gea

2[H]4U
Joined
Dec 5, 2010
Messages
4,040
The "cannot open ... I/O error" response indicates that you are trying to replace a faulted/missing disk with a damaged disk.
Check the cables, or try another slot or another disk.

PS:
WWN-based names like c3t50014EE003819259d0 are unique to the disk, not tied to the slot.
 

Shockey

2[H]4U
Joined
Nov 24, 2008
Messages
2,203
iostat -nx command results are as follows

iostat -nx
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 fd0
    0.3    1.5    6.0   17.2  0.0  0.0    0.0    0.6   0   0 c5t0d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c4t0d0
   17.8   18.0 1516.3 1868.2  0.0  0.2    0.1    5.1   0   3 c3t5000C5004E55FDCEd0
   18.7   18.0 1624.2 1868.2  0.0  0.2    0.1    5.0   0   3 c3t5000C5004E48FC47d0
   17.6   18.0 1514.5 1868.2  0.0  0.2    0.1    5.0   0   3 c3t5000C5004E7664F5d0
   18.6   18.0 1623.6 1868.2  0.0  0.2    0.1    5.0   0   3 c3t5000C5004E44CC0Ed0
   18.7   18.0 1623.5 1868.2  0.0  0.2    0.1    4.4   0   3 c3t5000C5004E334744d0
   18.5   18.0 1623.4 1868.2  0.0  0.2    0.1    4.9   0   3 c3t5000C500537895ABd0
   18.5   18.0 1623.4 1868.2  0.0  0.2    0.1    5.0   0   3 c3t5000C5004E771B9Dd0
    1.2   38.7   26.0 4904.3  0.0  0.3    0.0    7.4   0   3 c3t5001517387E8CC5Ed0
    0.0    0.0    0.1    0.0  0.0  0.0    1.4    3.0   0   0 c3t50014EE003819259d0

I was able to physically identify the new drive using iostat -xn 5. I moved it to a different slot in the Norco 2224 case to rule out a bad backplane.

After switching the disk to the new slot, I executed the "zpool replace data c3t5000C5004E5606B4d0 c3t50014EE003819259d0" command, and it is resilvering now. :)
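
For anyone landing here later, the sequence that worked can be sketched as a small wrapper. This is a hedged illustration, not a tested recovery script: `replace_disk` and the `DRY_RUN` variable are hypothetical names, and by default the function only prints the commands so it is safe to try. `zpool replace` and `zpool status` are the same commands used in this thread.

```shell
# Print (or, with DRY_RUN=0, actually run) the replace-and-check sequence.
# replace_disk and DRY_RUN are hypothetical names used only for this sketch.
replace_disk() {
    pool=$1 old=$2 new=$3
    run() {
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "+ $*"          # default: only show what would be run
        else
            "$@"
        fi
    }
    run zpool replace "$pool" "$old" "$new"   # starts resilvering onto the new disk
    run zpool status "$pool"                  # shows device states and resilver progress
}

# Device names below are the ones from this thread.
replace_disk data c3t5000C5004E5606B4d0 c3t50014EE003819259d0
```

Setting DRY_RUN=0 would run the commands for real, so the device names should be double-checked against `zpool status` first.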
 