Software RAID Failure?

Carlosinfl

Guys - I am not sure what happened here, but I think I have a problem with software RAID on my home machine. The machine is usable right now, but during boot I see something scroll by in red, far too fast to read. So I decided to check some md stats on the box to see if something happened to a disk or to one of the RAID arrays I set up. I can't understand what I am looking at, so perhaps you guys can help.

To make things as clear as possible, I have 2 identical drives in the machine, both on S-ATA. They are 2x Western Digital 160GB disks, and I am pretty sure both are good, but I can't be certain.

Here is what I see:

Code:
tunafish:/home/cwilliams/Desktop# mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.03
  Creation Time : Fri Jun 22 21:20:43 2007
     Raid Level : raid1
     Array Size : 19534976 (18.63 GiB 20.00 GB)
  Used Dev Size : 19534976 (18.63 GiB 20.00 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Fri Jul 13 15:03:20 2007
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           UUID : 93f4ddb3:70d5783e:f47a10c4:9fe19ef3
         Events : 0.6582

    Number   Major   Minor   RaidDevice State
       0       8        3        0      active sync   /dev/sda3
       1       0        0        1      removed

As you can see, the second slot should list /dev/sdb3, but it shows "removed" instead... :confused:

Then there is my 2nd RAID

Code:
tunafish:/home/cwilliams/Desktop# mdadm --detail /dev/md1
/dev/md1:
        Version : 00.90.03
  Creation Time : Fri Jun 22 21:20:49 2007
     Raid Level : raid1
     Array Size : 135275264 (129.01 GiB 138.52 GB)
  Used Dev Size : 135275264 (129.01 GiB 138.52 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Fri Jul 13 15:08:02 2007
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           UUID : c627213e:cbaed46d:6510c67a:3cf96311
         Events : 0.4564

    Number   Major   Minor   RaidDevice State
       0       8        4        0      active sync   /dev/sda4
       1       0        0        1      removed

Is one of my disks bad? What should I do? Should I place an identical spare in place of /dev/sdb and see if it starts to rebuild?

Both drives feel warm to the touch, so they are both getting power, and both are visible in the BIOS, so I know it sees the drives. Perhaps one of them has failed sectors, I don't know...

Code:
tunafish:/home/cwilliams/Desktop# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sda4[0]
      135275264 blocks [2/1] [U_]

md0 : active raid1 sda3[0]
      19534976 blocks [2/1] [U_]

unused devices: <none>

Suggestions or comments from data above?
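
For anyone reading along: in /proc/mdstat, a status like [2/1] [U_] means the array expects 2 members but only 1 is active, which is why both arrays report "degraded". Here is a rough sketch of a helper that lists degraded arrays. The function name degraded_arrays is made up for illustration, and it assumes the mdstat layout shown in this post.

```shell
#!/bin/sh
# Print the name of every md array whose status field shows fewer active
# members than expected, e.g. "[2/1]" (2 wanted, 1 active).
# Reads /proc/mdstat-style text from the file(s) given, or stdin if none.
# (degraded_arrays is a hypothetical helper name, not part of mdadm.)
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }            # remember the current array
        match($0, /\[[0-9]+\/[0-9]+\]/) {      # status field such as [2/1]
            split(substr($0, RSTART + 1, RLENGTH - 2), n, "/")
            if (n[2] + 0 < n[1] + 0) print name
        }
    ' "$@"
}

# Usage on a live system:
#   degraded_arrays /proc/mdstat
```

On the output above this should name both md1 and md0, since each shows [2/1].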
 
I'm no mdadm expert, but you might try re-adding sdb into both arrays and see what mdadm tells you.

I've had drives seemingly "randomly" drop out of my softRAID5 array, and that's all I've ever done. :confused:

Also, have you configured MAILADDR in /etc/mdadm/mdadm.conf (the location of this file may vary with your Linux distro)? This is the address mdadm sends RAID events to, such as disk failures and status messages. I recently figured this out and it helped me when debugging some weird problems with a shitty fileserver built out of ghetto parts I use for testing stuffs.
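
A quick way to see whether MAILADDR is set is sketched below. The helper name mailaddr_of is made up, and it assumes the keyword is written out in full uppercase on its own line, which is how it usually appears in the shipped config.

```shell
#!/bin/sh
# Print the address on the MAILADDR line of an mdadm.conf-style file ($1),
# or nothing if there is no such line. Commented-out lines are ignored
# because their first field starts with "#".
# (mailaddr_of is a hypothetical helper name, not part of mdadm.)
mailaddr_of() {
    awk '$1 == "MAILADDR" { print $2; exit }' "$1"
}

# Usage (the path is distro-dependent):
#   mailaddr_of /etc/mdadm/mdadm.conf
# The mdadm man page also documents a one-off test alert for each array:
#   mdadm --monitor --scan --oneshot --test
```

If it prints nothing, no address is configured in that file.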
 
What do you mean "re-add" /dev/sdb? How exactly do I re-add it?

I checked /var/log/messages and found the following...

*****

Jul 11 21:22:42 tunafish kernel: md: md driver 0.90.3 MAX_MD_DEVS=256, MD_SB_DISKS=27
Jul 11 21:22:42 tunafish kernel: md: bitmap version 4.39
Jul 11 21:22:42 tunafish kernel: md: raid1 personality registered for level 1
Jul 11 21:22:42 tunafish kernel: md: md0 stopped.
Jul 11 21:22:42 tunafish kernel: md: bind<sda3>
Jul 11 21:22:42 tunafish kernel: md: bind<sdb3>
Jul 11 21:22:42 tunafish kernel: md: kicking non-fresh sda3 from array!
Jul 11 21:22:42 tunafish kernel: md: unbind<sda3>
Jul 11 21:22:42 tunafish kernel: md: export_rdev(sda3)
Jul 11 21:22:42 tunafish kernel: raid1: raid set md0 active with 1 out of 2 mirrors
Jul 11 21:22:42 tunafish kernel: md: md1 stopped.
Jul 11 21:22:42 tunafish kernel: md: bind<sda4>
Jul 11 21:22:42 tunafish kernel: md: bind<sdb4>
Jul 11 21:22:42 tunafish kernel: md: kicking non-fresh sda4 from array!
Jul 11 21:22:42 tunafish kernel: md: unbind<sda4>
Jul 11 21:22:42 tunafish kernel: md: export_rdev(sda4)
Jul 11 21:22:42 tunafish kernel: raid1: raid set md1 active with 1 out of 2 mirrors
Jul 11 21:22:42 tunafish kernel: Attempting manual resume
Jul 11 21:22:42 tunafish kernel: EXT3-fs: INFO: recovery required on readonly filesystem.
Jul 11 21:22:42 tunafish kernel: EXT3-fs: write access will be enabled during recovery.
Jul 11 21:22:42 tunafish kernel: kjournald starting. Commit interval 5 seconds
Jul 11 21:22:42 tunafish kernel: EXT3-fs: recovery complete.
Jul 11 21:22:42 tunafish kernel: EXT3-fs: mounted filesystem with ordered data mode.
Jul 11 21:22:42 tunafish kernel: ts: Compaq touchscreen protocol output
Jul 11 21:22:42 tunafish kernel: input: PC Speaker as /class/input/input3
Jul 11 21:22:42 tunafish kernel: Real Time Clock Driver v1.12ac
Jul 11 21:22:42 tunafish kernel: i2c_adapter i2c-0: nForce2 SMBus adapter at 0x1c00
Jul 11 21:22:42 tunafish kernel: i2c_adapter i2c-1: nForce2 SMBus adapter at 0x1c80
Jul 11 21:22:42 tunafish kernel: ACPI: PCI Interrupt Link [AAZA] enabled at IRQ 20
Jul 11 21:22:42 tunafish kernel: ACPI: PCI Interrupt 0000:00:0f.1 -> Link [AAZA] -> GSI 20 (level, low) -> IRQ 90
Jul 11 21:22:42 tunafish kernel: hda_codec: Unknown model for AD1988, trying auto-probe from BIOS...
Jul 11 21:22:42 tunafish kernel: Adding 497972k swap on /dev/sda1. Priority:-1 extents:1 across:497972k
Jul 11 21:22:42 tunafish kernel: Adding 497972k swap on /dev/sdb1. Priority:-2 extents:1 across:497972k
Jul 11 21:22:42 tunafish kernel: EXT3 FS on md0, internal journal
Jul 11 21:22:42 tunafish kernel: loop: loaded (max 8 devices)
Jul 11 21:22:42 tunafish kernel: device-mapper: ioctl: 4.7.0-ioctl (2006-06-24) initialised: dm-devel@redhat.com
Jul 11 21:22:42 tunafish kernel: kjournald starting. Commit interval 5 seconds
Jul 11 21:22:42 tunafish kernel: EXT3 FS on sda2, internal journal
Jul 11 21:22:42 tunafish kernel: EXT3-fs: mounted filesystem with ordered data mode.
Jul 11 21:22:42 tunafish kernel: kjournald starting. Commit interval 5 seconds
Jul 11 21:22:42 tunafish kernel: EXT3 FS on md1, internal journal
Jul 11 21:22:42 tunafish kernel: EXT3-fs: mounted filesystem with ordered data mode.
Jul 11 21:22:42 tunafish kernel: kjournald starting. Commit interval 5 seconds
Jul 11 21:22:42 tunafish kernel: EXT3 FS on sdb2, internal journal
Jul 11 21:22:42 tunafish kernel: EXT3-fs: mounted filesystem with ordered data mode.
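
The lines that matter in a log like this are the "kicking non-fresh" ones: md drops a member at assembly time when its superblock is behind the other member's. A small sketch for pulling just those partitions out of a long log follows. The function name kicked_members is made up, and it assumes the message wording shown above.

```shell
#!/bin/sh
# List partitions the md driver rejected as "non-fresh" during assembly.
# Reads syslog-style text from the file(s) given, or stdin if none.
# (kicked_members is a hypothetical helper name.)
kicked_members() {
    sed -n 's/.*md: kicking non-fresh \([^ ]*\) from array.*/\1/p' "$@"
}

# Usage:
#   kicked_members /var/log/messages
```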
 
You need to redo the mirroring for both of those arrays. Your situation looks suspiciously like something mounted the filesystems on sda directly, instead of using the md devices.
 
What do you mean "re-do"?

Do you mean back up my data and start from scratch? :eek:
 
You need to redo the mirroring for both of those arrays. Your situation looks suspiciously like something mounted the filesystems on sda directly, instead of using the md devices.

This could be possible ...
You should probably try to back up your data anyway (I know you have RAID 1, so you've got mirrors already :p but just to be safe)

You may want to run
Code:
yourbox# mdadm --query /dev/sdb3
and
Code:
mdadm --query /dev/sdb4
just to see if your partitions actually still have their RAID info on them.

If so, then have you tried doing the following? This will re-add the partitions that are missing from each array back into the array:

(please double check and make sure that you are adding the correct partitions back -- I gleaned the info from the error messages you posted earlier, but it's morning, and I might get them wrong)

Code:
yourbox# mdadm --re-add /dev/md0 /dev/sdb3
Code:
yourbox# mdadm --re-add /dev/md1 /dev/sdb4
P.S. If --re-add doesn't work, try just doing

Code:
yourbox# mdadm --add [md device] [partition]
 
It appears to be rebuilding the broken mirror after the following commands!

Thanks for your help and I hope this fixes the issue.

Code:
cwilliams@tunafish:~$ su
Password:
tunafish:/home/cwilliams# mdadm --re-add /dev/md0 /dev/sdb3
mdadm: re-added /dev/sdb3
tunafish:/home/cwilliams# mdadm --re-add /dev/md0 /dev/sdb4
mdadm: added /dev/sdb4
tunafish:/home/cwilliams# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sda4[0]
      135275264 blocks [2/1] [U_]

md0 : active raid1 sdb4[2](S) sdb3[3] sda3[0]
      19534976 blocks [2/1] [U_]
      [>....................]  recovery =  4.1% (817536/19534976) finish=6.4min speed=48090K/sec

unused devices: <none>
tunafish:/home/cwilliams# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sda4[0]
      135275264 blocks [2/1] [U_]

md0 : active raid1 sdb4[2](S) sdb3[3] sda3[0]
      19534976 blocks [2/1] [U_]
      [>....................]  recovery =  4.9% (971584/19534976) finish=6.6min speed=46265K/sec

unused devices: <none>

****EDIT****

OK - it appears to have finished rebuilding, but md1 does not show both devices up for some reason.

Code:
tunafish:/home/cwilliams# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sda4[0]
      135275264 blocks [2/1] [U_]

md0 : active raid1 sdb4[2](S) sdb3[1] sda3[0]
      19534976 blocks [2/2] [UU]

unused devices: <none>
 
Err, I'll just point out the obvious here. Of course md1 didn't rebuild.

This looks like an "oh, shit!" moment to me. You have a three partition mirror on md0 now :D
 
Yup - I noticed that too :eek:

Is there a way to fix this? I have no idea how this happened.

How can I move sdb4 to md1?

It should be

sda3 & sdb3 = md0
sda4 & sdb4 = md1

I am so confused as to what happened.
 
This is what happened:

Code:
tunafish:/home/cwilliams# mdadm --re-add /dev/md0 /dev/sdb3
tunafish:/home/cwilliams# mdadm --re-add /dev/md0 /dev/sdb4

mdadm did exactly what you told it to do.

It's easy enough to fix. Remove sdb4 from md0, and then add it to md1.
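
One way to avoid this kind of slip is to compare the array UUID on the partition with the UUID of the target array before adding it. The following is only a sketch: the function names md_uuid and same_array are made up, and it assumes the "UUID : ..." line format that mdadm prints for the 0.90 metadata shown in this thread (newer metadata versions label the line differently).

```shell
#!/bin/sh
# Extract the array UUID from `mdadm --detail` or `mdadm --examine` text
# on stdin. Assumes a line of the form "UUID : xxxxxxxx:xxxxxxxx:..." as in
# the 0.90-metadata output above. (md_uuid is a hypothetical helper name.)
md_uuid() {
    awk '$1 == "UUID" && $2 == ":" { print $3; exit }'
}

# Succeed only if partition $2 carries the same array UUID as md device $1.
# (same_array is a hypothetical helper name; needs root and mdadm installed.)
same_array() {
    a=$(mdadm --detail "$1" | md_uuid)
    p=$(mdadm --examine "$2" | md_uuid)
    [ -n "$a" ] && [ "$a" = "$p" ]
}

# Usage (as root):
#   same_array /dev/md1 /dev/sdb4 && mdadm --re-add /dev/md1 /dev/sdb4
```

With a check like this, trying to put sdb4 into md0 would be refused because the UUIDs differ.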
 
Bones - thanks for showing me what a moron I am. :D

Now is there a separate command for simply removing /dev/sdb4 from /dev/md0? I know that afterwards I can run the following to put it into md1:

Code:
#mdadm --re-add /dev/md1 /dev/sdb4

Thanks again for all your assistance...
 
Carlos,

in a terminal, type in
Code:
yourbox# man mdadm

this will give you the "man page" or manual for mdadm, and has all of the information that I've been posting here for you :D

hint: look for the --remove option :p
 
Thanks all for all your help!

Code:
tunafish:/home/cwilliams# mdadm --remove /dev/md0 /dev/sdb4
mdadm: hot removed /dev/sdb4
tunafish:/home/cwilliams# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sda3[0] sdb3[1]
      19534976 blocks [2/2] [UU]

md1 : active raid1 sda4[0]
      135275264 blocks [2/1] [U_]

unused devices: <none>
tunafish:/home/cwilliams# mdadm --re-add /dev/md1 /dev/sdb4
mdadm: added /dev/sdb4
tunafish:/home/cwilliams# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sda3[0] sdb3[1]
      19534976 blocks [2/2] [UU]

md1 : active raid1 sdb4[2] sda4[0]
      135275264 blocks [2/1] [U_]
      [>....................]  recovery =  0.1% (259200/135275264) finish=34.7min speed=64800K/sec
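
For anyone who wants to follow a rebuild without rereading the whole file each time, here is a rough sketch that prints just the array name and the percentage. The function name recovery_progress is made up, and it assumes the "recovery = N%" line format shown above.

```shell
#!/bin/sh
# Print "<array> <percent>" for each array that is currently rebuilding.
# Reads /proc/mdstat-style text from the file(s) given, or stdin if none.
# (recovery_progress is a hypothetical helper name.)
recovery_progress() {
    awk '
        /^md[0-9]+ :/ { name = $1 }            # remember the current array
        /(recovery|resync) *=/ {
            for (i = 1; i <= NF; i++)          # first field ending in "%"
                if ($i ~ /%$/) { print name, $i; break }
        }
    ' "$@"
}

# Usage:
#   recovery_progress /proc/mdstat
```

No output means nothing is rebuilding at the moment.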
 