Areca 1882 RAID 6 Array Reports 2 of 10 disks as “free” upon reboot

Discussion in 'SSDs & Data Storage' started by Hammer!, Jan 2, 2019.

  1. Hammer!

    Hammer! [H]Lite

    Messages:
    111
    Joined:
    Nov 25, 2011
    Hi, I have a 1882 card with a RAID 6 array running on Windows Server 2016 and when I rebooted my server, the array came back up as degraded and two disks which were part of the array are reported as “free”. I’ve tried to reboot again hoping it will re-recognize these disks, but no luck. Any way to fix this? Just worried that the rebuild will fail and I will lose everything since my last backup. Thanks!
     
  2. zpackrat

    zpackrat Gawd

    Messages:
    687
    Joined:
    Jan 28, 2002
    I get the panic, but there's no need for two posts. You should delete one and be as exact as possible in the other.
     
  3. Hammer!

    Hammer! [H]Lite

    Messages:
    111
    Joined:
    Nov 25, 2011
    Sorry, I don't understand your response...I only posted once. The other post is for a different question. Although I should have posted both in the Areca thread...

    What other info might be helpful for me to share?
     
  4. zpackrat

    zpackrat Gawd

    Messages:
    687
    Joined:
    Jan 28, 2002
  5. Hammer!

    Hammer! [H]Lite

    Messages:
    111
    Joined:
    Nov 25, 2011
  6. mwroobel

    mwroobel [H]ardness Supreme

    Messages:
    4,882
    Joined:
    Jul 24, 2008
    Please post a complete log from the card.
     
  7. Hammer!

    Hammer! [H]Lite

    Messages:
    111
    Joined:
    Nov 25, 2011
    Hi, here's the log. I actually ended up installing two new drives and rebuilding the array, and it finished fine! I did find some references to using "rescue RAID set" to recover the array, but the instructions were somewhat unclear (at least to me) and I didn't want to risk messing things up and losing all my data.

    I do, however, want to figure out what happened and whether there is anything I can do to prevent the drives from dropping again. The two drives that "dropped" both look fine in CrystalDiskInfo and work without issues on a different system, so I don't think they are actually bad. If someone can help me decipher the log and give me some pointers, that would be greatly appreciated. The array in question is RAID6-8TB and consisted of 14 WD 8TB Red drives. I had one Hot Spare allocated to the array, and it looks like it did not kick in since two drives failed at the same time. All drives in the array are in Enclosures 2, 3 and 4 (these are all internal expanders attached to the RAID card inside the server). In addition, a Supermicro SC847 JBOD chassis is attached externally to the RAID card. What is interesting is that I see references to drives in E3 being removed and inserted (e.g. 2018-12-31 16:12:09 E3 Slot 01 Device Removed). This cannot be the case, since E3 only housed drives for the RAID6-8TB array. Although I was in the process of creating a new array using the Supermicro JBOD chassis, I did not touch the drives in E2, E3 and E4, which only contained drives for the array that failed (RAID6-8TB).

    Could it be that the card got confused by the various enclosures/expanders that are attached? I say this because the naming scheme for the JBOD chassis seems to be "EX Slot Y", whereas the internal drives are referenced as "EX Disk #Y". And given that E3 only has space for 6 disks, it is weird that the log reports E3 Slot 13 timing out ("2018-12-31 21:36:28 E3 Slot 13 Time Out Error") and drives numbered above 6 being removed ("2018-12-31 16:12:36 E3 Slot 18 Device Removed").

    Unfortunately, I can't recreate the numbering of the enclosures because I removed the external SC847 chassis from the card before I started the rebuild, to make sure the rebuild was not interrupted by anything.

    Thank you - any insights would be greatly appreciated.

    2019-01-03 22:00:45 RAID6-8TB Complete Rebuild 039:24:43
    2019-01-02 06:40:19 E4 Disk #3 HotSpare Created
    2019-01-02 06:39:56 E4 Disk #3 HotSpare Deleted
    2019-01-02 06:36:02 RAID6-8TB Start Rebuilding
    2019-01-02 06:36:00 RAID6-8TB Rebuild RaidSet
    2019-01-02 06:36:00 E4 Disk #2 HotSpare Used
    2019-01-02 06:35:50 E3 Disk #0 HotSpare Used
    2019-01-02 06:35:49 E4 Disk #3 HotSpare Created
    2019-01-02 06:35:49 E4 Disk #2 HotSpare Created
    2019-01-02 06:34:40 192.168.001.032 HTTP Log In
    2019-01-02 06:33:54 H/W Monitor Raid Powered On
    2019-01-02 06:32:37 001:00057FE52C00 Lost Rebuilding/Migration LBA
    2019-01-02 06:32:36 H/W Monitor Raid Powered On
    2019-01-02 05:34:32 RAID6_2TB Start Checking
    2019-01-02 05:28:06 RAID6-8TB Offlined
    2019-01-02 05:14:36 ARC-1882-VOL#001 Start Initialize
    2019-01-02 05:14:29 H/W Monitor Raid Powered On
    2019-01-02 05:12:44 001:000530B59C00 Restart Init LBA Point
    2019-01-02 05:12:44 RAID6-8TB Failed Volume Revived
    2019-01-02 05:12:42 H/W Monitor Raid Powered On
    2019-01-01 22:50:28 E5 Disk #3 Device Removed
    2019-01-01 22:50:28 E5 Disk #2 Device Removed
    2019-01-01 22:50:28 E5 Disk #1 Device Removed
    2019-01-01 22:50:28 E5 Disk #0 Device Removed
    2019-01-01 22:50:28 E2 SES2Device Time Out Error
    2019-01-01 22:50:27 RAID6-8TB Abort Checking 000:57:01 129
    2019-01-01 22:50:27 RAID6-8TB RaidSet Degraded
    2019-01-01 22:50:27 RAID6-8TB RaidSet Degraded
    2019-01-01 22:50:27 RAID6-8TB RaidSet Degraded
    2019-01-01 22:50:27 E4 Disk #5 Time Out Error
    2019-01-01 22:50:27 E4 Disk #4 Time Out Error
    2019-01-01 22:50:27 E4 Disk #3 Time Out Error
    2019-01-01 22:49:00 Enclosure#5 Removed
    2019-01-01 22:48:58 RAID6-8TB Volume Failed
    2019-01-01 22:48:58 E5 SES2DeviceßÒ@ Device Removed
    2019-01-01 22:48:58 RAID6-8TB Volume Failed
    2019-01-01 22:48:58 RAID6-8TB Volume Degraded
    2019-01-01 22:48:58 RAID6-8TB RaidSet Degraded
    2019-01-01 22:48:58 RAID6-8TB Volume Degraded
    2019-01-01 21:53:26 RAID6_2TB Start Checking
    2019-01-01 21:53:26 RAID6-8TB Start Checking
    2019-01-01 21:42:27 192.168.001.221 HTTP Log In
    2019-01-01 21:34:26 E3 Slot 06 Time Out Error
    2019-01-01 21:29:18 E3 Slot 12 Time Out Error
    2019-01-01 21:25:14 E3 Slot 06 Time Out Error
    2019-01-01 21:19:11 E3 Slot 06 Time Out Error
    2019-01-01 21:13:10 E3 Slot 06 Time Out Error
    2019-01-01 21:06:19 E3 Slot 06 Time Out Error
    2019-01-01 20:30:06 192.168.001.221 HTTP Log In
    2019-01-01 16:57:47 E3 Slot 12 Time Out Error
    2019-01-01 16:46:32 E3 Slot 06 Time Out Error
    2019-01-01 16:40:02 E3 Slot 06 Time Out Error
    2019-01-01 16:24:37 E3 Slot 06 Time Out Error
    2019-01-01 16:19:38 E3 Slot 06 Time Out Error
    2019-01-01 16:18:01 E3 Slot 06 Time Out Error
    2019-01-01 16:11:26 E3 Slot 06 Reading Error
    2019-01-01 16:01:18 E3 Slot 06 Reading Error
    2019-01-01 15:47:47 E3 Slot 06 Time Out Error
    2019-01-01 15:44:59 E3 Slot 04 Time Out Error
    2019-01-01 15:41:29 E3 Slot 06 Reading Error
    2019-01-01 15:31:06 E3 Slot 06 Time Out Error
    2019-01-01 15:26:42 E3 Slot 06 Time Out Error
    2019-01-01 15:12:57 E3 Slot 06 Reading Error
    2019-01-01 15:12:32 E3 Slot 06 Time Out Error
    2019-01-01 15:10:16 E3 Slot 06 Time Out Error
    2019-01-01 15:06:40 E3 Slot 06 Time Out Error
    2019-01-01 15:00:04 E3 Slot 06 Time Out Error
    2019-01-01 14:48:20 E3 Slot 06 Time Out Error
    2019-01-01 14:45:20 E3 Slot 06 Time Out Error
    2019-01-01 14:39:13 E3 Slot 06 Time Out Error
    2019-01-01 14:36:43 E3 Slot 06 Time Out Error
    2019-01-01 14:05:13 192.168.001.159 HTTP Log In
    2019-01-01 13:21:30 E3 Slot 06 Time Out Error
    2019-01-01 13:14:57 E3 Slot 06 Time Out Error
    2019-01-01 13:13:19 E3 Slot 06 Reading Error
    2019-01-01 13:10:14 E3 Slot 06 Time Out Error
    2019-01-01 13:08:23 E3 Slot 06 Time Out Error
    2019-01-01 12:59:25 E3 Slot 06 Time Out Error
    2019-01-01 12:56:47 E3 Slot 06 Reading Error
    2019-01-01 12:41:35 E3 Slot 06 Time Out Error
    2019-01-01 12:15:18 E3 Slot 06 Time Out Error
    2019-01-01 11:59:58 ARC-1882-VOL#001 Start Initialize
    2019-01-01 11:57:26 001:0001A4A11300 Restart Init LBA Point
    2019-01-01 11:57:23 H/W Monitor Raid Powered On
    2019-01-01 11:48:14 E3 Slot 06 Reading Error
    2019-01-01 11:48:11 E3 Slot 06 Reading Error
    2019-01-01 11:29:34 E3 Slot 06 Time Out Error
    2019-01-01 11:20:23 E3 Slot 06 Time Out Error
    2019-01-01 11:00:43 E3 Slot 06 Reading Error
    2019-01-01 10:20:44 E3 Slot 06 Time Out Error
    2019-01-01 10:16:27 E3 Slot 06 Time Out Error
    2019-01-01 10:12:07 E3 Slot 06 Time Out Error
    2019-01-01 10:06:29 E3 Slot 06 Reading Error
    2019-01-01 09:55:12 E3 Slot 06 Time Out Error
    2019-01-01 09:53:31 E3 Slot 06 Time Out Error
    2019-01-01 09:21:01 E3 Slot 06 Time Out Error
    2019-01-01 09:15:56 E3 Slot 06 Time Out Error
    2019-01-01 09:08:55 E3 Slot 06 Time Out Error
    2019-01-01 09:07:02 E3 Slot 06 Time Out Error
    2019-01-01 09:02:55 E3 Slot 06 Time Out Error
    2019-01-01 08:58:15 E3 Slot 06 Time Out Error
    2019-01-01 08:52:18 E3 Slot 06 Time Out Error
    2019-01-01 08:47:08 E3 Slot 06 Time Out Error
    2019-01-01 08:44:38 E3 Slot 06 Time Out Error
    2019-01-01 08:39:28 E3 Slot 06 Time Out Error
    2019-01-01 08:33:10 E3 Slot 06 Time Out Error
    2019-01-01 08:22:33 E3 Slot 06 Time Out Error
    2019-01-01 08:10:36 E3 Slot 06 Time Out Error
    2019-01-01 07:59:37 E3 Slot 06 Time Out Error
    2019-01-01 07:56:17 E3 Slot 13 Time Out Error
    2019-01-01 07:49:56 E3 Slot 06 Time Out Error
    2019-01-01 07:48:20 E3 Slot 06 Time Out Error
    2019-01-01 07:45:08 E3 Slot 06 Time Out Error
    2019-01-01 07:44:59 192.168.001.221 HTTP Log In
    2019-01-01 07:39:43 E3 Slot 06 Time Out Error
    2019-01-01 07:32:07 E3 Slot 06 Time Out Error
    2019-01-01 07:24:38 E3 Slot 06 Reading Error
    2019-01-01 07:18:23 E3 Slot 06 Time Out Error
    2019-01-01 07:16:39 E3 Slot 06 Time Out Error
    2019-01-01 07:02:13 E3 Slot 06 Time Out Error
    2019-01-01 06:52:12 E3 Slot 06 Time Out Error
    2019-01-01 06:41:07 E3 Slot 06 Time Out Error
    2019-01-01 06:36:49 E3 Slot 06 Time Out Error
    2019-01-01 06:32:21 E3 Slot 06 Time Out Error
    2019-01-01 06:28:03 E3 Slot 06 Time Out Error
    2019-01-01 06:19:54 E3 Slot 06 Time Out Error
    2019-01-01 06:15:56 E3 Slot 06 Time Out Error
    2019-01-01 06:07:59 E3 Slot 06 Time Out Error
    2019-01-01 05:58:38 E3 Slot 06 Time Out Error
    2019-01-01 05:51:10 E3 Slot 06 Time Out Error
    2019-01-01 05:31:18 E3 Slot 06 Time Out Error
    2019-01-01 05:18:12 E3 Slot 06 Time Out Error
    2019-01-01 04:33:15 E3 Slot 06 Time Out Error
    2019-01-01 04:30:35 E3 Slot 06 Time Out Error
    2019-01-01 04:29:02 E3 Slot 06 Time Out Error
    2019-01-01 04:03:32 E3 Slot 06 Time Out Error
    2019-01-01 03:41:27 E3 Slot 06 Time Out Error
    2019-01-01 03:39:54 E3 Slot 06 Time Out Error
    2019-01-01 03:10:45 E3 Slot 06 Time Out Error
    2019-01-01 02:22:30 E3 Slot 06 Time Out Error
    2019-01-01 02:14:46 E3 Slot 06 Time Out Error
    2019-01-01 02:12:24 E3 Slot 06 Time Out Error
    2019-01-01 02:00:22 E3 Slot 06 Time Out Error
    2019-01-01 01:58:37 E3 Slot 06 Time Out Error
    2019-01-01 01:37:26 E3 Slot 06 Time Out Error
    2019-01-01 01:17:45 E3 Slot 06 Time Out Error
    2019-01-01 00:17:39 E3 Slot 06 Time Out Error
    2019-01-01 00:00:26 E3 Slot 06 Time Out Error
    2018-12-31 23:46:42 E3 Slot 06 Time Out Error
    2018-12-31 23:36:52 E3 Slot 06 Time Out Error
    2018-12-31 23:22:52 E3 Slot 06 Time Out Error
    2018-12-31 23:21:04 E3 Slot 06 Time Out Error
    2018-12-31 23:19:30 E3 Slot 06 Time Out Error
    2018-12-31 23:03:02 E3 Slot 06 Time Out Error
    2018-12-31 22:09:19 E3 Slot 06 Time Out Error
    2018-12-31 21:55:08 E3 Slot 06 Time Out Error
    2018-12-31 21:36:28 E3 Slot 13 Time Out Error
    2018-12-31 21:08:15 E3 Slot 04 Time Out Error
    2018-12-31 18:50:35 E3 Slot 12 Time Out Error
    2018-12-31 18:46:44 E3 Slot 12 Time Out Error
    2018-12-31 18:07:26 E3 Slot 04 Time Out Error
    2018-12-31 18:03:51 E3 Slot 12 Time Out Error
    2018-12-31 18:02:19 E3 Slot 12 Time Out Error
    2018-12-31 18:00:19 E3 Slot 12 Time Out Error
    2018-12-31 17:03:35 E3 Slot 12 Time Out Error
    2018-12-31 16:22:34 ARC-1882-VOL#001 Start Initialize
    2018-12-31 16:22:32 ARC-1882-VOL#001 Create Volume
    2018-12-31 16:22:04 S1_BOTTOM_BACK Create RaidSet
    2018-12-31 16:13:11 E3 Slot 10 Device Inserted
    2018-12-31 16:13:03 E3 Slot 02 Device Inserted
    2018-12-31 16:13:00 E3 Slot 20 Device Removed
    2018-12-31 16:12:49 E3 Slot 03 Device Inserted
    2018-12-31 16:12:47 E3 Slot 19 Device Removed
    2018-12-31 16:12:36 E3 Slot 18 Device Removed
    2018-12-31 16:12:23 E3 Slot 01 Device Inserted
    2018-12-31 16:12:21 E3 Slot 03 Device Removed
    2018-12-31 16:12:13 E3 Slot 02 Device Removed
    2018-12-31 16:12:09 E3 Slot 01 Device Removed
    2018-12-31 16:09:40 192.168.001.159 HTTP Log In
    2018-12-31 16:08:59 S1_BOTTOM_BACK Delete RaidSet
    2018-12-31 16:08:59 S1_BOTTOM_BACK Abort Initialization 000:18:58
    2018-12-31 16:08:45 192.168.001.032 HTTP Log In
    2018-12-31 16:07:09 192.168.001.159 HTTP Log In
    2018-12-31 16:00:58 E6 Slot 21 HotSpare Created
    2018-12-31 15:57:50 E6 Slot 21 HotSpare Deleted
    2018-12-31 15:57:15 E6 Slot 21 HotSpare Created
    2018-12-31 15:56:53 192.168.001.032 HTTP Log In
    2018-12-31 15:50:01 S1_BOTTOM_BACK Start Initialize
    2018-12-31 15:49:59 S1_BOTTOM_BACK Create Volume
    2018-12-31 15:49:11 S1_BACKUP_BACK Renamed
    2018-12-31 15:48:28 S1_BACKUP_BACK Create RaidSet
    2018-12-31 15:47:40 S1_BOTTOM_BACK Delete RaidSet
    2018-12-31 15:46:26 192.168.001.032 HTTP Log In
    2018-12-31 15:45:35 H/W Monitor Raid Powered On
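
The numbering oddity described above (E3 is said to hold only 6 disks, yet the log mentions E3 Slot 13 and Slot 18) can be checked mechanically. The following is a minimal sketch, not an Areca tool: the capacity of 6 comes from the post, the sample lines are copied from the log, and the regular expression and the function name `slots_beyond_capacity` are assumptions made for illustration. Whether slot numbers start at 0 or 1 is not known from the log, so Slot 06 is treated as within range here.

```python
import re

# Assumption taken from the post: internal enclosure E3 has room for 6 disks.
E3_CAPACITY = 6

# A few lines copied verbatim from the log above.
SAMPLE_LOG = """\
2018-12-31 21:36:28 E3 Slot 13 Time Out Error
2018-12-31 16:12:36 E3 Slot 18 Device Removed
2018-12-31 16:12:09 E3 Slot 01 Device Removed
2019-01-01 22:50:27 E4 Disk #5 Time Out Error
"""

# Matches "<date> <time> E<n> Slot <nn> <event>"; "Disk #n" lines are ignored.
LINE_RE = re.compile(r"^(\S+ \S+) (E\d+) Slot (\d+) (.+)$")


def slots_beyond_capacity(log_text, enclosure, capacity):
    """Return (timestamp, slot, event) for slots numbered above capacity."""
    hits = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m and m.group(2) == enclosure and int(m.group(3)) > capacity:
            hits.append((m.group(1), int(m.group(3)), m.group(4)))
    return hits


if __name__ == "__main__":
    for stamp, slot, event in slots_beyond_capacity(SAMPLE_LOG, "E3", E3_CAPACITY):
        print(f"{stamp}: E3 Slot {slot:02d} exceeds capacity ({event})")
```

Entries flagged this way would be consistent with the "Slot" events actually belonging to the external JBOD chassis rather than the internal E3 expander, but the log alone does not prove that.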
     
  8. bigddybn

    bigddybn [H]ardness Supreme

    Messages:
    6,850
    Joined:
    Nov 21, 2006
    Obligatory RAID is no replacement for backups post.
     
  9. zpackrat

    zpackrat Gawd

    Messages:
    687
    Joined:
    Jan 28, 2002
    Depending on the drives used, if they are not meant for enterprise/RAID configs, HDDs can have issues writing data. When this happens in a single-disk scenario, the disk will keep trying for a defined period before marking the block bad and relocating the data. When this happens on a disk in a RAID configuration and the timeout is too long, the controller will drop the drive(s) from the array. In essence, if there is a bad block, a RAID controller wants it marked and the data moved to a new location quickly. WD drives used to be modifiable with WD's TLER utility to change this setting, but it no longer works on newer drives. I'd be curious to know what drives you are using in the array. And while it will take quite some time, use the manufacturer's test utility to do a full scan of the failed drives before using them again, to be sure they are OK.
    Here is a good read on the topic: https://blog.fosketts.net/2017/05/30/turn-off-error-recovery-raid-drives-tler-erc-cctl/
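
The timeout interaction described above can be reduced to a single comparison. This is only a toy model: the function name `controller_drops_drive` and all the numbers are illustrative assumptions, not measured values for any specific drive or for the Areca card.

```python
from typing import Optional


def controller_drops_drive(drive_recovery_limit_s: Optional[float],
                           controller_timeout_s: float) -> bool:
    """Worst-case model: is the drive still retrying when the controller gives up?

    drive_recovery_limit_s is the longest the drive will retry a bad sector
    before reporting an error; None means no limit is configured, which is
    treated here as "may retry longer than any controller timeout".
    """
    if drive_recovery_limit_s is None:
        return True
    return drive_recovery_limit_s > controller_timeout_s


if __name__ == "__main__":
    # Illustrative values only (seconds).
    print(controller_drops_drive(7.0, 20.0))   # drive reports the error in time
    print(controller_drops_drive(60.0, 20.0))  # drive is still retrying, gets dropped
    print(controller_drops_drive(None, 20.0))  # unlimited retries, worst case
```

The point of the model is that a drive with a short, fixed error-recovery limit hands the problem back to the controller before the controller loses patience, while a drive that keeps retrying looks dead to the controller even though it is not.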
     
  10. Hammer!

    Hammer! [H]Lite

    Messages:
    111
    Joined:
    Nov 25, 2011
    Thank you. I am using WD Red drives, which are meant for RAID and have TLER, albeit not enterprise class. My timeout setting is the max 7 seconds on the Areca card. I’m still puzzled by the Enclosure and drive numbering issue. Also, it is interesting that the drives which were dropped from the array were marked as “free” vs. “failed”.
     
    Last edited: Jan 5, 2019
  11. likeman

    likeman Gawd

    Messages:
    604
    Joined:
    Aug 17, 2011
    Looks like 2-3 disks failed in there (I see that delete and recreate was used, so I assume there is no data on them), mostly errors on slot 6.
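
To see how the errors are distributed across devices, the log can be tallied by device and error type. A minimal sketch follows; the sample lines are copied from the log posted above, while the regular expression and the function name `count_errors` are assumptions made for illustration. Running it over the full log, rather than this short sample, would give the real distribution.

```python
import re
from collections import Counter

# A handful of lines copied verbatim from the posted log.
SAMPLE_LOG = """\
2019-01-01 21:34:26 E3 Slot 06 Time Out Error
2019-01-01 21:29:18 E3 Slot 12 Time Out Error
2019-01-01 21:25:14 E3 Slot 06 Time Out Error
2019-01-01 16:11:26 E3 Slot 06 Reading Error
2019-01-01 15:44:59 E3 Slot 04 Time Out Error
2019-01-01 21:42:27 192.168.001.221 HTTP Log In
"""

# Device is "E<n> Slot <nn>" or "E<n> Disk #<n>"; only two error types are counted.
ERROR_RE = re.compile(
    r"^\S+ \S+ (E\d+ (?:Slot \d+|Disk #\d+)) (Time Out Error|Reading Error)$"
)


def count_errors(log_text):
    """Count (device, error type) pairs; non-error lines are skipped."""
    counts = Counter()
    for line in log_text.splitlines():
        m = ERROR_RE.match(line.strip())
        if m:
            counts[(m.group(1), m.group(2))] += 1
    return counts


if __name__ == "__main__":
    for (device, error), n in count_errors(SAMPLE_LOG).most_common():
        print(f"{device:12s} {error:15s} {n}")
```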