
Encrypted 2TB RAID5 crashes a lot


bubbles
01-18-2006, 03:27 PM
Here are the specs:

Asus P4P800-Deluxe
P4 2.4GHz
1GB DDR400
2U rackmount case
2x SATA backplanes
RocketRaid 1820A controller
8x 320GB WD SATA drives in RAID5
1x 80GB WD O/S drive
Seasonic 600W PSU

100Mbit LAN with 5 to 50 users at any time.

Fedora Core 3
Highpoint drivers

The RAID5 array is run as an encrypted loop device using loop-AES. The device is one big 2TB partition formatted with JFS (4k block size).

The system started off with random file corruption, so I swapped the memory and it worked fine for a week. Then it started freezing up more and more frequently, until it wouldn't even boot unless it was left turned off for at least 12 hours. Sometimes only the NIC would crash.

Changed to a different motherboard and all seemed OK for about a week, then it started crashing again. Unlike before, it would come back online right after a reboot, but it would crash frequently.

Next I changed the PSU from the stock 460W to a Seasonic 600W. The system then ran fine for about a week, until I did some stuff with the "dd" command and it froze up. I rebooted it and it came back online, but now it crashes every 1 to 5 days. The crashes seem to be getting more and more frequent.

- motherboard swapped
- memory swapped
- PSU swapped
- moved to another room with better A/C and different power.

I can't think of anything else to try. There is really no software on it besides the pretty much default kernel, the server daemon, the loop-AES module and the RocketRAID driver.

Monitoring it with "top", I can't really see anything that triggers a crash.

ANY IDEAS?

Nn'theraq'pss
01-18-2006, 03:36 PM
Have you tried upgrading to FC4?

defakto
01-18-2006, 03:38 PM
Or running it without the encryption to see if that is causing the issue?

bubbles
01-18-2006, 04:16 PM
I started off with FC4, but there was a problem (I can't remember what), so I had to use FC3.


Can't run it without the crypto because the data we need is on the encrypted device!

tdg
01-18-2006, 04:22 PM
Have you set up level logging to analyze what is happening at the time of, or just before, the crash?
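
If the normal logs turn out to be empty, one option is a small heartbeat logger that forces each record to disk, so the last line written before a freeze survives the reboot. This is only a rough sketch, assuming Python is available on the box; the function names and the log path in the usage note are made up for illustration, and the file should live on the O/S drive rather than on the RAID.

```python
import os
import time


def format_record(now, load):
    """Build one log line: local timestamp plus 1/5/15-minute load averages."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    return "%s load=%.2f,%.2f,%.2f\n" % ((stamp,) + tuple(load))


def append_record(path, line):
    """Append a line and fsync it so it is on disk before a hard freeze."""
    with open(path, "a") as fh:
        fh.write(line)
        fh.flush()
        os.fsync(fh.fileno())


def run_heartbeat(path, interval=10, iterations=None):
    """Write a record every `interval` seconds.

    iterations=None means keep going until the process is killed.
    """
    count = 0
    while iterations is None or count < iterations:
        append_record(path, format_record(time.time(), os.getloadavg()))
        count += 1
        if iterations is None or count < iterations:
            time.sleep(interval)
```

Something like `run_heartbeat("/root/heartbeat.log", interval=10)` (hypothetical path) left running in the background would show how close to the crash the machine was still responsive and what the load looked like at that point.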

bubbles
01-20-2006, 05:10 PM
There's nothing in the logs to indicate what causes the crashes. The weird thing is, the longer I leave the system powered off after a crash, the longer it seems to stay up before the next crash. Sounds like a PSU problem maybe, but I have already changed the PSU to the best one I could get. I don't think it's a heat issue either, because the crash intervals are many days, not minutes or hours.
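
To check whether the off-time pattern is real rather than an impression, the off time before each boot and the uptime that followed could be written down and correlated. A minimal sketch, assuming those two lists are recorded by hand (both inputs are hypothetical, no such records exist yet); it computes a plain Pearson correlation coefficient.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)
```

Calling `pearson(hours_off, hours_up)` with the recorded values gives a number between -1 and 1; a value close to 1 over a reasonable number of crashes would back up the impression that longer rest means longer uptime.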

Nn'theraq'pss
01-20-2006, 05:28 PM
Sounds like either the RocketRAID card or one or more hard drives might be going bad. Also, some WD HDDs have problems with certain RAID cards in that they sometimes don't respond right away to requests from the RAID card. This causes the RAID card to think they've gone bad and drop them, when in reality the HDD's timeout hasn't been reached yet.
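
One rough way to look for a drive that is slow to respond is to time a series of reads and flag the ones that take unusually long. This is a sketch only: it assumes the thing being read is visible as a file or block device (the individual drives may not be exposed behind the RocketRAID driver), it needs read permission on that path, and cached data will come back fast regardless of the disk. The function names and the threshold are made up for illustration.

```python
import time


def time_reads(path, chunk_size=1024 * 1024, count=16):
    """Read up to `count` chunks sequentially; return seconds taken per read."""
    latencies = []
    with open(path, "rb", buffering=0) as fh:
        for _ in range(count):
            start = time.monotonic()
            data = fh.read(chunk_size)
            elapsed = time.monotonic() - start
            if not data:  # reached the end of the file or device
                break
            latencies.append(elapsed)
    return latencies


def slow_reads(latencies, threshold):
    """Return (index, seconds) for every read slower than `threshold` seconds."""
    return [(i, t) for i, t in enumerate(latencies) if t > threshold]
```

For example, `slow_reads(time_reads(path), threshold=1.0)` with `path` pointing at the array or a large file on it lists any reads that stalled for more than a second, which is the kind of delay that could get a drive dropped.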