silkshadow (n00b, joined Aug 9, 2010, 58 messages)
The card in question is a Promise EX8350. This card is old; I believe I bought it around 2004. I've since moved it into different systems as I bought better RAID cards. It's been doing duty on an MCE array for recorded TV and junk my cousins and I upload to it for my grandmother to watch.
My grandmother called and complained that her shows were not playing, so I went over there this weekend and pulled up the logs. I saw that the array was going offline and back online, yikes! I apologized to my grandmother and brought the system to my house to test it.
Here are the core specs:
Gigabyte GA-P35 board
Intel Core 2 Duo 2.4 GHz
2GB DDR2 memory
Nvidia 8600 (passive)
Promise EX8350
8x 1TB Western Digital Green EACS (RAID 5 on the RAID card)
200GB Samsung SpinPoint OS drive
What I desperately need help with is making a decision, so I can make the most of the little time I have to get this back up and spend money on fixing it correctly. If the card is bad, I will work on getting the system back up without the card. If the card is OK, I will spend the time cleaning the machine, adding fans and replacing disks. I just can't figure out whether the card is good or bad. Over the weekend I was able to get the data off the array (phew). It took forever, but thankfully that is not a consideration.
Symptoms:
-The array goes offline and back online. In the logs, the array goes offline for a while, then comes back up in critical condition and the EX8350 starts rebuilding it. The rebuild never completes: the array goes offline again before it finishes.
-Heat seems to be a problem. While I was getting data off the array, it would go offline and I would have to shut the machine down and keep it off for about 5 minutes before powering it back up. I discovered I could cut the downtime by opening the case and turning on the aircon in my work room. The heatsink on the XOR chip is very hot, but this is not abnormal; this card always runs hot. I had another one that kicked the bucket for good last year, and it was also always too hot to touch.
I was only able to start troubleshooting yesterday, as it took me 3 days to get the data off the array.
Troubleshooting steps so far:
-Tested all the disks with the WD utility. All come up OK, but Promise's on-card utility (WebPAM's Media Patrol) finds bad sectors on all the disks. Is there a definitive disk-checking program I can use to reconcile the different results?
-When the array went down, it looked like 2 disks might be the culprits. Those disks checked out fine (as above), so I am confused.
-I was able to complete a rebuild of the array (the first since the logs reported the array going offline) with the case open and the aircon on. However, it went critical again a few hours later, still with the case open.
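Since smartmontools' `smartctl -A` prints each drive's own SMART counters, one way to get a third opinion is to compare the counters that track bad sectors. This is a minimal sketch, not a definitive checker: it assumes smartmontools is installed and that each disk is temporarily attached to a motherboard SATA port, because smartctl may not be able to query drives sitting behind the EX8350. The function names and the choice of watched attributes are my own assumptions.

```python
from __future__ import annotations

import subprocess

# Assumption: on a healthy disk these raw counters should stay at 0.
WATCH_ATTRIBUTES = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
)


def parse_smart_attributes(text: str) -> dict[str, int]:
    """Parse the ATA attribute table printed by `smartctl -A` into {name: raw value}."""
    attrs: dict[str, int] = {}
    in_table = False
    for line in text.splitlines():
        if line.startswith("ID#"):
            in_table = True  # header row of the attribute table
            continue
        fields = line.split()
        if not in_table or len(fields) < 10 or not fields[0].isdigit():
            continue  # text before the table, blank lines, footer
        try:
            # Column 10 is RAW_VALUE; keep only its first token,
            # e.g. 40 from "40 (Min/Max 20/45)".
            attrs[fields[1]] = int(fields[9])
        except ValueError:
            pass  # non-decimal raw value, not needed here
    return attrs


def disk_concerns(attrs: dict[str, int]) -> list[str]:
    """List the watched bad-sector counters that are nonzero."""
    return [f"{name} = {attrs[name]}" for name in WATCH_ATTRIBUTES if attrs.get(name, 0) > 0]


def read_smart(device: str) -> str:
    """Run `smartctl -A` on one disk; needs smartmontools and admin rights."""
    result = subprocess.run(["smartctl", "-A", device], capture_output=True, text=True)
    return result.stdout
```

For one disk this would be something like `disk_concerns(parse_smart_attributes(read_smart("/dev/sdb")))`, where the device name is only a placeholder. Nonzero pending or reallocated counts would back up Media Patrol's findings; all zeros would not prove a disk healthy, only that the drive has not logged such errors itself. If the drive reports `Temperature_Celsius`, it shows up in the same dict, which may help with the heat question.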
Given how long testing these disks takes, that is all I have accomplished. Is this enough to say the card is the culprit? I am not sure because of a few things:
-Decreasing the heat allowed the array to rebuild. Maybe the massive buildup of dust (the case and internals were, and still are, covered with a thick layer of it) pushed temperatures to unworkable levels? A meticulous cleaning, more fans and so forth might fix this.
-Conflicting disk status reports. I have no clue how reliable the WD utility is. The disks seem to work, but it could still be the disks: 2 bad ones would cause this. I have no spare drives to test with, and I really don't want to buy 2 1TB disks just for a test, because afterwards they are useless to me (I am all full up with dozens of 2TBs).
-I am hoping this is not the card, because I can't replace it. I have no spares and this kind of equipment isn't sold locally. I imported all of this, and that takes a lot of time.
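The "2 bad ones would cause this" reasoning follows from how RAID 5 works: each stripe stores one XOR parity block, so any single missing member can be recomputed from the others, but two missing members cannot. Here is a minimal Python sketch of that arithmetic. The function names and state labels are hypothetical and only mirror the wording in this post; it does not model the EX8350's actual firmware.

```python
from __future__ import annotations

from functools import reduce
from typing import Optional


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks: the RAID 5 parity operation."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks: list[bytes]) -> list[bytes]:
    """One stripe: the data blocks followed by their parity block."""
    return data_blocks + [xor_blocks(data_blocks)]


def rebuild(stripe: list[Optional[bytes]]) -> list[bytes]:
    """Recover a stripe in which at most one member is missing (None)."""
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise ValueError("two or more members lost: RAID 5 parity cannot recover this")
    survivors = [block for block in stripe if block is not None]
    restored = list(survivors)
    if missing:
        # XOR of every surviving member (data + parity) equals the lost one.
        restored.insert(missing[0], xor_blocks(survivors))
    return restored


def array_state(failed_disks: int) -> str:
    """Hypothetical labels mirroring the post's wording."""
    if failed_disks == 0:
        return "ok"
    return "critical" if failed_disks == 1 else "offline"


def usable_capacity_tb(disks: int, disk_tb: float) -> float:
    """RAID 5 gives up one disk's worth of space to parity."""
    return (disks - 1) * disk_tb
```

On this 8x 1TB array that works out to 7 TB usable with one disk of tolerance: one failed disk leaves it critical and a second takes it offline, which would be consistent with what the logs show. The same arithmetic suggests, in principle, why unreadable sectors on the remaining disks can abort a rebuild: the affected stripe then has two unknown members.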
My grandmother is very old; the system is operated by her caregiver, and watching it is one of the few things she enjoys these days. I can't put into words how happy it made her once I installed it a couple of years ago and she had been using it for a few weeks. It was a surprise to the whole family, as she was not a TV/movie person before. So getting this back on its feet sooner rather than later would be very nice. In other words, I am under familial pressure (it sucks being the family geek). So big thanks for any help here!