Server (Win2k3) freezing

Snowknight26

I recently noticed that my file server (a 10TB RAID5 on an Areca 1280ML, running a fully updated Server 2003 R2 from a boot drive that is not on the RAID) freezes occasionally. I can tell it's frozen because the screen stops updating, input stops responding, remote file copies stall, and it no longer answers pings. The one thing that continues to work is the RAID controller, which I can still access over Ethernet. The only thing I can do at that point is cut the power (who uses the on/off button anyway? :p)

The system log shows nothing out of the ordinary (which seems logical, as a freeze wouldn't give the system time to write to the event logs), the RAM is brand new and highly unlikely to show any errors, and the temps are good. The only way I can tell after a 'restart' that it did freeze is the dialog box after logging in that asks why the OS was shut down unexpectedly.

I first noticed the problem a week or two ago, but apart from adding a battery backup for the RAID controller (and then, a couple of days ago, some more RAM for the RAID controller as well), nothing has changed.

So for the life of me, I can't figure out what could be causing this. Any ideas on how to tackle this problem (heck, or a solution) would be very much appreciated.
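
One way around the empty event log, sketched here as an assumption rather than anything tried in this thread: run a tiny heartbeat script on the server that appends a timestamp to a file every few seconds and forces it to disk. After the next hard reset, the last line gives the approximate time of the freeze, which can then be lined up with the RAID controller's own log. The file name, the 10-second interval, and the presence of a Python install on the server are all hypothetical.

```python
import os
import time
from datetime import datetime

HEARTBEAT_FILE = "heartbeat.log"  # hypothetical path; keep it on the boot drive
STAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def write_heartbeat(path, now=None):
    """Append one timestamp line and push it out to the disk."""
    stamp = (now or datetime.now()).strftime(STAMP_FORMAT)
    with open(path, "a") as f:
        f.write(stamp + "\n")
        f.flush()
        os.fsync(f.fileno())  # ask the OS to write its cached data to disk now
    return stamp


def last_heartbeat(path):
    """Return the last complete timestamp in the file, or None if there is none."""
    try:
        with open(path) as f:
            lines = [line.strip() for line in f if line.strip()]
    except FileNotFoundError:
        return None
    # Walk backwards so a half-written final line is skipped, not fatal.
    for line in reversed(lines):
        try:
            return datetime.strptime(line, STAMP_FORMAT)
        except ValueError:
            continue
    return None


def run(path=HEARTBEAT_FILE, interval_seconds=10):
    """Write a heartbeat forever; stop it with Ctrl+C."""
    while True:
        write_heartbeat(path)
        time.sleep(interval_seconds)
```

Start it with `run()` and leave it going. After a freeze and reboot, `last_heartbeat("heartbeat.log")` returns the last moment the box was known to be alive, accurate to within the chosen interval.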
 
Disabling anti-virus would be my first suggestion.

I've had problems with AV completely locking a server up so tight that I can't do ANYTHING with it.

Heck, it was even so bad it locked all the WORKSTATIONS up as well... Can't really explain that one, but the second the network cable got unplugged they got freed up.

My hunch is that it accesses and scans databases (Exchange, SQL, things of that nature) and locks it all up.
 
It's not running any anti-virus software. In fact, it's not running much: Apache, VNC, APC's PowerChute, and that's pretty much it.
Also, I should mention that this is only a file server, not a database or web server. It's mostly used for copying files to and from it over the network.
 
How often does it do this?

My suggestion would be to boot into safe mode during the weekend or something, leave it running through Sunday, see if it's frozen when you get there early on Monday.
 
Hard to say. It happens anywhere from once every couple of days to as often as twice per day.
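
To put numbers on "how often", a second machine on the network could ping the server on a schedule and note when it stops answering, since a frozen box doesn't reply to pings. This is only a sketch: the function names and the three-misses threshold are made up for illustration, and the `ping` flags assume the stock Windows or Linux `ping` command.

```python
import platform
import subprocess
from datetime import datetime


def ping_once(host, timeout_seconds=1):
    """Return True if a single ping to `host` exits successfully."""
    if platform.system() == "Windows":
        # Windows ping: -n is the count, -w is the timeout in milliseconds.
        cmd = ["ping", "-n", "1", "-w", str(timeout_seconds * 1000), host]
    else:
        # Linux ping: -c is the count, -W is the timeout in seconds.
        cmd = ["ping", "-c", "1", "-W", str(timeout_seconds), host]
    result = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return result.returncode == 0


def find_outages(samples, min_misses=3):
    """Given (timestamp, reachable) samples in time order, return the time of
    the first missed ping in every run of at least `min_misses` misses."""
    outages = []
    run_start = None
    run_length = 0
    for stamp, reachable in samples:
        if reachable:
            run_start, run_length = None, 0
            continue
        if run_length == 0:
            run_start = stamp
        run_length += 1
        # Record the outage once, at the moment the run becomes long enough.
        if run_length == min_misses:
            outages.append(run_start)
    return outages
```

Collect samples by appending `(datetime.now(), ping_once(host))` to a list every 30 seconds or so, then pass the list to `find_outages`. Requiring a few misses in a row keeps a single dropped packet from being counted as a freeze.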
 
Normal desktop hardware (guess it's not really a "file server" in those terms then): Intel DP965LT, E2180, 2x1GB G.Skill PC2-5300, Areca 1280ML, and a Matrox Millennium G550.
 
Eh the 965 chipset is quite solid 'n stable.
Latest BIOS?
Overclocking?
G.Skill is normally standard voltage right? 1.9V? Or is yours a model which requires a boost?
CPU HSF clear of dust bunnies?
 
Can't check what the latest BIOS is because Intel's site is down, but I believe so.
No OC'ing, RAM's running at 1.8V, all the fans/heatsinks are clean.
 
It's now done it twice since I last posted. Really driving me insane with it being so random...
 
Not possible at the moment.
I did try to run Memtest86, but it kept freezing with a blank blue screen and the green corner that says Memtest86+ v2.01 (the red + kept pulsating). I swapped the RAM for two identical sticks but got the same memtest problem.
I haven't had time to check whether the freezing is still occurring, as it's only been 4 or so hours.

Edit: http://techrepublic.com.com/5208-6230-0.html?forumID=101&threadID=210647&start=0 Seems to be exactly as I describe my problem. In fact, it could be a bad slot/memory controller. Just recently I RMA'd some RAM after it started playing up in this very same motherboard but would work perfectly in a different one. I guess after replacing the RAM (then again just today) it took a while for any problems to appear. Hopefully I won't have to replace the mobo, but it's not a problem if it comes to it, as I have another DP965LT lying around.
 
Not possible at the moment.
I understand, would just rule out (for the most part) a software issue... As I have had software do this to me in the past.

Edit: http://techrepublic.com.com/5208-6230-0.html?forumID=101&threadID=210647&start=0 Seems to be exactly as I describe my problem. In fact, it could be a bad slot/memory controller.
If it is, it's a Mobo issue.

Not sure how big a budget you've got or how much money rolls through your organization every hour, but at least in my case, the cost of this downtime would outweigh just buying a new mobo and being done with the problem.
 
So the motherboard was finally replaced several days ago and was working perfectly till 5 minutes ago when it just froze again.

Bah. So irritating.

Any other thoughts?
 
I'd burn a new copy of MemTest and run it after everyone leaves, and get to work and make sure to boot the server up before anyone gets to work the next day... See if it errors out at all.

The thing is, if MemTest is freezing for the same reason Windows is, that means it is most definitely a hardware problem.

Now we're looking at any number of things, but the main two in my mind would be the hard drives and possibly RAM.
 
Have you run a serious stress test on your mass storage controller?
I know you have good RAM, but random lockups are usually a memory issue. Memtest would be the first thing I'd look into. Also, I've seen a similar problem with a faulty mass storage controller.

If it was minor software locks, they should have shown on your logs. I say it is a hardware issue.
 
The thing is, if MemTest is freezing for the same reason Windows is, that means it is most definitely a hardware problem.
I want to agree but that doesn't leave much after replacing the motherboard. CPU, RAM, HDDs, RAID controller, and that's all I can think of.

Now we're looking at any number of things, but the main two in my mind would be the hard drives and possibly RAM.

I do believe I tried different RAM (that was actually good) after I had first posted about this issue (which was some months ago), so that's possibly out of the question. HDDs I doubt, as there is only one that would cause any issues: the boot drive. It can't be the drives on the RAID, otherwise the RAID card would say so somewhere.
 
I do remember reading (although it was some time ago) that PowerChute could cause symptoms such as this. I'd still try TechieSooner's suggestion with safe mode.
 
I do remember reading (although it was some time ago) that PowerChute could cause symptoms such as this.

I really hope it's not that. If it is I'd laugh really hard, then cry a bit at me blatantly overlooking it. Guess I'll try updating PowerChute first (hmm, no version of Personal Edition works with Server 2003 according to them?), then running without it before I resort to safe mode.
 
I want to agree but that doesn't leave much after replacing the motherboard. CPU, RAM, HDDs, RAID controller, and that's all I can think of.
The only thing you've replaced to-date is the motherboard, right?
That still leaves your ram and hard disks.


HDDs I doubt as there is only one that would cause any issues - the boot drive. Can't be the drives on the RAID otherwise the RAID card would say so somewhere.
Not necessarily... If something is getting hung on access then it could be any of them.

I really hope it's not that. If it is I'd laugh really hard, then cry a bit at me blatantly overlooking it. Guess I'll try updating PowerChute first (hmm, no version of Personal Edition works with Server 2003 according to them?), then running without it before I resort to safe mode.
There's a reason why there is Enterprise/Server grade stuff and then personal stuff.
I'd still give Safe Mode a shot. Like I've said many times, I've had software do this very thing.
Friday evening boot into safe mode, and early Monday morning go check and see if it's frozen. That's a good 60 hours.
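
A quick check of the 60-hour figure, plus a rough estimate of how convincing a freeze-free weekend would be. The dates are arbitrary (any Friday 18:00 to Monday 06:00 works), the freeze rates are the "once every couple of days" to "twice per day" range mentioned earlier in the thread, and treating freezes as independent events at a constant rate (a Poisson process) is purely an assumption.

```python
import math
from datetime import datetime


def hours_between(start, end):
    """Length of the test window in hours."""
    return (end - start).total_seconds() / 3600.0


def chance_of_clean_run(freezes_per_day, hours):
    """Probability of zero freezes in `hours`, assuming freezes follow a
    Poisson process with the given average daily rate."""
    return math.exp(-freezes_per_day * hours / 24.0)


# Arbitrary example dates: a Friday at 18:00 to the following Monday at 06:00.
window = hours_between(datetime(2008, 2, 1, 18, 0), datetime(2008, 2, 4, 6, 0))
print("window: %.0f hours" % window)

# Rates taken from the thread: one freeze every two days, up to two per day.
for rate in (0.5, 2.0):
    p = chance_of_clean_run(rate, window)
    print("%.1f freezes/day -> %.1f%% chance of a clean run by luck" % (rate, p * 100))
```

Under those assumptions, a clean 60 hours would be strong evidence if the box really freezes twice a day (under 1% chance of that happening by luck), but at one freeze every two days there is still roughly a 29% chance of a quiet weekend even if Safe Mode changed nothing, so a longer run would be more convincing.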
 
I updated PowerChute, still froze. I disabled PowerChute from starting up, still froze.
Currently disabled all unnecessary services. If that doesn't work, safe mode it is. Hope file sharing still works in safe mode.
 
File sharing won't work in safe mode because networking is disabled... Unless you do Safe Mode + Networking, but like I've said I've had networked software cause this issue....


IIRC PowerChute also has a service.


All your drivers look OK in Device Manager?

I'd seriously bring the server down for the weekend and run it in Safe Mode to see if it happens again. Why are you fighting that so hard, any reason you can't do it?
 
File sharing won't work in safe mode because networking is disabled...
Yea, I noticed when I was going through all the services.

IIRC PowerChute also has a service.
I know, I disabled the service and the startup program.


All your drivers look OK in Device Manager?
Nothing out of the ordinary.

Why are you fighting that so hard, any reason you can't do it?
It's being used constantly (read: 24/7) by more than just me.
 
Bummer. It's too bad it wasn't a simple software fix. Might actually be time to open a case with MS and pay the money. I know they have a whole lot of ways to turn on debug logging in different places. You'd probably find a lot of articles on the internet about turning them on too. But, sounds like you just have to find that root cause.
 
Are you really sure it wasn't the mass storage controller? Are you 100% confident? I've seen similar crashes because a controller was going bad. I could be wrong. I'm just sharing my past experience.
 
I'm not 100% sure that I can rule the RAID controller out, but seeing as it was a $1000 RAID card, I have my doubts. As I said, I can still access the RAID controller even after the OS freezes. I can check its logs, HDDs' statuses, and so forth.

Now it's just a matter of waiting till it crashes before I move on to the next step. Ugh.
 
I'm not 100% sure that I can rule the RAID controller out, but seeing as it was a $1000 RAID card, I have my doubts.
And $50,000 servers can crash and burn just like your $500 workstations can. Just because it costs a lot of money means nothing.


As I said, I can still access the RAID controller even after the OS freezes. I can check its logs, HDDs' statuses, and so forth.
Using that logic, you can still access the RAID controller after the OS freezes when you get back into Windows too.

My point is simply don't discount it until you've taken steps to rule it out.


Also, if this server is relied upon 24/7 and can't even be shut down for the WEEKEND, I'd be wondering why you don't have two of them to have one serve as a backup.
 
Considering it's a home file server, I'd either have to be nuts or rich to have a backup. I'm neither, so I don't. :D
 
So then it's really not used 24/7.
Do without it for a few days and just run it in Safe Mode to eliminate third-party software as the cause.
 