Server 2003 memory leak horror...

Linuxtim

Limp Gawd
Joined
Feb 26, 2003
Messages
203
Hi
Got a big problem with Server 2003 randomly crashing. It just stops talking to the network and requires a reboot. The event log is full of 1009 errors, then a few 2019 errors (cannot allocate from the non-paged pool, I think), and then it stops responding to network traffic.

Thing is, this is affecting at least 15 of 53 servers and it seems f*cking random. Also, about one new server a week starts doing this after being fine for 3 or so months. All are Server 2003, not SP1 though (I could not see any reason to upgrade to SP1 - correct me if needed!).

Servers run DHCP, DNS, AD, McAfee 8.0 Enterprise (with patch 11), Veritas 9.1 or 10 (fully patched) and have been updated to the latest Intel Pro 1000 MT drivers (10.2 I think), but I still see 1009 errors (although fewer of the 2019s that hang boxes).

So - anyone else seen these sorts of errors on server 2003 and any idea how to fix it? It's starting to get desperate! We just upgraded from NT4 and find that server 2003 seems more unstable! Our users are starting to get concerned.

Cheers
 
If you want any real help you're going to need to post more details.

Helpful information would include *all* details when referring to event logs and that includes errors and warnings from application and system logs.

What makes you think it is a memory leak?

Have you even considered contacting Microsoft PSS?

I would think if this is happening on so many machines you would want to solve the issue ASAP rather than rely on forums which could take days.
 
SJConsultant said:
Have you even considered contacting Microsoft PSS?

I would think if this is happening on so many machines you would want to solve the issue ASAP rather than rely on forums which could take days.

I agree. If 15 of 50-some servers are affected, you really should be talking to Microsoft.
 
If you know it's on a cycle, watch resources as the time goes on and look for the memory hog (lsass?). Try restarting some services and see what reduces the memory usage. When you find the one, it should be pretty obvious. ;)

I've seen a couple of Symantec products do this (ITA and AV) and we ended up setting up a script to bounce the service every night until a fix was made. Good luck!
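For anyone wanting to try the nightly-bounce workaround, here is a minimal sketch of what such a script could look like, not the one we actually ran. It assumes the standard Windows `net stop` / `net start` commands; the service name `ExampleSvc`, the function names, and the pause length are made up for illustration.

```python
import subprocess
import time


def bounce_commands(service_name):
    """Return the stop and start command lines for a Windows service."""
    return [["net", "stop", service_name], ["net", "start", service_name]]


def bounce_service(service_name, run=subprocess.run, pause_seconds=5,
                   sleep=time.sleep):
    """Stop a service, wait briefly, then start it again.

    Returns True only if both commands exit with code 0.
    `run` and `sleep` are injectable so the logic can be tried
    without touching a real service.
    """
    stop_cmd, start_cmd = bounce_commands(service_name)
    stopped = run(stop_cmd).returncode == 0
    sleep(pause_seconds)  # give the service time to release its resources
    started = run(start_cmd).returncode == 0
    return stopped and started


# Hypothetical usage on the affected server (service name is an assumption):
# bounce_service("ExampleSvc")
```

Scheduled to run once a night, this only hides the leak until the vendor ships a fix. It does not tell you what is leaking.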
 
Can you post the two errors?

On one of the questionable servers, wait a day, then open Task Manager.
Customize the view to include:

User Name
Paged Pool
Non-Paged Pool
Handle Count
Thread Count

Now look at most services.
svchost typically uses 1424 handles and my Outlook 1333. I'd be looking for the process that's taking way too many thread handles. I'd say you might see one that is over 100,000 before the server crashes. You should be able to identify the source of the leak and move on from there.
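If you note down the handle counts a few times over the day, the comparison can be sketched roughly like this. It assumes you copy the numbers out of Task Manager by hand into dicts of process name to handle count. The function name, the 1.5x growth factor, and `agent.exe` are made up for illustration; the 100,000 limit is just the ballpark figure mentioned above.

```python
def find_suspects(snapshots, handle_limit=100_000, min_growth=1.5):
    """Flag processes whose handle count looks like a leak.

    snapshots: list of dicts {process name: handle count}, oldest first.
    A process is flagged if its latest count is at or above handle_limit,
    or if it grew by at least min_growth times since the first snapshot.
    Returns (name, first count, latest count) tuples, worst offender first.
    """
    first, last = snapshots[0], snapshots[-1]
    suspects = []
    for name, count in last.items():
        start = first.get(name)  # None if the process was not running earlier
        grew = start is not None and start > 0 and count / start >= min_growth
        if count >= handle_limit or grew:
            suspects.append((name, start, count))
    return sorted(suspects, key=lambda s: s[2], reverse=True)


# Example readings (made-up numbers, apart from the svchost/Outlook figures)
morning = {"svchost.exe": 1424, "outlook.exe": 1333, "agent.exe": 2000}
evening = {"svchost.exe": 1430, "outlook.exe": 1340, "agent.exe": 120000}
print(find_suspects([morning, evening]))
```

The same comparison works for the Paged Pool and Non-Paged Pool columns if you record those instead, with a limit that suits the units.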
 
Linuxtim said:
All server 2003, not SP1 though (I could not see any reason to upgrade to SP1 - correct me if needed!)
I would try updating one of them to see if it fixes the problem. SP1 fixed a number of small bugs. It's definitely worth a shot.
 
I would look at your backup in addition to some of the other recommendations.
Check the logs in Backup Exec for errors, and also see whether your backups are running when people are trying to use the network.

Let us know, and also give Microsoft a call.
 