Server Monitoring / Best Practices

fibroptikl

I am at a new job, and one thing I have been asked to do is come up with a list of "best practices": items that can either be monitored by software (not any particular package, just in general) or checked manually. Our systems usually consist of a Microsoft SQL Server 2005 database, a few custom-written services, and some client software (all of which may be on one computer or spread across three), running on Windows Server 2003.

Our standard list right now consists of the basics, such as:

Critical Updates
Service Packs
Event Viewer / Log monitoring
Checking that the backup actually ran
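For the backup item, a script can at least confirm that a recent backup file exists. A minimal Python sketch, assuming SQL Server writes .bak files into a single folder and that, with a nightly job, anything older than about 26 hours is suspect (both are assumptions, as is the example path). It does not prove the backup is restorable; only a test restore does that.

```python
import time
from pathlib import Path


def newest_backup_age_hours(backup_dir, pattern="*.bak", now=None):
    """Return the age in hours of the newest file matching pattern, or None if there are none."""
    now = time.time() if now is None else now
    files = list(Path(backup_dir).glob(pattern))
    if not files:
        return None
    newest = max(f.stat().st_mtime for f in files)
    return (now - newest) / 3600.0


def backup_is_fresh(backup_dir, max_age_hours=26, pattern="*.bak", now=None):
    """True if at least one backup file exists and the newest is within max_age_hours.

    The 26-hour default is an assumed allowance for a nightly job plus some slack.
    """
    age = newest_backup_age_hours(backup_dir, pattern, now)
    return age is not None and age <= max_age_hours
```

Example use with a hypothetical folder: `backup_is_fresh(r"D:\Backups\SQL")`.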

I've added items such as checking database file integrity, disk drive health, etc.
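On the disk side, free space on the volumes holding the data and log files is easy to check from a script; for the integrity of the database files themselves, SQL Server's own DBCC CHECKDB is the usual tool. A rough Python sketch of the free-space part only; the 15% threshold and the drive letters in the example are assumptions:

```python
import shutil


def free_space_percent(path):
    """Percentage of the volume containing `path` that is still free."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.free / usage.total


def low_disk_volumes(paths, min_free_percent=15.0):
    """Return (path, free %) for every volume below the (assumed) threshold."""
    low = []
    for p in paths:
        pct = free_space_percent(p)
        if pct < min_free_percent:
            low.append((p, round(pct, 1)))
    return low
```

Example with hypothetical drives: `low_disk_volumes(["C:\\", "D:\\"])` returns an empty list when every volume has enough room.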

So my question is: for those of you who work in the computer field and are responsible for monitoring a server or system, what do you look for or do?
 
There are three different goals you may want to look into:
- Baselines of the servers
- Healthcheck of the servers
- Regular maintenance

Your baseline should capture the normal CPU, memory, disk, network, etc. load a server sees. You can then use it to detect deviations from the norm. There are many packages that can automate this for you; Nagios (open source/free) and Orion (closed source/not free) are two that I have used.
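To make the baseline idea concrete: collect samples of a metric during normal operation, summarise them, then flag readings that fall far outside that range. A minimal Python sketch using mean and standard deviation with a 3-sigma cutoff; the cutoff and sample values are assumptions, and packages like Nagios or Orion handle the collection and alerting for you. Since load often varies by time of day, separate baselines per hour may work better than one overall figure.

```python
import statistics


def build_baseline(samples):
    """Summarise historical samples (e.g. CPU %) as (mean, standard deviation).

    Needs at least two samples, since statistics.stdev requires them.
    """
    return statistics.mean(samples), statistics.stdev(samples)


def is_deviation(value, baseline, threshold=3.0):
    """Flag a reading more than `threshold` standard deviations from the mean."""
    mean, stdev = baseline
    if stdev == 0:
        # Perfectly flat history: any change at all is a deviation.
        return value != mean
    return abs(value - mean) / stdev > threshold
```

For example, a baseline built from CPU readings of 19 to 23% would flag a reading of 90% but not 21.5%.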

Your health check should include things like patches needed, backup procedure testing, event log checking, fragmentation levels, etc. The point here is purely documentation: recording what needs to be done, not changing anything yet.
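Because the health check only documents, its output can be as simple as a report listing each item and whether it needs follow-up. A small Python sketch; the item names, findings, and server name are made up for illustration:

```python
from dataclasses import dataclass


@dataclass
class CheckItem:
    name: str
    finding: str
    action_needed: bool


def render_report(server, items):
    """Produce a plain-text health-check report; nothing on the server is changed."""
    lines = [f"Health check: {server}"]
    for item in items:
        flag = "ACTION" if item.action_needed else "OK"
        lines.append(f"[{flag}] {item.name}: {item.finding}")
    pending = sum(1 for i in items if i.action_needed)
    lines.append(f"{pending} item(s) need follow-up in the next maintenance window")
    return "\n".join(lines)
```

Items flagged ACTION then feed the regular maintenance list, where the actual changes are made.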

Your regular maintenance should include things like SQL maintenance, OS maintenance, patch installs, backup procedure testing, fault tolerance testing (clusters), etc. This is where actual changes are made.
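One way to keep regular maintenance from slipping is to record when each task was last done and list whatever is overdue. A Python sketch; the task names and intervals are hypothetical placeholders, not recommendations:

```python
from datetime import date, timedelta

# Hypothetical intervals; adjust to your own maintenance policy.
MAINTENANCE_INTERVALS = {
    "SQL index/statistics maintenance": timedelta(days=7),
    "Patch installs": timedelta(days=30),
    "Backup restore test": timedelta(days=30),
    "Cluster failover test": timedelta(days=90),
}


def tasks_due(last_done, today, intervals=MAINTENANCE_INTERVALS):
    """Return task names whose interval has elapsed, or that were never done, sorted."""
    due = []
    for task, interval in intervals.items():
        last = last_done.get(task)
        if last is None or today - last >= interval:
            due.append(task)
    return sorted(due)
```

Tasks missing from `last_done` are treated as overdue, so a brand-new server shows everything as due.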
 
A network monitor of some sort would be a good idea. Anything that tells you the status of the routers and switches in your network so that when connectivity goes bad, you can pinpoint it immediately.
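At its simplest, a connectivity check tries to reach each device and reports failures. A rough Python sketch that attempts a TCP connection, assuming each router or switch exposes a management port such as SSH (22); the device names and addresses are hypothetical. Real network monitors typically use ICMP ping and SNMP instead, which this sketch does not cover.

```python
import socket


def tcp_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, or name lookup failed.
        return False


def check_devices(devices, timeout=2.0):
    """devices maps a name to (host, port); returns name -> reachable."""
    return {name: tcp_reachable(host, port, timeout) for name, (host, port) in devices.items()}
```

Example with a hypothetical device: `check_devices({"core-switch": ("192.0.2.1", 22)})`.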

Also, some program that shows you, in GUI form, all of your servers and whether each one is up, having problems, or down. That way, when problems do come up, you can see what is wrong at a glance.
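Whatever the display looks like, underneath it reduces each server's checks to one of those three states: up, having problems, or down. A text-only Python sketch of that roll-up; the server and check names are hypothetical, and a GUI tool would draw this for you:

```python
def server_status(reachable, failed_checks):
    """Collapse raw check results into the three at-a-glance states."""
    if not reachable:
        return "DOWN"
    if failed_checks:
        return "PROBLEM"
    return "UP"


def status_board(results):
    """results maps server name -> (reachable, list of failed check names)."""
    lines = []
    for name in sorted(results):
        reachable, failed = results[name]
        state = server_status(reachable, failed)
        detail = f" ({', '.join(failed)})" if state == "PROBLEM" else ""
        lines.append(f"{name:<10} {state}{detail}")
    return "\n".join(lines)
```

An unreachable server reports DOWN regardless of its other checks, since those results cannot be trusted.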
 
Barstool said:
A network monitor of some sort would be a good idea. [...]
Nagios and Orion are two of the best I have found.
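If you go with Nagios, custom checks are written as plugins: a program that prints one status line and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN), optionally adding performance data after a `|`. A Python sketch of the threshold logic for a metric where higher is worse; the metric name and thresholds are assumptions:

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def nagios_result(label, value, warn, crit, unit=""):
    """Return (exit_code, output_line) for a metric where higher values are worse."""
    if value is None:
        return UNKNOWN, f"{label.upper()} UNKNOWN - no data"
    if value >= crit:
        state = CRITICAL
    elif value >= warn:
        state = WARNING
    else:
        state = OK
    # Text before '|' is shown to operators; after it is performance data (label=value;warn;crit).
    line = f"{label.upper()} {STATE_NAMES[state]} - {value}{unit} | {label}={value}{unit};{warn};{crit}"
    return state, line
```

A plugin script would print `line` and then call `sys.exit(state)` so that Nagios sees the exit code.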
 