Virtual Server instability

joblo37pam

I'm about to pull my hair out on this one.

I am having problems with a couple of my virtual machines. Here's the setup:

- Host Server: HP ProLiant DL360 G4p (2x Dual Core Xeon 3.6GHz, 8GB RAM, 2x300GB SAS in RAID1, Server 2003 R2 SP2, Virtual Server 2005 R2)

- VM 1: Server 2003 R2 SP2 - Hosts Altiris for imaging and app deployment.

- VM 2: Server 2003 R2 SP2 - Hosts Symantec Server 3.1, IAS for wireless RADIUS, WSUS 3.0, Cert Auth.

- Each VM is configured with 2 fixed size virtual disks (OS and Data)

The whole setup was running great for over a year with no problems. Now all of a sudden, the two virtual machines reboot, seemingly at random. The host is rock solid, though. If I turn off automatic restarts on failure, once in a while I will get a blue screen with 0x00000024 or 0x0000007e. These are both possibly drive related, but since these are virtual disks/controllers, I don't know what to do about it.

I'm not sure what changed to make this happen (I'm not the only person who uses them), but it's getting really frustrating. Luckily they reboot quickly, so interruptions are minimal, but I need to get it fixed.

Here's what I've tried:

- Updated all three boxes (Microsoft Update, LiveUpdate, etc.)
- Ran chkdsk on the .vhd files and the physical partition that they are located on.
- Attempted an OS repair on VM2. It would just quit and restart during the process and never finished.
- Turned off AV real-time scanning and tamper protection on all 3 boxes.
- Uninstalled/reinstalled third party apps.
- Applied KB937455, which is supposed to deal with the 0x00000024 error.

The problem seems to be more prevalent when traffic is high, but isn't always reproducible on command. I'm about out of ideas, aside from calling Microsoft. Does anyone here have any suggestions? Thanks in advance.
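One way to check the "more prevalent when traffic is high" impression is to bucket the restart times by hour of day. This is only a minimal sketch: the timestamps below are made-up placeholders, not data from this thread, and `restarts_by_hour` is a hypothetical helper name. It assumes the real restart times are copied by hand from each VM's System event log.

```python
from collections import Counter
from datetime import datetime


def restarts_by_hour(timestamps):
    """Count restarts per hour of day from ISO-8601 timestamp strings."""
    hours = Counter(datetime.fromisoformat(ts).hour for ts in timestamps)
    return dict(sorted(hours.items()))


# Hypothetical restart times (placeholders only, not taken from the thread).
sample = [
    "2008-03-03T09:12:00",
    "2008-03-03T09:47:00",
    "2008-03-04T14:05:00",
    "2008-03-05T09:30:00",
]

if __name__ == "__main__":
    # Print a crude text histogram, one row per hour that saw a restart.
    for hour, count in restarts_by_hour(sample).items():
        print(f"{hour:02d}:00  {'#' * count}  ({count})")
```

If the restarts cluster in the same hours as the heavy imaging or patching traffic, that would support the load theory. An even spread would point away from it.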
 
Have you updated the drivers for the host system as of late? Maybe a new raid driver or nic driver is messing with virtual server.
 
7E stop errors are almost always driver/hardware related. Try updating drivers on the VMs and the host.
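For reference, the two stop codes mentioned in this thread correspond to the bug check names NTFS_FILE_SYSTEM (0x24) and SYSTEM_THREAD_EXCEPTION_NOT_HANDLED (0x7E). A small lookup sketch is below. The table is deliberately limited to those two codes, and `describe_stop_code` is a hypothetical helper, not part of any Microsoft tool.

```python
# Only the two bug check codes mentioned in this thread; not a complete table.
BUGCHECK_NAMES = {
    0x00000024: "NTFS_FILE_SYSTEM",
    0x0000007E: "SYSTEM_THREAD_EXCEPTION_NOT_HANDLED",
}


def describe_stop_code(code):
    """Return 'code: name' for an int or a hex string such as '0x0000007e'."""
    if isinstance(code, str):
        code = int(code, 16)  # base 16 accepts an optional '0x' prefix
    name = BUGCHECK_NAMES.get(code, "unknown (not in this small table)")
    return f"0x{code:08X}: {name}"


if __name__ == "__main__":
    for raw in ("0x00000024", "0x0000007e"):
        print(describe_stop_code(raw))
```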
 
Normally I would agree with the driver/hardware suggestion, but since it is the virtual machines that are having problems, all of the hardware is 'virtual'. It's not possible for the hardware to go bad, and the drivers are all generic. The host is perfectly stable, so I find it hard to believe that it has any hardware or driver problems, but I can update them anyway. I will try it tonight when everyone leaves.

Any other suggestions?
 
The host drivers can screw with the VM. I once had to update the controller firmware to fix a similar issue.
 
Run Prime95 or memtest on the host OS. Or underclock your CPU or RAM if possible and try again.
 
Bad memory. Had similar problems with VMWare ESX Server reboots...

Use memtest on the host machine. Make sure drivers are up-to-date...
 
I doubt it's overclocked, given that it's a dual Xeon machine. But I'm with you 100% on the prime and memtest.

You're right, it's not overclocked. I doubt the board even supports it, but I haven't checked. I can't take it down during the day to run memtest, but I started a prime95 run on the host to see if it would turn anything up. I haven't used prime95 for a couple years. Do you still have to start multiple instances to use multiple processors?
 
I don't have any virtualization experience outside of VMWare, but I have seen issues in ESX where the host was rebooting or purple screening (so much prettier than blue screening) for no apparent reason. There was a BIOS update for the host machine, and applying it resolved the problem.

I'd start with your host OS even though the problem appears to be your guests. If both guest OSes are seeing the same issues with no other commonality among them, the host seems to be the best bet.
 
I finally got the new RAID drivers installed last night. VM2 has been up since then, but VM1 has restarted twice.

Prime95 is running now, but hasn't hit any errors yet.

I may be able to take it down late tomorrow afternoon to try memtest for a while. If so, I will update the BIOS at the same time.
 
Well, I got the new BIOS installed over the weekend, and let memtest run for 18 hours. Memtest didn't turn up anything, and the VMs are still restarting. I installed new firmware for the NICs, too. Nothing seems to make any difference. Prime95 ran for a day on the host with no errors. The hardware seems to be solid. I'm at a loss here. Any other suggestions?
 
Move one of the VMs to another machine, even if you have to throw it on a desktop or something. That should help show whether the issue is with something you installed in the virtual OS or with the server you are running it on.
 
Are you getting a memory dump? The memory.dmp file can be analyzed to figure out which driver is causing the issue.
 
I don't have another machine available at the moment that I can throw one of the vms on, but I will work on it and see what I can dig up.

As for the memory dumps, it's not throwing any. I have configured it to collect a "complete memory dump", but after the restarts, there is no memory.dmp file in the location specified.
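A sketch for double-checking the dump settings from inside a guest is below. It assumes the standard `CrashControl` registry key and its `CrashDumpEnabled`, `DumpFile`, and `AutoReboot` values. The helper names are hypothetical, and the registry read only works when run on Windows. One known requirement worth ruling out: a complete memory dump needs a paging file on the boot volume at least as large as physical RAM plus 1 MB, otherwise no dump file gets written.

```python
import sys

# Meaning of the CrashDumpEnabled registry value.
DUMP_TYPES = {
    0: "none",
    1: "complete memory dump",
    2: "kernel memory dump",
    3: "small memory dump (64 KB)",
}


def summarize_crash_control(values):
    """Turn raw CrashControl registry values into a readable summary."""
    return {
        "dump_type": DUMP_TYPES.get(values.get("CrashDumpEnabled"), "unknown"),
        "dump_file": values.get("DumpFile", "(not set)"),
        "auto_reboot": bool(values.get("AutoReboot", 0)),
    }


def pagefile_big_enough(ram_mb, boot_pagefile_mb):
    """A complete dump needs a boot-volume pagefile of at least RAM + 1 MB."""
    return boot_pagefile_mb >= ram_mb + 1


def read_crash_control():
    """Read the relevant values from the registry (Windows only)."""
    import winreg  # imported here so the rest of the sketch runs anywhere

    key_path = r"SYSTEM\CurrentControlSet\Control\CrashControl"
    values = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
        for name in ("CrashDumpEnabled", "DumpFile", "AutoReboot"):
            try:
                values[name], _ = winreg.QueryValueEx(key, name)
            except FileNotFoundError:
                pass  # value not present; the summary falls back to defaults
    return values


if __name__ == "__main__":
    if sys.platform == "win32":
        print(summarize_crash_control(read_crash_control()))
    else:
        # Illustrative values only, for trying the summary off Windows.
        print(summarize_crash_control({"CrashDumpEnabled": 1, "AutoReboot": 1}))
```

The RAM and pagefile sizes have to be supplied by hand here, for example `pagefile_big_enough(2048, 2049)` for a guest with 2 GB of RAM. The sketch does not try to detect them.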

I installed new drivers to go with the new firmware on the NICs tonight. We'll see how that goes.
 
Again, I'm not too well versed in Virtual Server 2005, but would it be possible to mount and run your VMs in VMWare Server as a test (it's free)? You should be able to run both of them on the same host OS. It can help to narrow down where the problem is (the host OS, the guest OSes, the application, drivers, etc.)
 
Can you mount MS virtual machines in VMWare? I've never tried it, and it's been a few years since I've used VMware.

The new network drivers didn't do the trick, BTW. VM2 restarted 4 hours after install. VM1 is still running.
 
No, you cannot mount a MS VM on VMWare. They are not nearly compatible.

Have you tried removing VM Additions? Those are the closest you will get to troubleshooting the VMs' drivers.
 
If invalidating the test equates to solving the problem, I don't think that's a bad thing.

I'm going to let myself be guilty of a little fanboy loyalty here and say that if you were to try converting your MS VMs to VMWare VMs and they ran in VMWare Server without issue, that might be a sign to switch vendor solutions... but maybe you have a reason for using MS's product.

Either way, good luck to you.
 