First fault in 3 years, purple screen - time to upgrade vSphere?

Sp33dFr33k

After more than 3 years of flawless service, our vSphere 5.1 cluster had a single host purple screen. VMware support has stated they'd rather not have to support such an old hypervisor. We've been holding off on upgrading because of how well the system has worked since it was deployed. We're looking to move to ESXi 5.5 U3 to match our vCenter server.

Looking for experiences running 5.5 U3 and things to look out for. I know there have been on-and-off issues with NIC drivers (e1000 vs. VMXNET3). I just don't want to go from a cluster with 99% uptime to something that fails often.
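If you want to see which guests are still on e1000 before touching the hosts, something along these lines with the pyVmomi SDK should work. This is just an untested sketch; the vCenter address and credentials are placeholders.

import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Placeholder vCenter address and credentials; certificate checking disabled for a lab-style run.
ctx = ssl._create_unverified_context()
si = SmartConnect(host='vcenter.example.local',
                  user='administrator@vsphere.local',
                  pwd='********', sslContext=ctx)
content = si.RetrieveContent()

# Walk every VM and flag virtual NICs still using the emulated e1000/e1000e
# adapter instead of the paravirtual VMXNET3 device.
vms = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.VirtualMachine], True)
for vm in vms.view:
    if vm.config is None:  # skip templates/inaccessible VMs
        continue
    for dev in vm.config.hardware.device:
        if isinstance(dev, (vim.vm.device.VirtualE1000, vim.vm.device.VirtualE1000e)):
            print('{0}: {1}'.format(vm.name, type(dev).__name__))

Disconnect(si)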
 
Is it just one host that's purple screening? My understanding is that's a hardware failure?
 
I would think you can at least send VMware the core dump file and they can tell you what caused the PSOD. They're simply reading the file.
 
Is it just one host that's purple screening? My understanding is that's a hardware failure?

It was just one host; the purple screen info was related to this KB article:
https://kb.vmware.com/selfservice/s...ype=kc&docTypeID=DT_KB_1_1&externalId=1020214

The article describes an issue with some HP servers:
Some HP servers experience a situation where the PCC (Processor Clocking Control or Collaborative Power Control) communication between the VMware ESXi kernel (VMkernel) and the server BIOS does not function correctly.

As a result, one or more PCPUs may remain in SMM (System Management Mode) for many seconds. When the VMkernel notices a PCPU is not available for an extended period of time, a purple diagnostic screen occurs.


This issue was fixed in a much older release of vSphere than we're running, but it could be something similar.
 
I would think you can at least send VMware the core dump file and they can tell you what caused the PSOD. They're simply reading the file.

We've done that. The engineer we got was quite difficult to deal with. We haven't heard anything back from them yet. The server has been fine since and probably will stay that way.

One thing we're going to do is reboot the hosts every 6 months or so just to clear things out. These hosts were up for more than 540 days; the last time we rebooted them was for an HP agent patch.
 
5.5u3a fixed most of the issues and that's what we've been running for the past few months. It's usually vSphere updates that cause me the biggest headaches.

But if your hosts have been up for 540 days, what build are they currently on? I'd patch them up to 5.1 U3 at least; it should be fine to go all the way to Patch 8, though.
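If you want a quick inventory, something like this pyVmomi sketch should dump the build and uptime per host. Untested, and the connection details are placeholders.

import ssl
from datetime import datetime, timezone
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Placeholder vCenter address and credentials.
ctx = ssl._create_unverified_context()
si = SmartConnect(host='vcenter.example.local',
                  user='administrator@vsphere.local',
                  pwd='********', sslContext=ctx)
content = si.RetrieveContent()

hosts = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.HostSystem], True)
for host in hosts.view:
    if host.config is None:  # skip disconnected hosts
        continue
    product = host.config.product             # ESXi version and build number
    boot = host.runtime.bootTime              # timezone-aware datetime, or None
    up_days = (datetime.now(timezone.utc) - boot).days if boot else 'n/a'
    print('{0}: ESXi {1} build {2}, up {3} days'.format(
        host.name, product.version, product.build, up_days))

Disconnect(si)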
 
The hosts are running the 5.1 base release. General support for 5.1 ends this year and technical guidance ends in 2018, so it's time to move on. This is literally the only fault we've had since the cluster was deployed, which is what has kept us from messing with it. We'll use the HP custom 5.5 image and test it on one host for a week or so before upgrading the rest. vCenter is already on the latest 5.5 release.
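The rough idea for the test host, sketched with pyVmomi (untested; the host name and connection details are placeholders), is to drop it into maintenance mode so DRS clears it off before the 5.5 image goes on.

import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVim.task import WaitForTask
from pyVmomi import vim

# Placeholder vCenter address and credentials.
ctx = ssl._create_unverified_context()
si = SmartConnect(host='vcenter.example.local',
                  user='administrator@vsphere.local',
                  pwd='********', sslContext=ctx)
content = si.RetrieveContent()

hosts = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.HostSystem], True)
# 'esx01.example.local' is a placeholder for the one host being upgraded first.
target = next(h for h in hosts.view if h.name == 'esx01.example.local')

# timeout=0 means no timeout: the task completes once running VMs have been
# migrated or powered off and the host is in maintenance mode.
WaitForTask(target.EnterMaintenanceMode_Task(timeout=0))
print('{0} is in maintenance mode and ready for the 5.5 image'.format(target.name))

Disconnect(si)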
 
HP renamed one of their network drivers between the 5.1 and 5.5 builds, if I remember correctly. If you had deployed with the old image, you couldn't upgrade the hosts using the new one without renaming the driver in a custom image. I ended up reinstalling all of my hosts rather than upgrading when we went to 5.5.
 
HP renamed one of their network drivers between the 5.1 and 5.5 builds, if I remember correctly. If you had deployed with the old image, you couldn't upgrade the hosts using the new one without renaming the driver in a custom image. I ended up reinstalling all of my hosts rather than upgrading when we went to 5.5.

Thanks, that's a nice bit of info to have. Weird that they would do that. I don't recall seeing any issue with the HP image when I did this at my last job, but I'll keep it in mind.
 