vCenter fun

In my lab I have 3 domain controllers: 2 running Windows 2012 and a brand new one running 2012 R2. Last night I decided I wanted to get all of them up to 2012 R2 so I moved the FSMO and Schema master roles to the existing 2012 R2 machine, demoted the first DC, removed it from the domain, and proceeded to rebuild it.

Right after the demotion I realized my DNS wasn't working. Strange because while the DC I just took down was usually the primary DNS server for my network, everything should use the secondary DNS server. Then I saw my vCenter client showed everything as inaccessible and it eventually crashed. I couldn't reconnect to vCenter anymore and figured it was a DNS problem.

I figured once I got AD and DNS up and running on that DC again, everything would be fine. So I got the DC rebuilt and reconfigured and attempted to log into the vCenter client again. No dice. Tried the IP but got the same result. "Unable to connect to remote server." Then I tried the Web client and while I could get in with my AD creds, I was greeted with an empty inventory and the message:

"Could not connect to one or more vCenter Server systems: https://192.168.0.13:443/sdk"

Looking at my vCenter server appliance's config page, I see that the vCenter service is stopped. I reboot the VCSA but no go. I manually start the service and while it appears to run, within a few minutes it shows as stopped again. I removed and re-added my domain as an identity source in SSO, but no go.

I'm not finding any help in the KB or forums. Seems I've broken new ground! If I wasn't using distributed switches in my cluster, this wouldn't be a big deal since I could just build the VCSA again. I do have a backup of my VDS so I could still do that, but would rather troubleshoot and fix the issue, if possible. Anyone seen something like this?
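For the symptom above (the appliance answers but https://192.168.0.13:443/sdk does not), a quick TCP probe can help separate "the VM is unreachable" from "the VM is up but the vCenter service is not listening". This is only a minimal sketch: `port_open` is a hypothetical helper name, and it checks nothing beyond whether a TCP connection can be opened.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out)
        # when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `port_open("192.168.0.13", 443)` tests the port from the error message. Assuming the appliance's config page is on its usual port 5480, `port_open("192.168.0.13", 5480)` returning True while 443 returns False would point at the vCenter service rather than at networking or DNS.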
 
Yeah, saw that KB but since I'm using the VCSA it doesn't really apply.

What I'm trying right now is backing up the postgres DB on my old VCSA, deploying a new appliance, and restoring the DB.

I could log into vCenter again but only one of my hosts will successfully reconnect. The others time out at 89% and then my vCenter session crashes.
 
Wow. I can add host 3 to my restored vCSA but as soon as I try to add host1 or host2 it crashes and I can no longer read data from host3.

Next I tried installing a brand new vCSA without importing the database. I do have a backup of the VDS so I planned on importing that but, again, I cannot connect host1 or host2 to the vCSA. If I do, vCenter becomes completely unusable even after a reboot.
 
Wonder if it's the distributed switch causing issues, especially with host3 not working when you try to add 1 and 2. Can you reset the networking on one of them from the console and get it to connect? I used to have to do that when I'd have a 1000v go haywire on me.
 
After connecting to each host directly with the client, removing the vpx user, and telling each to disassociate from vCenter, I was then able to add them to a fresh vCSA. Now I've restored the VDS from a backup and am back in business.
 
that is WEIRD as hell
I've not seen that since EARLY early beta for 5.1...
 
It could be an issue with how my domain is set up, too. I'm not the sharpest tool in the box when it comes to AD.

Still, I didn't expect vCenter to totally take a dump on me like that. I'm just glad I had the VDS backed up so I didn't have to play "juggle the network adapters" to get the hosts on a new vCenter.
 
What are the DNS forwarders set to in AD?
 
DNS forwarders are set to my ISP's DNS servers and the OpenDNS servers.

Today the vCenter appliance just stopped working. No clue why. It appears DRS may have VMotioned it this morning, but it was an hour after that when my legacy client connection crashed.

Upon connecting to the web client it says it can't reach the inventory service. I go to the VCSA setup page and see the vCenter and Inventory services are stopped and the database status is just a spinning circle.

I tried rebooting gracefully but after 30 mins it was stuck trying to shut down. A hard reset results in it getting stuck at "Initializing SMTP port." I can ping the VCSA but that's all.

Not getting the warm fuzzies with the VCSA 5.5.0b. I may have to switch back to a Windows installation.
 
Very strange, I can't piece this one together in my head.

Thoughts:
1. Were the forward and reverse lookup zones replicating between the AD/DNS servers, e.g. using "allow zone transfers from name servers list" for each zone? (Everything should still work even if not, though.)

2. Is the management NIC of vCenter on a distributed switch portgroup set to static binding? I've run into issues before where you power off vCenter and power it back on on a different host, but it's seen as a new VM for some reason, so it can't get its statically assigned port on the distributed switch and it's stuck in limbo because vCenter isn't up. I then have to move it back to the original host to power it on correctly.
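For point 1, one way to sanity-check the forward and reverse zones from any machine is to resolve a name to an IP and then look that IP back up. The sketch below uses only Python's standard `socket` module; `check_forward_reverse` is a hypothetical helper name, and the resolver arguments exist only so the function can be exercised without a live DNS server.

```python
import socket


def check_forward_reverse(name,
                          forward=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """Resolve name -> IP -> name and report whether the two agree.

    `forward` and `reverse` default to the system resolver and can be
    replaced with stand-ins for testing.
    """
    result = {"name": name, "ip": None, "ptr": None, "consistent": False}
    try:
        result["ip"] = forward(name)  # forward (A record) lookup
    except OSError:
        return result  # forward lookup failed
    try:
        ptr_name, aliases, _addrs = reverse(result["ip"])  # reverse (PTR) lookup
    except OSError:
        return result  # reverse lookup failed
    result["ptr"] = ptr_name
    # Compare case-insensitively, ignoring any trailing dot; a short name
    # versus an FQDN will still count as a mismatch.
    wanted = name.rstrip(".").lower()
    found = [n.rstrip(".").lower() for n in [ptr_name, *aliases]]
    result["consistent"] = wanted in found
    return result
```

Running it for each DC and for the vCenter appliance, once per DNS server configured on the client, would show whether the secondary DNS server really answers the same way as the primary.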
 
Yes, AD and DNS are replicating fine. vCenter NIC is on a vDS but comes up just fine. I can ping, get to the management web page, etc. of the vCenter VM no problem.
 
Found the issue. Nothing wrong with vCenter. Turns out the host it was VMotioned to this morning has faulty hardware. Still trying to determine if the CPU or RAM is bad but it's definitely one of the two. If I deregister the VCSA from the host and register it on another, it boots fine. If I move it back to that host, it fails to boot.

Fun!
 
hahaha! The joys of hardware assisted virtualization - sometimes guests see the fun problems :) That explains a LOT more - whew.
 
It also reminds me why I hate computers.

This one host freezes randomly. I have tried three different motherboards, 2 different PSUs, 2 different CPUs, 2 different sets of memory, even unplugging the single network adapter, box still PSODs. Right now it is running all different hardware from when the problem first manifested and it's still unstable.

Fuck this.... I'm drinking instead.
 
this vcenter is driving me crazy, i upgraded my home lab to 5.5 wrong move...

gonna try vCSA tonight T_T wasting so much time
 
I actually ended up recommending (on VMware letterhead - official as shit) sacrificing a goat to solve a problem like that once.

EMC seconded it.

Of course, I was gonna go with chickens before moving up to goats, but EMC wanted to go all the way.

Either way, vendor 3 replaced the steel chassis of a rack mount server, problem gone :)
 
What kind of PSOD? Also, are you using VMXNET3 adapters?
 
I didn't record what the PSOD says.

However, I know it's the mobo or the CPU, since when I run memtest86+ on the bad host it comes up with errors. When I move the RAM to a different CPU and mobo and run memtest86+, it goes several passes with no errors. If I try another set of known good RAM, the same thing happens.

Either the mobo is defective or the CPU's DDR controller is.

A new CPU and mobo is only $110 at Microcenter so I'll just buy another set.
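The swap test described above is elimination by set arithmetic: whatever is present in every failing run and absent from every passing run stays under suspicion. A minimal sketch of that reasoning follows, assuming a single faulty part and that a part which passed once is good. The component labels (`mobo_A`, `ram_1`, and so on) are made up for illustration.

```python
def suspects(observations):
    """Return components common to all failing runs and absent from passing runs.

    observations: iterable of (components, passed) pairs, where components
    is a collection of part labels and passed is a bool.
    """
    failing = [set(parts) for parts, passed in observations if not passed]
    passing = [set(parts) for parts, passed in observations if passed]
    if not failing:
        return set()
    common = set.intersection(*failing)  # parts present in every failing run
    cleared = set().union(*passing)      # parts seen in at least one clean run
    return common - cleared


# The three memtest86+ runs from the post, with hypothetical labels.
runs = [
    ({"mobo_A", "cpu_A", "ram_1"}, False),  # bad host, original RAM: errors
    ({"mobo_B", "cpu_B", "ram_1"}, True),   # same RAM, other board and CPU: clean
    ({"mobo_A", "cpu_A", "ram_2"}, False),  # bad host, known good RAM: errors
]
print(sorted(suspects(runs)))  # ['cpu_A', 'mobo_A']
```

The result cannot separate the board from the CPU because the two were never tested apart, which matches the conclusion that it is one or the other.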
 
Cool. Since you were mentioning Server 2K12 boxes, I was just checking to see if you hit a known bug with PSODing. That's why I was asking about VMXNET3, as the bug hits hosts with machines running E1000 and E1000E very hard... Then there's also the other bug with E1000E and data corruption in high-usage scenarios...

Just checking :)
 
Yep I know about that one. Always use VMXNET3 anyway.
 