ESXi vCPUs multiprocessor limits?

rtangwai

[H]ard|Gawd
Joined
Jul 26, 2007
Messages
1,369
Someone offered me a "real" server to replace my whitebox AMD FX-8350 running ESXi 6.5.

It is a Supermicro X8DAi w/2x Xeon X5650 hex-core hyperthreading CPUs.

What I am curious about is how ESXi vCPUs work - can I create a VM that uses all 24 threads as a single vCPU with 24 cores, or is ESXi limited to the processors themselves so the max each vCPU can have is 12 cores?

If it can do such a thing, is there a performance penalty over and above the usual ESXi overhead?
 
I'll let others comment on the AMD vs Intel.
BUT

Intel = more cores, which means a higher virtual-to-physical ratio
Intel = more energy efficient per socket
AMD = higher clock, which is needed by some applications
AMD = more overall efficiency per system

The way we allocate at work is to take 2 off the top of the physical core count for ESXi overhead (and to keep the core count even). So you would not want to exceed 10 vCPUs on your VM, or 11 if it's thread intensive. But yes, if you absolutely had to, you could do 12 and might see some percentage of performance penalty. Whether you notice it or not depends on how busy the VMs are.

You can certainly over-allocate vCPUs relative to physical CPUs if the VMs are (a) not all busy at the same time or (b) small enough.

Hyper-threading comes in when over-allocating. The extra threads can help the scheduler, and what we do at work is figure them at about 30-50% of a physical core, performance-wise. We never allocate HT cores directly, though.
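To put rough numbers on that weighting (just back-of-the-envelope math, not an official VMware formula), here is what it looks like for the 2x X5650 box in question:

```python
# Rough capacity estimate for 2x Xeon X5650 (12 physical cores, 24 threads),
# counting each hyperthread as 30-50% of a physical core and keeping 2 cores
# off the top for ESXi overhead, per the allocation rule described above.
PHYSICAL_CORES = 12
HT_THREADS = 12
HT_WEIGHT_LOW, HT_WEIGHT_HIGH = 0.3, 0.5
ESXI_RESERVE = 2

low = (PHYSICAL_CORES - ESXI_RESERVE) + HT_THREADS * HT_WEIGHT_LOW
high = (PHYSICAL_CORES - ESXI_RESERVE) + HT_THREADS * HT_WEIGHT_HIGH
print(f"Usable capacity: roughly {low:.1f} to {high:.1f} core-equivalents")
# -> roughly 13.6 to 16.0 core-equivalents across all VMs combined, while any
#    single VM still should not exceed ~10 vCPUs.
```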

If you want to get crazy with allocating across both sockets, you can start talking about NUMA nodes and efficiencies, but in a home lab that is not something I would necessarily worry about unless you are really pushing whatever you're doing.

Hope this makes sense. I know I could explain it better, but for some reason it doesn't seem to be coming out right in text form.
 
Read up on CPU contention; that is the typical issue you run into when oversubscribing CPUs.

Also, the free ESXi 6.5 license will only let you allocate 8 vCPUs to any one VM.
 
Someone offered me a "real" server to replace my whitebox AMD FX-8350 running ESXi 6.5.

It is a Supermicro X8DAi w/2x Xeon X5650 hex-core hyperthreading CPUs.

What I am curious about is how ESXi vCPUs work - can I create a VM that uses all 24 threads as a single vCPU with 24 cores, or is ESXi limited to the processors themselves so the max each vCPU can have is 12 cores?

If it can do such a thing, is there a performance penalty over and above the usual ESXi overhead?
VMware best practices say to assign CPUs by sockets. Only use cores when dealing with licensing limitations of software running on the VMs. Also, only assign the minimum number of vCPUs each virtual server needs rather than the maximum to every VM. Start with 1 socket and increase as needed for the workload. This improves ready time.

When dealing with 2- or 4-socket servers, assigning based on sockets allows ESXi to better split the workload across NUMA nodes. If you have to assign cores rather than sockets, match the virtual socket count to the server layout (2 or 4 sockets) and then increase the core count from there.
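For anyone who would rather script this than click through the UI, here is a minimal pyVmomi sketch (hostname, credentials, and VM name are placeholders for your own environment) showing where the socket/core split lives in the VM config: numCPUs is the total vCPU count and numCoresPerSocket controls how they are grouped, so leaving numCoresPerSocket at 1 gives the sockets-only layout described above.

```python
# Minimal pyVmomi sketch: give a VM 4 vCPUs presented as 4 sockets x 1 core.
# The host, credentials, and VM name below are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab use only; validate certs in prod
si = SmartConnect(host="esxi.lab.local", user="root", pwd="password", sslContext=ctx)

content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(content.rootFolder, [vim.VirtualMachine], True)
vm = next(v for v in view.view if v.name == "test-vm")

spec = vim.vm.ConfigSpec()
spec.numCPUs = 4            # total vCPUs for the VM
spec.numCoresPerSocket = 1  # 1 core per socket -> presented as 4 virtual sockets
vm.ReconfigVM_Task(spec=spec)  # the VM must be powered off to change CPU config

Disconnect(si)
```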
 
Thanks for all the advice, this is my home ESXi server so I am in a position to experiment with it to see what works best.

The whole CPU vs. core thing came up because I was having a discussion with a colleague who was explaining to me that software has to be specially coded to be aware of and use multiple CPUs. I had asked him "what if you faked it out using a hypervisor?" and he gave me a blank stare :)
 
Thanks for all the advice, this is my home ESXi server so I am in a position to experiment with it to see what works best.

The whole CPU vs. core thing came up because I was having a discussion with a colleague who was explaining to me that software has to be specially coded to be aware of and use multiple CPUs. I had asked him "what if you faked it out using a hypervisor?" and he gave me a blank stare :)

You're not faking it out. Virtualization presents virtual resources to the guest VM as normal hardware, so a core is still a core whether it's presented by socket count or core count.
 
Your colleague is correct. What follows applies whether it's on a hypervisor or not; in most cases the OS doesn't even know it's virtualized.

If you have software that is single-threaded (single CPU), having 10 CPUs allocated isn't going to help it. A higher clock speed will.
If you have software that is multi-threaded (multi CPU), having 10 CPUs could be greatly beneficial. Clock speed may or may not be as important here, but the general rule of thumb is to buy the highest core count with the highest clock you can.

Example:
At work, we had a team complaining their app was using 100% CPU and demanding more be given to the VM. I reviewed it but could only see the VM maxing out at 50%. I asked them if their app was single- or multi-threaded. It was single-threaded, so there was nothing we could do from the infrastructure side. They needed to make their app scale horizontally (add more single-core VMs) or re-code it to support multi-threading.
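If you want to see that single- vs multi-threaded distinction for yourself, here is a toy Python illustration (nothing ESXi-specific): the serial loop pegs one core no matter how many vCPUs the VM has, while the same work split across a process pool can use all of them.

```python
# Toy illustration: single-threaded work uses one core regardless of vCPU
# count; the same work split across processes can use every core the VM has.
import time
from multiprocessing import Pool, cpu_count

def burn(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

WORK = [2_000_000] * 8

if __name__ == "__main__":
    t0 = time.time()
    [burn(n) for n in WORK]            # serial: one core busy, the rest idle
    print(f"serial:   {time.time() - t0:.2f}s")

    t0 = time.time()
    with Pool(cpu_count()) as p:       # parallel: spreads across all vCPUs
        p.map(burn, WORK)
    print(f"parallel: {time.time() - t0:.2f}s on {cpu_count()} CPUs")
```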


In my case, I opted to replace my dual quad-core (no HT) 2.4 GHz setup with dual hex-core (with HT) 2.0 GHz CPUs. I did this because they are lower power and I am able to have a higher over-allocation ratio and a higher single-VM core count, but I lose some performance by going with a lower clock. With the applications I use, I haven't noticed any performance loss. Depending on your use case, it may be the same for you.

Hope this helps.
 
OK, question about sockets and cores when I'm assigning a count for machines. I've got a free 6.0 ESXi box I run some stuff on. It's got a pair of 8-core CPUs. If most of my VMs are quad-core, should I be assigning them as 1 core x 4 sockets in their config? Or 2 cores x 2 sockets? Right now they're all 4 cores x 1 socket configs.

BTW I'm running around 10 VMs at ~260% CPU provisioning right now, but I hardly ever hit even 50% CPU utilization on the host.
 
OK, question about sockets and cores when I'm assigning a count for machines. I've got a free 6.0 ESXi box I run some stuff on. It's got a pair of 8-core CPUs. If most of my VMs are quad-core, should I be assigning them as 1 core x 4 sockets in their config? Or 2 cores x 2 sockets? Right now they're all 4 cores x 1 socket configs.

BTW I'm running around 10 VMs at ~260% CPU provisioning right now, but I hardly ever hit even 50% CPU utilization on the host.

This directly relates to NUMA nodes. Here are a few docs from VMware's site about it.
What is a NUMA node
The Importance of VM Size to NUMA Node Size
Virtual Machine vCPU and vNUMA Rightsizing – Rules of Thumb

Bottom line: keep your VMs sized the way you have them (4 cores x 1 socket). Otherwise, leaving it at the default (1 core per socket) is fine as long as you do not exceed the physical core count; VMware is already trying to place your VMs into the same NUMA node when possible out of the box, so you would just be doing that manually.
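If you ever want to check what layout the guest actually ends up seeing, here is a quick sketch for a Linux VM that just reads sysfs (nothing VMware-specific, and the paths assume a standard Linux kernel):

```python
# Quick check from inside a Linux guest: list the NUMA nodes and the CPUs in
# each, as presented by the virtual hardware. A single node means the VM fits
# in one vNUMA node; multiple nodes mean ESXi is exposing vNUMA to the guest.
import glob
import os

for node in sorted(glob.glob("/sys/devices/system/node/node[0-9]*")):
    with open(os.path.join(node, "cpulist")) as f:
        print(f"{os.path.basename(node)}: CPUs {f.read().strip()}")
```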

Based on the information given, with a single host your 260% works out to about 2.5 vCPUs for every 1 pCPU. I would be more concerned about contention and ready time if you are that over-provisioned, even with such low usage. If you verify your contention/ready figures are minimal (< ~5%), then no worries. If they are pushing the ~10%+ range, you may want to look at downsizing the VMs that don't need 4 vCPUs.
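For reference, this is how I turn vCenter's CPU Ready summation counter (reported in milliseconds) into the percentages above; the math follows VMware's published conversion and assumes the 20-second real-time chart interval:

```python
# Convert vCenter's "CPU Ready" summation value (milliseconds over the sample)
# into a ready percentage. Real-time charts sample every 20 seconds.
def ready_percent(ready_ms, interval_s=20, vcpus=1):
    # Dividing by the vCPU count gives a per-vCPU figure to compare against
    # the ~5% "fine" / ~10% "look into it" thresholds mentioned above.
    return ready_ms / (interval_s * 1000 * vcpus) * 100

# Example: a 4-vCPU VM reporting 1600 ms of ready time in one 20 s sample.
print(f"{ready_percent(1600, vcpus=4):.1f}% ready per vCPU")  # -> 2.0%
```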
 
OK, question about sockets and cores when I'm assigning a count for machines. I've got a free 6.0 ESXi box I run some stuff on. It's got a pair of 8-core CPUs. If most of my VMs are quad-core, should I be assigning them as 1 core x 4 sockets in their config? Or 2 cores x 2 sockets? Right now they're all 4 cores x 1 socket configs.

BTW I'm running around 10 VMs at ~260% CPU provisioning right now, but I hardly ever hit even 50% CPU utilization on the host.
Each socket should be 1 core. If you need 2 vCPUs, assign 2 sockets with 1 core each.

As I mentioned earlier, VMware's best practices documents say the only time to increase the core count per socket is if you are running an application whose licensing is tied to sockets. If that is the case, for NUMA purposes you want to match the number of virtual sockets to the number of physical sockets in the system and then increase the core count until you get where you need to be.
 
Please forgive my ignorance, but what I am reading here is that VMware best practice is to assign VMs sockets (1 core per socket) when deciding how many total cores are needed?

IE:
1 Socket, 4 cores per socket = BAD
2 Sockets, 2 cores per socket = BETTER
4 Sockets, 1 core per socket = BEST
 
You have to be careful of what you subscribe due to licensing too. Mr. Baz, there is no slide rule for this stuff. It depends on all of the variables in the equation.
 
You have to be careful of what you subscribe due to licensing too. Mr. Baz, there is no slide rule for this stuff. It depends on all of the variables in the equation.
For enterprise licensing, it doesn't matter (at least for me). I understand what you are getting at. 2 sockets nets you different licensing than 4 sockets for Windows Server, for example. Didn't MS just change their licensing scheme last year?

All licensing aside, how does VMware performance get affected by these settings? Say you have a host with 2 x 16-core Xeons and I have a VM that I want to have 4 vCPUs. Will my performance be better with a 1x4, 2x2, or 4x1 layout? According to some of the articles I was reading, it seems that with NUMA-aware systems like VMware you could have any combination as long as you don't overextend your total core count, and your vCPU count per VM should divide evenly into the host CPU's core count (i.e., in this case you could have 1, 2, 4, 8, or 16 vCPUs per VM).
 
For enterprise licensing, it doesn't matter (at least for me). I understand what you are getting at. 2 sockets nets you different licensing than 4 sockets for Windows Server, for example. Didn't MS just change their licensing scheme last year?

They did. Server 2016 licensing sucks hard now; it's licensed on the core count of the actual host. So assigning more cores to a VM doesn't really matter at this point, really only with Datacenter edition; Standard is licensed by the core count of the host as well.
 
Mr. Baz - in short, in my testing it netted no difference unless you start to talk about NUMA nodes.
k1pp3r - It does suck hard now. It makes licensing servers at home harder too. M$ needs their monies.
 
VMware best practices say to assign CPUs by sockets. Only use cores when dealing with licensing limitations of software running on the VMs. Also, only assign the minimum number of vCPUs each virtual server needs rather than the maximum to every VM. Start with 1 socket and increase as needed for the workload. This improves ready time.

When dealing with 2- or 4-socket servers, assigning based on sockets allows ESXi to better split the workload across NUMA nodes. If you have to assign cores rather than sockets, match the virtual socket count to the server layout (2 or 4 sockets) and then increase the core count from there.

Not true anymore, by VMware's own admission, back in 5.5 I believe. There is absolutely no difference in regards to performance unless you need to get around some per-CPU licensing reason to actually pick one over the other: 1 socket with 12 cores vs. 2 sockets with 6 cores, or whatever. As long as you adhere to NUMA, it doesn't matter. There are still some folks who swear it has to be done one way, but IMO just manage it however you feel like it; you aren't doing it wrong either way. Maybe we are saying the same thing, and I'm just leaning more towards "it really doesn't matter which."
 
The accuracy of the info here for VMware and ESXi is all over the place.

So let me throw in my 2 cents...

vCPU allocation has changed in the past couple of versions of ESXi. The latest best practice for ESXi 6.5 is to NOT touch cores and to allocate sockets only. ESXi 6.5 doesn't do what you think, either: it will actually auto-recalculate the correct socket/core count and apply it correctly in the backend... BUT ONLY if you just add sockets and ONLY if the VM was created in ESXi 6.5. Migrated VMs are treated differently, and they are also treated differently depending on what version of ESXi they came from.

My advice for the anal VMware administrators, for all versions of ESXi before 6.5, is to follow the hardware's physical topology for the maximum number of sockets and cores per socket. The exception to that rule is if you are migrating a VM with more than 8 cores per socket into an ESXi 6.5 cluster, as it may crash the VM upon migration; something that was documented in the footnotes of ESXi 6.5 U1.


In that one instance, lower the core count to 8 per socket, perform the migration, make a snapshot or clone of the VM, upgrade the virtual hardware, change the core/socket configuration to either best practice or the physical topology, restart the VM, and verify the VM and its related services. Delete the snapshot or clone when you are certain everything is working properly.


Newer versions of ESXi are far more forgiving on socket/CPU configuration than older versions.


As to oversubscription:

At low density the basic math is 1 core = 1 vCPU you can use.

A hyperthread is generally equal to about 0.2 of a physical core, and only if the physical cores are not thread-locked.

Ex: a deca-core processor with hyperthreading can usually give you about 12 usable vCPUs (10 physical cores + 10 x 0.2); this also assumes you are not thread-locking.

The concept of virtualization is that you never use all your CPU cycles all the time.

If you end up using 99% of your physical cores when virtualized, you'll notice that the host shows 50% usage. Add 2% more virtual load and all those hyperthreads will engage and the host will show 100% usage. The false assumption is that you still have 51% of your host's CPU available; hyperthreading is only useful when you have intermittent loads. You can have a dynamic load over 100% and achieve very high efficiencies using virtualization as long as you do not lock the physical cores with threads.

Virtualization does not make something from nothing (and most of you know that)

CPU allocation is also configurable. By default the priority order is 1 physical core, then 1 hyperthread. You can change this to allocate all the physical cores first if you desire.

NOTE: If another physical core is available it will be used instead of the hyperthread.


Use esxtop. Keep an eye on vCPU ready, VM world time, DRS entitlement/demand ratios, and DRS entitlement/demand delivered, and you'll then know how many VMs you can load onto your system.
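If you would rather trend those counters than watch the live screen, esxtop batch mode can dump everything to CSV (for example, esxtop -b -d 5 -n 60 > stats.csv) and you can pull out the ready columns with something like the sketch below; the column naming is based on my own captures, so adjust the match strings to whatever headers your host actually emits.

```python
# Pull the per-group "% Ready" columns out of an esxtop batch-mode CSV
# (generated with something like: esxtop -b -d 5 -n 60 > stats.csv).
# Header naming can vary by ESXi build, so adjust the match strings as needed.
import csv

with open("stats.csv", newline="") as f:
    reader = csv.reader(f)
    header = next(reader)
    ready_cols = [i for i, name in enumerate(header)
                  if "Group Cpu" in name and "% Ready" in name]
    for row in reader:
        for i in ready_cols:
            if row[i] and float(row[i]) > 5.0:   # flag samples over ~5% ready
                print(f"{header[i]}: {row[i]}% ready")
```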


I've run systems at CPU oversubscription rates of 3:1 and as high as 7:1 with full entitlement delivered and no slowdown in VM world time.


This is one topic where the perfect answer is: test, verify, and monitor as needed.


Lastly, DO NOT forget to watch the storage latency values in ESXTOP.
 
Not true anymore, by VMware's own admission, back in 5.5 I believe. There is absolutely no difference in regards to performance unless you need to get around some per-CPU licensing reason to actually pick one over the other: 1 socket with 12 cores vs. 2 sockets with 6 cores, or whatever. As long as you adhere to NUMA, it doesn't matter. There are still some folks who swear it has to be done one way, but IMO just manage it however you feel like it; you aren't doing it wrong either way. Maybe we are saying the same thing, and I'm just leaning more towards "it really doesn't matter which."
In 2013 VMware recommended setting sockets and leaving core count at 1 unless you have licensing limitations. [SOURCE]

VMware recently said that with the changes to Windows licensing you can increase cores as long as you do not assign more cores than the CPU physically has. [SOURCE] The original 2013 post still holds true, though; the recommendations now are based on licensing, which was mentioned in the original post.

I was not aware VMware is changing it so VMs created on 6.5 automatically try to fix NUMA for you, but that doesn't fix older machines that are improperly configured, since they are not corrected when migrating them to 6.5. There are known issues when migrating systems that have invalid cores per socket (more than 1) to 6.5, as pointed out in [SOURCE] (a source used by VMware, so it is legitimate).

A few other comments that apply here...
Migrating VMs configured with Cores per Socket from older ESXi versions to ESXi 6.5 hosts can create PSODs and/or VM Panic ... The Cores per Socket configuration overwrites the default VPD configuration and this can lead to suboptimal configurations if the physical layout is not taken into account correctly... In an ideal world, the thread that accessed or created the memory first is the thread that processes it. Unfortunately, many applications use single threads to create something, but multiple threads distributed across multiple sockets access the data intensively in the future. Please take this into account when configuring the virtual machine and especially when configuring Cores per Socket. The new optimization will help to overcome some of these inefficiencies created in the operating system...
[SOURCE]

CPU core settings were originally hidden for a reason. It seems VMware is walking back making that setting available by automatically trying to configure VMs properly if they are created in 6.5.

VMware has said all along, in the 2013 best practices post, that you can set cores per socket, but you have to follow the NUMA layout, which I mentioned earlier in this thread. It's just easier to tell people to use sockets only, because not everyone has a good understanding of hardware (if someone has to ask the question, the answer should be easy for them to understand).

I also try to keep in mind that as you add hardware to your environment, the socket/core configuration is going to change. I don't want to have to go back and keep fixing CPU allocations on VMs.

Ultimately this is a six of one, half a dozen of the other scenario. There is no "right" answer; there is only educating ourselves to know what's best for each environment (which will change over time). I still want to read more on how CPU cache is affected by sockets in a VM. My brain just hurts too much right now to do any more reading.
 