Virtualization Performance Monitoring: vCPU Demand, Usage, and Contention

Written by Joe Kozlowicz on Wednesday, June 13th 2018 — Categories: Azure, Cloud Hosting, VMware

When managing a virtualized environment you’ll naturally want to monitor your compute resources such as memory, CPU, storage, and bandwidth in order to keep an eye on any possible performance issues.

We’ve covered monitoring before – like how much information to collect, how granular you need to get, how to check load averages, and configuring vSphere Alarms for resource consumption. Today we’re taking a closer look at CPU performance monitoring in particular.

The CPU is often the first potential culprit to check when you encounter a struggling virtual machine. In this post, learn the differences between CPU metrics, some common problems, and best practices for provisioning CPU cores.

 

Demand vs. Usage

When you check your CPU metrics within vSphere or your performance monitoring platform of choice, you’ll likely be looking at the usage percentage. This seems pretty clear cut — it is a measurement of the CPU time used by a VM.

CPU demand is highly correlated with CPU usage, but it’s not quite the same thing. Demand is the amount of CPU the VM is requesting, and what the VM demands is not always what it receives.
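To make the difference concrete, here is a minimal Python sketch that computes how much requested CPU a VM did not receive. It assumes you have exported demand and usage for the same interval, both in MHz; the function name and the sample numbers are hypothetical and not part of any VMware tool.

```python
def unmet_demand(demand_mhz, usage_mhz):
    """Return (shortfall in MHz, shortfall as a percent of demand).

    Both inputs are assumed to cover the same sampling interval.
    """
    if demand_mhz <= 0:
        return 0.0, 0.0
    # Usage can briefly exceed demand in sampled data, so clamp at zero.
    shortfall = max(0.0, demand_mhz - usage_mhz)
    return shortfall, shortfall * 100 / demand_mhz


# Hypothetical VM that asked for 2400 MHz and received 1800 MHz.
print(unmet_demand(2400, 1800))
```

A shortfall that stays near zero means the VM is getting what it asks for; a persistent gap is the cue to look at the metrics covered later in this post.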

CPUs can generally run at up to 80% utilization, as measured by CPU usage, with no problems. They can also peak beyond 80% without much of a performance hiccup. Where you see issues is when the baseline load sits above 80% consistently, or when the machine is performing poorly even though usage is measured below 70%.
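The two warning signs above can be sketched as a small Python check. This is only an illustration: using the median of the samples as the "baseline" is our assumption, and the function name and labels are hypothetical.

```python
from statistics import median

BASELINE_LIMIT = 80.0   # percent, sustained load above this is a concern
LOW_USAGE_LIMIT = 70.0  # percent, poor performance below this is suspicious


def assess_cpu_usage(samples, performing_poorly=False):
    """Classify a list of CPU usage percentages collected over time."""
    if not samples:
        raise ValueError("need at least one usage sample")
    # Assumption: the median stands in for the baseline load, so short
    # peaks above 80% do not trigger the sustained-high case.
    if median(samples) > BASELINE_LIMIT:
        return "sustained-high"
    if performing_poorly and max(samples) < LOW_USAGE_LIMIT:
        return "slow-despite-low-usage"
    return "ok"


print(assess_cpu_usage([85, 90, 88, 60]))
print(assess_cpu_usage([40, 55, 50], performing_poorly=True))
```

The second case is the interesting one: a VM that feels slow while reporting modest usage usually points at configuration or contention rather than raw capacity.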

When this occurs, you’ll want to check the application type and your CPU configuration. Some apps cannot take advantage of multi-threaded processing, so the available virtual CPU cores are not being used properly. Make sure the workload is distributed among all available cores if possible. In other cases, the VM may have a CPU limit configured that caps what it can use.

Of course, the most obvious answer is usually to provision additional vCPU resources for the VM in question. Make sure to reboot the VM afterwards. For mission-critical workloads, you can hot-add the vCPU if you have the option enabled, and then reboot outside of business hours.

 

When vCPUs Fight

One of the criticisms of public cloud is that your workloads can potentially duke it out with other tenants’ workloads. In a properly configured environment this should not happen, but it can occur even in private clouds or on-premises virtualized environments.

The problem is CPU contention. The hypervisor schedules each virtual machine onto the physical resources of the actual servers. You can overprovision those resources, but if workloads are running at the same time, are highly resource intensive, and/or you have a very high ratio of vCPUs to physical CPUs, then each virtual machine ends up fighting for, and sometimes stuck waiting for, the available CPU cycles. The hypervisor must decide which workload is sent to which physical resource and in what order.
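One quick way to see how overprovisioned a server is: add up the vCPUs assigned to its VMs and divide by its physical cores. The Python sketch below does only that arithmetic; the function name and the inventory numbers are hypothetical, and the acceptable ratio depends on your workloads.

```python
def vcpu_to_pcpu_ratio(vcpus_per_vm, physical_cores):
    """Return the ratio of assigned vCPUs to physical cores on one server."""
    if physical_cores <= 0:
        raise ValueError("physical_cores must be positive")
    return sum(vcpus_per_vm) / physical_cores


# Hypothetical server: five VMs sharing 10 physical cores.
print(vcpu_to_pcpu_ratio([4, 4, 2, 8, 2], 10))
```

A ratio above 1 is not a problem by itself. It only becomes one when the VMs are busy at the same time, which is what the metrics below are for.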

The symptom is called “high ready time”: the VM is ready to run, but waits a long time for a physical CPU to become available for its workload. A good rule of thumb to avoid it is to provision only as many vCPUs as the peak workload needs, and as few as you can get away with. Keep your overall VM size smaller if possible – add another VM if you need additional resources rather than provisioning a large quantity of large VMs.

Native VMware tools like esxtop, vSphere, and vRealize Operations Manager include metrics for ready times. Ready time is measured per vCPU, and performance generally degrades once it rises above 10%.
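The 10% guideline is a percentage, but vSphere performance charts report CPU ready as a summation in milliseconds per sampling interval. The Python sketch below shows the usual conversion. The 20-second default reflects the real-time chart interval as we understand it, so check the interval of the chart you are actually reading; the function name is hypothetical.

```python
def ready_percent(ready_ms, interval_s=20, vcpus=1):
    """Convert a CPU ready summation (ms) into a percentage per vCPU.

    ready_ms   -- summation value read from the chart
    interval_s -- length of the chart's sampling interval in seconds
                  (assumed 20 for real-time charts)
    vcpus      -- number of vCPUs the summation covers
    """
    if interval_s <= 0 or vcpus <= 0:
        raise ValueError("interval_s and vcpus must be positive")
    return ready_ms * 100 / (interval_s * 1000 * vcpus)


# 2,000 ms of ready time in a 20-second sample on a single vCPU.
print(ready_percent(2000))
```

For a multi-vCPU VM, passing the vCPU count gives an average per vCPU, which is the number to compare against the 10% guideline.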

Co-Stop is another metric used for this situation, but only for VMs with multiple assigned vCPUs. It measures the amount of time a vCPU is delayed because the hypervisor has to keep the VM’s vCPUs scheduled together. If the co-stop is above 3%, then you likely have too many vCPUs and virtual Symmetric Multi-Processing (vSMP) scheduling is overtaxed. Snapshots can often lead to high co-stop values as well.
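Put together, the two rules of thumb can be sketched as a simple check in Python. It assumes both values are already expressed as percentages and treats the co-stop limit of 3 as a percentage; the function name and flag labels are hypothetical.

```python
READY_LIMIT_PCT = 10.0   # ready time guideline from this post
COSTOP_LIMIT_PCT = 3.0   # co-stop guideline, assumed to be a percentage


def contention_flags(vcpus, ready_pct, costop_pct):
    """Return a list of contention warnings for one VM."""
    flags = []
    if ready_pct > READY_LIMIT_PCT:
        flags.append("high-ready")
    # Co-stop only applies to VMs with more than one vCPU.
    if vcpus > 1 and costop_pct > COSTOP_LIMIT_PCT:
        flags.append("high-co-stop")
    return flags


print(contention_flags(vcpus=4, ready_pct=12.0, costop_pct=4.5))
print(contention_flags(vcpus=2, ready_pct=5.0, costop_pct=1.0))
```

A VM that raises only the co-stop flag is a good candidate for a smaller vCPU count, while a high ready value on its own points at a busy server.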

 

As you can see, CPU contention is the usual cause of a discrepancy between CPU Demand and actual CPU Usage. To solve the problem, reduce the number of provisioned vCPUs, as long as you remain above the overall Demand threshold. You can also add physical CPU resources to your VM pool.
