VMworld 2017 Roundup: Day 4

VMworld "hump day" is in the bag. This is the day that sits roughly in the middle of VMworld, distinguished by having no General Session in the morning and by the Customer Appreciation event in the evening. Let's recap.

vSphere 6.5 Host Resources Deep Dive: Part 2 [SER1872BU]

Last year was part one of Frank Denneman and Niels Hagoort's vSphere 6.5 Host Resources Deep Dive. This year they focused on CPU-related concerns.

First up, Frank focused on non-uniform memory access, or NUMA for short. He pointed out that the best user experience is obtained by providing consistent performance. One of the ways of achieving that is by reducing "moving parts" (complexity). Also, don't forget that when troubleshooting performance in a virtual environment, you shouldn't focus only on the affected VM(s). Other VMs or components that share resources and/or hosts could be contributing to the problem or be affected by it as well.

There's an advanced setting, Numa.LocalityWeightActionAffinity, that you can consider if it turns out that CPU assignment isn't balanced well across NUMA boundaries for your particular workload. The default is 130 (130 of what, Frank hasn't found out yet). Set the value to zero to prevent NUMA load balancing. This is a per-host setting, so it would have to be set on every host in the cluster.

Note that, as an advanced setting, this shouldn't be adjusted unless you've determined that you definitely have a performance concern that changing it would improve. The same is true for any advanced setting: moving away from the default should be seriously considered and done with good reason, otherwise those wouldn't be the default values!

Frank pointed out that when you read "PCPU" in VMware's documentation, it refers to either a physical processor core or a hyper-thread. There's no guarantee of one or the other out of the box, as ESXi decides how to schedule CPU time. If a vCPU is scheduled to run on a physical core, it's charged for 100% of its CPU time. If a vCPU is scheduled to run on a hyper-threaded "core", it's charged for only 62.5% of its CPU time, meaning that it's immediately put back into the scheduling queue to satisfy the outstanding 37.5%. This makes sure that a vCPU scheduled to run on HT ultimately gets the same CPU time as a vCPU scheduled to run on a core; however, being scheduled multiple times does impact performance. The NUMA scheduler will balance workloads across NUMA nodes for VMs whose vCPU count is less than the NUMA boundary. For large VMs (from a vCPU view), try to mirror the physical NUMA domains for absolute best performance.
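
To make the hyper-threading charge a bit more concrete, here's a rough back-of-the-envelope sketch in Python. It only does the arithmetic on the 100% and 62.5% figures from the session; the 100 ms slice and the function names are my own assumptions for illustration, and this is not how the ESXi scheduler is actually implemented.

```python
from fractions import Fraction

# Charging rates as described in the session: a full core is charged at 100%,
# a hyper-thread at 62.5% (5/8) of the time actually spent running.
CORE_RATE = Fraction(1)
HT_RATE = Fraction(5, 8)


def charged_time(run_ms, rate):
    """CPU time charged to a vCPU that ran for run_ms at the given rate."""
    return run_ms * rate


def run_time_needed(target_charge_ms, rate):
    """Run time needed before the accumulated charge reaches target_charge_ms."""
    return target_charge_ms / rate


slice_ms = 100  # hypothetical slice length, picked only to keep the numbers round
charged = charged_time(slice_ms, HT_RATE)
outstanding = slice_ms - charged
total_on_ht = run_time_needed(slice_ms, HT_RATE)

print(f"charged on HT: {float(charged)} ms")
print(f"outstanding: {float(outstanding)} ms")
print(f"run time on HT for a full charge: {float(total_on_ht)} ms")
```

In this simplified model, the vCPU on a hyper-thread has to run noticeably longer (and be scheduled again) to be credited with the same CPU time it would have received on a full core.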

The NUMA scheduler will use the CPU core count for client sizing as well. There is a per-VM advanced setting, numa.vcpu.preferHT, that, when set to True, will make the NUMA scheduler consider HT in addition to cores. This can keep a VM within a NUMA boundary but, like all advanced settings, you should have a good reason to change this behaviour for a particular VM. It may be good for VMs that benefit from maintaining memory locality, but not for VMs with high CPU load. See VMware KB 2003582 for more info.
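
As a rough illustration of that sizing decision, here's a minimal Python sketch. It assumes a deliberately simplified model where a VM is split across as many NUMA nodes as its vCPU count requires, counting only cores by default and logical processors when preferHT is on. The function name, the host dimensions, and the two-threads-per-core default are hypothetical, and the real scheduler weighs more than this.

```python
import math


def numa_clients(vcpus, cores_per_node, prefer_ht=False, threads_per_core=2):
    """Rough count of NUMA nodes a VM is split across.

    Simplified model: by default only physical cores count toward node
    capacity; with prefer_ht=True, hyper-threads count too.
    """
    capacity = cores_per_node * (threads_per_core if prefer_ht else 1)
    return math.ceil(vcpus / capacity)


# Hypothetical host: 10 cores per NUMA node, 2 threads per core.
print(numa_clients(12, 10))                  # 12 vCPUs sized against 10 cores
print(numa_clients(12, 10, prefer_ht=True))  # 12 vCPUs sized against 20 logical processors
```

With these made-up numbers, the 12 vCPU VM spans two nodes by default and stays within one node when hyper-threads are counted, which is the memory locality trade-off described above.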

The NUMA client always considers CPU cores, and not memory locality. The advanced setting numa.consolidate, when set to False, will help distribute based on memory. Did we mention being careful about advanced settings? Pretty sure we did.

Frank covered the impact of CPU sizing on storage performance, and how the selection of storage for particular use cases (such as vSAN caching tier versus capacity tier) needs to be considered along with your CPU to ensure best performance. Hint: Intel Optane with 3D XPoint makes for an excellent vSAN caching tier but is wasted on the capacity tier.

Niels then discussed the impact of CPU utilization on network latency. Traditionally, NICs rely on interrupts to have packets picked up by the CPU and put into memory. If the CPU is heavily utilized, interrupt generation can be affected, which in turn introduces latency on the local NIC. A newer technology, vRDMA, aims to address this by allowing NICs to directly access memory, bypassing the ESXi kernel entirely. This isn't broadly available yet as it requires newer hardware (or at least updated firmware) and, naturally, vendor support.

Oh, and, if you haven't already, don't forget to pick up the book!

Automating vSAN Deployments at Any Scale [STO1119GU]

This was a group discussion led by Kyle Ruddy and Jase McCarty about automating vSphere and vSAN. They noted that PowerCLI is the most comprehensive platform for automating vSphere deployments, as many of the APIs aren't feature complete yet. Jase noted that they were surprised by the ways that clients were adopting and deploying vSAN. For instance, one client is deploying 2-node vSAN to 5,000 retail locations, likely using vSAN ROBO licensing to do so.

As an example of automating vSAN and vSphere, Jase mentioned a script that can assign a VM to a particular DRS host group once a specific tag is associated with it. This allows control over VM placement within vSAN stretched clusters. One place to find PowerCLI scripts is the VMware {code} script repo.
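
I don't have the script Jase described, but the core decision it makes (tag in, group out) is easy to sketch. The Python below is only an illustration of that logic with plain dictionaries: the tag names, group names, and VM names are all made up, it doesn't talk to vCenter at all, and a real version would more likely be written in PowerCLI.

```python
# Hypothetical tag-to-group mapping; none of these names come from the session.
TAG_TO_GROUP = {
    "site-a": "SiteA-VMs",
    "site-b": "SiteB-VMs",
}


def plan_group_membership(vm_tags, tag_to_group=TAG_TO_GROUP):
    """Return {group name: sorted VM names} based on each VM's tags.

    A VM with no mapped tag is left out; if several of its tags are
    mapped, the first one in the VM's tag list wins.
    """
    plan = {group: [] for group in tag_to_group.values()}
    for vm, tags in vm_tags.items():
        for tag in tags:
            if tag in tag_to_group:
                plan[tag_to_group[tag]].append(vm)
                break
    return {group: sorted(vms) for group, vms in plan.items()}


# Hypothetical inventory: VM name -> list of tags.
inventory = {
    "web01": ["site-a", "prod"],
    "db01": ["site-b"],
    "test01": ["dev"],
}
plan = plan_group_membership(inventory)
print(plan)
```

Keeping the "what should go where" step separate from the "apply it to the cluster" step makes it easy to review the plan before anything actually moves.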

Regarding vSAN deployment automation specifically, you need to pay attention to your hardware procurement and/or deployment decisions. Different hardware will have subtly different deployment requirements. For example, you may need to deploy a particular firmware revision to support vSAN, or there may be a particular vendor utility or feature that needs to be scripted to set the hardware up for vSAN.

All in all, a good session for getting clients started down the automation path, both for deployment and "day 2" operations.

HPE Briefing for vExperts and Bloggers

I was invited to attend this briefing by HPE. It was an informal event, with a little under 20 of us vExperts and bloggers in attendance and about a half-dozen HPE folks. Much of the conversation revolved around the intended future for Nimble and Simplivity since HPE's acquisitions. Some of the attendees brought forward their customers' concerns about the impact on the quality of service and support they'd gotten used to.

Nimble and Simplivity are continuing to be offered in a sort of status quo state for the moment, though changes are starting to happen. For example, InfoSight, the customer-sourced predictive analytics dashboard that many Nimble Storage customers have come to love and rely on, will not only remain available but also start to incorporate other HPE products such as 3PAR and Simplivity. HPE reps indicated that, if they had their druthers, they'd love to incorporate as much as they could into the InfoSight interface; however, time and resources are finite. It was noted that "it's early days", so it'll be interesting to see what HPE decides to integrate beyond 3PAR and Simplivity, or if InfoSight is incorporated into other HPE platforms and applications like Synergy or OneView.

On the other hand, HPE has now committed to delivering future Simplivity solutions exclusively on HPE hardware. This makes sense for HPE, of course, but some longtime Simplivity customers are anxious about the platform change.

There was a conversation around the use of VVOLs on HCI and SDS platforms like Nimble and Simplivity. Interestingly, HPE considers an upcoming release to be VVOL "v2", as they are incorporating some major new features such as replication. Some VVOL considerations to watch out for (some of which still affect HPE's offerings, as they're inherent limitations of VVOLs themselves) are: VVOL feature set support (such as the replication mentioned earlier), lack of SRM support, lack of RDM support, and the maximum number of VVOLs per array (1000 was a number that was tossed out). Object storage still has plenty of benefits but, like all technology, has cons to be considered along with the pros.

Overall it was a good briefing, and I found the insight into how HPE is handling its new (H)CI platforms illuminating. I think it'll be interesting to see what HPE does with them in the mid to long term. There's opportunity to componentize the tech and use it in more products and offerings. The worst thing HPE could do with Nimble and Simplivity is just sit on them and leave them as-is.

VMworld Customer Appreciation Party

This year's event was held at the T-Mobile Arena. It's a decent venue, very new feeling, having been open not quite a year and a half yet. It was great seeing all the customers enjoying each other's company. Thanks VMware!

One More Sleep

Just one more sleep before the last day of VMworld 2017 US. Note that, unlike years past, there won't be any coverage here of the Thursday General Session as I'll be otherwise engaged in the morning. Details of that will be in the next, and last, VMworld 2017 roundup.