Speaking with a customer recently, I was asked how the data that makes up the contents of a virtual machine is stored within an HPE SimpliVity cluster, and how this data affects overall available capacity within that cluster. This is essentially a two-part question. The first part is relatively straightforward; however, answering the […]
In this series we have discussed how the HPE SimpliVity platform intelligently places VMs along with their associated data containers, and how these data containers are automatically provisioned and distributed across a cluster of nodes.
Part 1 – covers the mechanism of both primary and secondary data containers and their creation process. You can read that post here
Part 2 – covers the automatic management of these data containers via IWO and the Resource Balancer service. You can read that post here
Part 3 – covers the automatic balancing of data containers. You can read that post here
Intelligent Workload Optimizer and Auto Balancer are not magic wands; they work together to balance existing resources, but cannot create resources in an overloaded DC and in some scenarios manual balancing may be required.
This post explores some of these scenarios, how to analyze resource utilization and manually balance resources within a cluster if required.
Analyzing resource utilization – capacity alarms
Total physical cluster capacity is calculated and displayed as the total aggregate available physical capacity from all nodes within a cluster. As previously stated, we do not report on individual node capacity within the vCenter plugin.
While overall aggregate cluster consumption may be low, it is possible that an individual node (or nodes) may become space constrained. In this case an alarm will be triggered in vCenter Server for the node in question.
If the percentage of occupied capacity is more than 80 and less than 90 percent, the Capacity Alarm service raises a “WARN” event.
If the percentage of occupied capacity is more than 90 percent, the Capacity Alarm service raises an “ERR” event.
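The two thresholds above can be sketched as a small function. This is only an illustration of the rule as described, not the Capacity Alarm service itself; the function name is made up, and because the text does not say how exactly 90 percent is treated, this sketch assumes it still counts as a warning.

```python
def capacity_alarm(occupied_pct):
    """Return the alarm level for a node's occupied capacity, or None.

    Assumption: exactly 90 percent is treated as WARN, because the
    description only says "more than 90" for ERR.
    """
    if occupied_pct > 90:
        return "ERR"
    if occupied_pct > 80:
        return "WARN"
    return None


for pct in (42.5, 85, 95):
    print(pct, capacity_alarm(pct))
```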
The below screen shot illustrates a node generating an 80% space utilization warning.
How can overall aggregate cluster consumption remain low while an individual node (or nodes) becomes space constrained? Let’s look at the table below.
In this simplistic example (the real-world analogy is similar), let’s assume a physical capacity of 2TB for each node; the on-disk VM space after de-duplication and compression is listed. Therefore we know the following.
Total cluster capacity is 8TB
Current cluster space consumption is 3.4TB
IWO / balancing service has distributed load evenly.
Total cluster consumption is 42.5% (I’ll spare you the math!)
If any one node exceeds 1.6TB utilization in this scenario, an 80% capacity alarm will be triggered for that node.
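For anyone who does want the math, here is a minimal sketch of the figures in the list above. It assumes four nodes (8TB total at 2TB each) and decimal units (1TB = 1000GB); the variable names are illustrative only.

```python
NODE_CAPACITY_GB = 2000   # 2TB per node, as in the example
NODE_COUNT = 4            # assumption: 8TB total / 2TB per node
CONSUMED_GB = 3400        # 3.4TB consumed across the cluster

total_gb = NODE_CAPACITY_GB * NODE_COUNT
cluster_pct = 100 * CONSUMED_GB / total_gb
warn_threshold_gb = NODE_CAPACITY_GB * 80 // 100

print(f"Total capacity: {total_gb} GB")
print(f"Cluster consumption: {cluster_pct}%")
print(f"Per-node 80% alarm threshold: {warn_threshold_gb} GB")
```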
If we were to add more virtual machines to this scenario, or grow existing virtual machine sizes, we could trigger 80% and 90% utilization alarms for two nodes within this cluster (as consumed on-disk space will scale linearly for both nodes that contain the associated primary and secondary data containers). However, in the real world things can be slightly more nuanced for a number of reasons.
Odd number clusters. Odd number clusters may not scale linearly in every scenario. One node in the cluster will be forced to house more primary and secondary copies of virtual machine data. This may trigger a capacity alarm.
VMs will have backups of various sizes associated with them (stored on the same nodes as the primary and secondary data containers for de-duplication efficiency), along with backups of VMs from other clusters (remote backups). As remote and local backups grow or shrink in size and eventually expire, different nodes can experience different utilization levels.
VMs with highly unique data which does not de-duplicate very well.
Analyzing resource utilization – node & vm utilization
Use the dsv-balance-show command with the --showHiveName --consumption --showNodeIp arguments to view individual node capacity and virtual machine consumption. This command can be issued from any node within the cluster.
Note: You need to elevate to the root user to use this command.
To obtain the full list of command options, use the --help switch.
The above output from dsv-balance-show tells us the following:
Used and remaining space per node
On-disk consumed space of each VM
Consumed I/O of each node and VM (note that this is only a point-in-time snapshot of the node and VMs, taken when the command was issued)
Analyzing resource utilization – viewing vm data container placement
The dsv-balance-manual command is a powerful tool and can be used to accomplish several tasks.
Gather current guest virtual machine data including data container node residence, ownership, and IOPS statistics.
Gather backup information for each VM (number of backups).
Provide the ability to manually migrate data containers and/or backups from one node to another.
Provide the ability to evacuate all data from a node to other nodes within the cluster automatically.
Re-align backups that may have become scattered.
dsv-balance-manual displays its output on screen (you will have to scroll back up in your SSH session!) and creates an analogous .csv ‘distribution’ file named /tmp/replica_distribution_file.csv.
If invoked with the --csvfile argument, the script executes the commands needed to effect any changes made to the .csv file (more on that later).
Let’s explore each function.
Listing data container node locations, ownership, on-disk consumed size and current IOPS statistics.
Issuing the command dsv-balance-manual without any arguments is shown below. After the command is issued, you will be prompted to select the Datacenter/Cluster you wish to query. The following information is output directly to the console.
VM ID (Used for CSV file tracking)
Owner (which node is currently running the VM)
Node columns (whether the node listed contains a primary or secondary data container for the VM)
IO-R (VM read IOPS)
IO-W (VM write IOPS)
SZ(G) (on-disk consumed size of the VM)
Name (VM name)
IOPS R (aggregate node read IOPS)
IOPS W (aggregate node write IOPS)
Size G (total consumed node on-disk space)
AVAIL G (total remaining node on-disk space)
Note: Consumed on-disk space is a combination of virtual machine, local backup and remote backup consumption. However, only the on-disk VM size is broken out in the SZ(G) column.
The following blog post is an excellent way to gather the same information using PowerCLI
Listing additional backup information for each vm (number of backups)
Issuing the command dsv-balance-manual -q will add additional backup information. Note that only local backups are listed. Remote backups (backups of VMs stored from other clusters) are not listed.
The backup count per VM is listed. If the native column is 100%, this indicates that all backups are stored along with the parent VM (ensuring maximum de-duplication efficiency).
Backups can become scattered if the node running the parent VM is offline for an extended period of time (the HPE SimpliVity DVP will choose an alternative node to house the backups); however, the Auto Balancer service should automatically re-align the backups.
Note: dsv-balance-manual does not list local or remote backup sizes.
Manually migrating a data container and its associated backups to another node
Issue the command dsv-balance-manual
Select required Datacenter/Cluster
Open the generated csv file in vi, for example ‘vi /tmp/balance/replica_distribution_file_Dallas.csv’
Press ‘i’ for insert mode
Using the arrow keys, move to the column of the virtual machine’s data container you wish to migrate. Note that as of 3.7.6 you can move either the primary or the secondary data container; for earlier versions you can only move the secondary data container.
Delete the ‘p’ or ‘s’ from that column, then use the arrow keys to move to the column of the node you wish to migrate the data container to and insert an ‘s’ or ‘p’ character in the appropriate column.
Once complete, save your changes to the csv file and exit by pressing ‘ESC’ and typing ‘:wq!’
To migrate the data container, enter the command ‘dsv-balance-manual --csvfile /tmp/balance/replica_distribution_file_Dallas.csv’
The command re-reads the csv file looking for any changes. If changes are found, it will automatically migrate the appropriate data container and its associated backups to the desired node.
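If you prefer not to edit the file by hand in vi, the same "remove the marker from one node column, add it to another" edit can be sketched in Python. This is only an illustration: the column layout of SAMPLE below is a hypothetical stand-in loosely based on the console columns described earlier, and the real generated file may look different, so inspect it before attempting anything similar. The helper name move_container is also made up for this sketch.

```python
import csv
import io

# Hypothetical stand-in for the generated distribution file. The real
# column layout may differ; check the actual file first.
SAMPLE = """\
ID,Owner,node1,node2,node3,node4,Name
1,node3,,s,p,,VM-1
2,node1,p,,,s,VM-2
"""


def move_container(rows, vm_name, role, src, dst):
    """Move the 'p' or 's' marker for vm_name from column src to column dst."""
    for row in rows:
        if row["Name"] != vm_name:
            continue
        if row[src] != role:
            raise ValueError(f"{vm_name} has no '{role}' container on {src}")
        if row[dst]:
            raise ValueError(f"{dst} already holds a container for {vm_name}")
        row[src], row[dst] = "", role
        return rows
    raise ValueError(f"unknown VM {vm_name}")


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
move_container(rows, "VM-1", "s", "node2", "node1")

# Write the edited rows back out in the same column order.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()), lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
edited_csv = out.getvalue()
print(edited_csv)
```

The sketch refuses to overwrite a column that already holds a marker for the same VM, which mirrors the manual rule of deleting the character in one column before inserting it in another.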
I have created a short video below outlining this process. In this video I move the primary data container (and associated backups) of the first VM in the generated csv file from node 3 to node 2. If you have any questions about this process, I recommend contacting HPE technical support, who can walk you through these steps.
In the next post we will explore how to list and calculate local and remote backup sizes.
Continuing with this series about how virtual machine data is stored and managed within an HPE SimpliVity cluster, here is a quick recap:
The concept of both primary and secondary Data Containers and their creation process. You can read that post here
The automatic management of these Data Containers via IWO and the Resource Balancer service. You can read that post here
With regard to the automatic management of Data Containers, part two mainly (but not exclusively) focused on the initial placement of these Data Containers. Let’s call this “day one” provisioning operations of virtual machines within an HPE SimpliVity cluster.
So how does the HPE SimpliVity platform manage virtual machines and their associated Data Containers after day one operations, and as virtual machines grow (or shrink) in size?
For the most part, this is handled by the Auto Balancer service. The Auto Balancer service is separate from IWO and the Resource Balancer service; however, its ultimate goal is the same: to keep resources as balanced as possible within the cluster.
At the risk of repeating myself, think of IWO and the Resource Balancer as responsible for the provisioning of workloads (VM and associated Data Container placement), and think of Auto Balancer as responsible for the management of these Data Containers as they evolve in terms of overall node consumption. IWO will be aware of any changes Auto Balancer may implement and will update DRS affinity rules accordingly.
How does Auto Balancer work?
In my previous post I talked about how Resource Balancer will migrate Data Containers (for VDI workloads) to balance load across nodes. In its current iteration Auto Balancer takes this one step further and will migrate secondary Data Containers for all other VM types and associated backups to less utilized nodes should a node be running low on physical capacity. Auto Balancer does not operate until a node is at 50% utilization (in terms of capacity). As with IWO and Resource Balancer, Auto Balancer is also designed to be zero touch, i.e. the process is handled automatically.
Low physical capacity on a node can be a result of the growth of one or more VMs or backups, or of the provisioning of more virtual machines into an HPE SimpliVity cluster.
In the above illustration, VM-3’s on-disk data has grown by an amount that has in turn caused Node 3 to become space constrained. Auto Balancer will take a proactive decision to re-balance data across the cluster in order to try to achieve optimum distribution of data in terms of space and IOPS. In this simplified example Auto Balancer has elected to migrate the secondary copy of VM-2 to Node 1 to keep the overall cluster balanced. Again, this process is invisible to the user.
Let’s take a second example to reinforce what we have learned over the two previous posts regarding DRS, IWO and Auto Balancing of resources.
The above illustration does not scale well when representing multiple nodes and virtual machines. It is easier to represent virtual machines and nodes in table format (below); this format will also prove useful in upcoming posts where we’ll learn how to view data distribution across nodes for individual VM’s and backups and how to manually balance this data if required.
For the sake of simplicity, the total physical capacity available to each node in the above table is 2TB. The physical OS space consumed after de-duplication and compression for each VM is listed. For this example we will omit backup consumption. Therefore we know the following.
Total cluster capacity is 8TB
Total consumed cluster capacity is 2.1TB ((50 + 300 + 400 + 300) x 2, allowing for HA)
Node 3 is currently the most utilized node, consuming 1TB of space
Currently the cluster is relatively balanced with no nodes space constrained
DRS has chosen not to run any workload on Node 4
Now let’s imagine that the consumed space of one VM on Node 3 grows by 200GB and that its CPU and memory consumption also increases.
We now know the following
Total cluster capacity is 8TB
Total consumed cluster capacity is 2.5TB ((50 + 300 + 400 + 300 + 200) x 2, allowing for HA)
Node 3 is currently over-utilized, consuming 1.2TB of space.
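The before and after totals can be checked with a few lines of arithmetic. This sketch takes the four VM sizes (in GB) from the example, assumes decimal units (1TB = 1000GB), and doubles everything because each VM has a primary and a secondary data container; the variable names are illustrative.

```python
VM_SIZES_GB = [50, 300, 400, 300]  # on-disk size per VM from the example
GROWTH_GB = 200                    # growth of one VM
COPIES = 2                         # primary + secondary data container (HA)

before_gb = sum(VM_SIZES_GB) * COPIES
after_gb = (sum(VM_SIZES_GB) + GROWTH_GB) * COPIES

print(f"Before growth: {before_gb / 1000} TB")
print(f"After growth:  {after_gb / 1000} TB")
```

Because both copies of the VM grow, 200GB of growth adds 400GB to total cluster consumption.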
An automatic redistribution of resources could move data such that it matches the table below.
DRS has chosen to run VM-4 on Node 4 due to constrained CPU and memory on Node 3, thus promoting VM-4’s standby Data Container to the primary Data Container.
The Auto Balancer service has migrated the secondary copy of VM-3 to node 1 to re-balance cluster resources.
It is worth noting that other scenarios are equally valid. For example, VM-4’s secondary data container could also have been migrated to node 1 (after the DRS move), which would have resulted in roughly the same redistribution of data.
In my previous post we talked about capacity alarms being generated at 80% space consumption on an individual node. In the above scenario, no alarms would have been generated on any node within the cluster. Auto Balancer redistributes workload according to its algorithms to try to avoid that exact alarm.
Monitoring Placement Decisions
The Auto Balancer service runs on each node. One node in a cluster is chosen as a leader
The leader may change over time or as nodes are added/removed from the cluster.
The Auto Balancer leader will submit tasks to other nodes to perform Data Container or Backup migrations.
The command “dsv-balance-show --shownodeIP” shows the current leader.
The log file balancerstatus.txt shows submitted tasks and their status on the leader node, i.e. this file is only present on the leader node. Issue the command “cat /var/svtfs/0/log/balancerstatus.txt” to view the status of any active migrations.
The output of this file shows the migration of a backup from Node 1 to Node 2.
Historic migrations can be viewed by issuing the following commands:
“dsv-active-tasks-show | grep migrate” should show active backup or hive migrations
“dsv-tasks-show | grep migrate” shows active and completed migration tasks
Currently the Auto Balancer does not support the migration of remote backups.
Intelligent Workload Optimizer and Auto Balancer are not magic wands; they balance existing resources, but cannot create resources in an overloaded DC. In some scenarios, manual balancing may be required. The next post will explore some of these scenarios, how to analyze resource utilization and how to manually balance resources within a cluster if required.
In my previous blog post my aim was to paint a picture of how the data that makes up the contents of a virtual machine is stored within an HPE SimpliVity cluster. I introduced the concept of both primary and secondary data containers, their creation process, placement, and how we report on physical cluster capacity.
If you are unfamiliar with any of these concepts, I suggest reading that post here before continuing 🙂
Now that we have a greater understanding of how virtual machine data is stored we can start to dig a little deeper to understand how the DVP automatically manages this data, and as administrators, how we can manage this data as the environment grows.
There is a lot to cover in this topic. In the interest of keeping this post concise and building on core concepts incrementally, I will concentrate on the automatic management of data via IWO. I will focus on other automatic data management features (via Auto Balancer) and manual management of data in future posts.
This post has been co-authored with my colleague Scott Harrison. A big thank you to Scott, as he has provided and posed some interesting points to consider.
Automatic Management of Data
IWO, a closer look
As previously stated, a core feature of the HPE SimpliVity DVP is IWO (Intelligent Workload Optimizer).
IWO is composed of two sub-components: the Resource Balancing Service and the VMware DRS / SCVMM PRO Integration Service. For this post I will focus on the VMware DRS integration service; however, the architecture remains analogous for Hyper-V.
IWO’s aim is twofold…
1. Ensure all resources of a cluster are properly utilized.
This process is handled by the Resource Balancing Service both at initial virtual machine creation and proactively throughout the life cycle of the VM and its associated backups. This is all achieved without end-user intervention.
2. Enforce data locality.
This process is handled by the VMware DRS / SCVMM PRO integration service by pinning (through VMware DRS affinity rules or Hyper-V SCVMM PRO) a virtual machine to nodes that contain a copy of that data. Again this is all achieved without end-user intervention.
Note: The resource balancing service is an always-on service and does not rely on VMware DRS being active and enabled on a VMware cluster. It works independently from the DRS integration service.
Ensuring all resources of a cluster are properly utilized
VM & Data Placement Scenarios
The primary goal of IWO via the resource balancer service is to ensure that no single node within a cluster experiences undue utilization of any one particular resource: CPU, memory, storage capacity, or I/Os. The objective is not that each node experiences the same utilization across all resources or dimensions (that may be impossible), but rather that each node has sufficient headroom in terms of physical storage capacity to handle expected future demand and that no single node is handling a much larger number of I/Os relative to its cluster peers.
The resource balancing service will use different optimization criteria for different scenarios. For example, initial VM placement on a new cluster (i.e. migration of existing VMs from a legacy system), best placement of a newly created VM within an existing system, Rapid Clone of an existing VM and VDI-specific optimizations for handling linked clones.
Let’s explore the different scenarios…
Scenario #1 New VM Creation – VMware DRS enabled and set to Fully Automated
When creating a virtual machine on, or storage vMotioning a virtual machine to, an HPE SimpliVity host / datastore, vSphere DRS will automatically place the VM on a node with relatively low CPU and memory utilization according to its own algorithms (default DRS algorithms). No manual selection of a node is necessary.
In the below diagram VMware DRS has chosen Nodes 3 and 4 respectively to house VMs 1 and 2. Independently, the DVP has chosen a pair of “least” utilized nodes within the cluster (according to storage capacity and I/O demand) on which to place the data containers of those VMs.
IWO via the DRS integration service will pin VM-1 to Nodes 1 & 2 and VM-2 to Nodes 3 & 4 by automatically creating and populating DRS rules into vCenter. Let’s look at how that is achieved.
How are DRS Rules Created?
Each DRS rule consists of the following components:
A Virtual Machine DRS Group.
A Host DRS Group.
A Virtual Machines to Hosts rule. This consists of “must run on these hosts”, “should run on these hosts”, “must not run on these hosts”, or “should not run on these hosts”. For HPE SimpliVity we use the “should run on these hosts” rule.
In our example it is not optimal for VM-1 to be running on node 4 as all of the I/O for the VM must be forwarded to either Node 1 or Node 2 in order to be processed. If the VM can be moved automatically to those nodes, then one hop is eliminated from the I/O path.
First, a Virtual Machine DRS Group is created, as we’re looking to make a group of virtual machines that will run optimally on two nodes. In our case the name of the Virtual Machine DRS Group will be SVTV_<hostID2>_<hostID3>.
Below we can see VM-1 assigned to this VM Group. VM-1 will share this group with other virtual machines that have their data located on the same nodes. Note that the Host ID is a GUID, not the IP address or hostname of the node. While this may appear confusing to the end user, it is the actual GUID of the HPE SimpliVity node. Unfortunately, mapping the GUID to the hostname or IP address of the node is not possible through the GUI; use the command “dsv-balance-show --showNodeIP” if you wish to identify the node IP.
Looking at the VM Group for VM-1, we can deduce that the data is stored on the nodes ending in “aaf” and “329”, which in turn equate to OVCs .185 and .186, which in turn live on ESXi nodes ending in 81 and 82, as shown below.
Again all of this is handled automatically for you, however for the post to make sense it is important to know where these values come from.
Secondly, Host DRS Groups are created; they are named SVTH_<hostID2>_<hostID3>. A host group will only ever contain two nodes, as a virtual machine’s data only resides on two nodes. Several host groups are created depending on how many hosts there are in the cluster: one host group for each combination of nodes. Here I have highlighted the host group for hosts 81 and 82, to which VM-1 will be tied.
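To make "one host group for each combination of nodes" concrete, here is a minimal sketch that enumerates the two-node host groups for a cluster. The SVTH_ prefix comes from the naming above; the short host IDs are placeholders (real IDs are GUIDs), and the order in which the two IDs appear in a real group name is an assumption.

```python
from itertools import combinations


def host_group_names(host_ids):
    """Return one SVTH_<id>_<id> name for every pair of hosts.

    Assumption: IDs are sorted before pairing; the ordering used by the
    real integration service is not documented here.
    """
    return [f"SVTH_{a}_{b}" for a, b in combinations(sorted(host_ids), 2)]


hosts = ["aaf", "329", "b17", "c42"]  # placeholder short IDs, not real GUIDs
groups = host_group_names(hosts)
for name in groups:
    print(name)
```

A four-node cluster yields six host groups and a three-node cluster yields three, so the number of groups grows quickly as nodes are added.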
Lastly, a “Virtual Machines to Hosts” rule is made that directs DRS that VM-1 “should run on” hosts 81 and 82, i.e. an HPE SimpliVity rule consists of an HPE SimpliVity host group and an HPE SimpliVity VM group.
DRS affinity rules are should rules and not must rules. This is an important distinction that we will discuss later in the post.
If set to Fully Automated, VMware DRS will vMotion any VMs violating these rules to one of the nodes holding their data containers, thus aligning the VM with its data. In our case VM-1 was violating the affinity rules by virtue of being placed on Node 4 and is automatically vMotioned to Node 2 via DRS.
Scenario #2 New VM Creation – VMware DRS Disabled
As stated previously, if VMware DRS is not enabled the Resource Balancer service continues to function and initial placement decisions continue to operate; however, DRS affinity rules will not be populated into vCenter.
In this scenario when Virtual Machines are provisioned they may reside on a node where there is no data locality. A HPE SimpliVity alarm “VM Data Access Not Optimized” will be generated at the VM object layer within vCenter alerting the user.
The HPE SimpliVity platform through interacting with vCenter Tasks and Events will generate an event and remediation steps directing you to which nodes contain a copy of the Virtual Machine data. In the below diagram I have highlighted the “Data access is not optimized” event that directs the user to v-motion the VM to the outlined hosts.
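The data locality check behind this alarm can be illustrated with a short sketch. It is not the platform's implementation: the function name and message text are invented for illustration, and the node names simply reuse the earlier example of VM-1 running on Node 4 while its data containers live on Nodes 1 and 2.

```python
def locality_advice(vm_name, running_on, container_nodes):
    """Return None if the VM runs on a node holding its data, else a hint.

    The message wording is illustrative only and is not the text of the
    real vCenter alarm or event.
    """
    if running_on in container_nodes:
        return None
    targets = " or ".join(container_nodes)
    return f"{vm_name}: data access not optimized; consider moving to {targets}"


print(locality_advice("VM-1", "node4", ("node1", "node2")))
print(locality_advice("VM-1", "node2", ("node1", "node2")))
```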
Rapid clone of an existing VM
We have shown how the resource balancing service behaves in regard to new VM creation, however the resource balancing service takes a different approach for HPE SimpliVity Clones and VMware Clones of Virtual Machines (VMware clones can also be handled by HPE SimpliVity via the VAAI plugin for faster operation).
In this scenario the Resource Balancer service will leave cloned VM data on the same nodes as the parent as this achieves best possible cluster-wide storage utilization & de-duplication ratios.
If I/O demand exceeds node capabilities the DVP will live-migrate data containers in the background to less-loaded node(s).
Live migration of a data container does not refer to VMware storage v-motion. It refers to the active migration of a VM Data Container to another Node.
VDI specific optimizations for handling linked clones
The scope of VDI and VDI best practices is beyond this post however I did want to mention how the HPE SimpliVity platform handles this scenario. VDI environments typically clone a VM, or golden image, many times. These clones essentially become read only templates for new desktops.
As VDI desktops are deployed, linked clones are created on random hosts. Linked clones mostly read from the read only templates and write locally which causes proxying and adds extra load to the nodes that host the read only templates.
To mitigate this, the Resource Balancer service will automatically distribute read-only master images across all nodes for even load. This aligns linked clones with their parents to ensure node-local access. It is also worth noting that Resource Balancer may relocate linked clones.
Thoughts on initial placement
Now that we have an understanding of initial placement, let’s explore some of the finer points and interactions between DRS and IWO.
That’s all well and good, I hear you say, but what if DRS has chosen a viable node based on available CPU and memory (or whatever other DRS rules we have set) and the DVP has placed the primary and secondary data containers on two other nodes which may be more CPU and memory constrained? Or what if you as the administrator want more granular control?
Firstly, storage is always the primary concern, right? DRS does not have access to underlying storage utilization figures, nor is it designed with this in mind; its only concerns are the CPU and memory resources of the cluster as a whole. HPE SimpliVity, through the DVP, must take storage into account when placing data.
If this scenario is indeed encountered then DRS will be forced to re-calculate cluster resources and may move other VM’s (according to their affinity rules) to re-balance load after the new affinity rule(s) are populated and enacted upon for any new virtual machines added to the cluster. Again this process is dynamic both from a HPE SimpliVity and VMware DRS point of view.
It is also recommended to separate workloads where possible, for example by placing server-based and VDI workloads in separate clusters.
The HPE SimpliVity DRS rules are “should” rules, and thus during an HA event the rule will be overridden in order to keep the VMs running. DRS makes a best effort to optimize according to existing IWO rules, but in some high-load environments DRS will ignore IWO. This can result in VMs being run on a node other than their primary or secondary storage nodes.
Secondly, what if you as the administrator want more granular control? In this scenario IWO and the Resource Balancer service can be disabled; however, this is only recommended with the advice of support for specific circumstances and is beyond the scope of this post. Suffice to say that Resource Balancer in conjunction with IWO is designed to be zero touch.
Disabling the Resource Balancer
Why? It is not recommended to disable Resource Balancer, as its algorithms are designed for, and cater to, all scenarios. However, for specific use cases and architectures it may be beneficial to temporarily disable Resource Balancer. This will of course require a more hands-on approach to overall data management.
Resource Balancer is enabled by default and uses the BALANCED placement algorithm for initial VM placements (existing VMs transferred into the system) and BEST_FIT for provisioning new VMs. As outlined in this post, BEST_FIT and BALANCED take into account the storage capacity and I/O demand of all nodes in the cluster when deciding where to place the primary and secondary data containers.
Issue the command “dsv-balance-show –status” to view the status of Resource Balancer
Resource Balancer can be disabled on a per node basis if required. When Resource Balancer is disabled the VM provisioning algorithm will now be set to RANDOM and LOCAL_PRIMARY. This essentially means one of two things.
If the user or DRS selects a particular node to house a VM from vCenter, this will happen, i.e. Resource Balancer will not choose a better node (as it is offline).
If virtual machines are being created at the cluster level within vCenter, this node may randomly house a data container on a round-robin provisioning basis.
Resource Balancer can be disabled on the node using the command “dsv-balance-disable”.
The Resource Balancer service can be re-enabled with the command “dsv-balance-enable”. Once re-enabled, the Resource Balancer service will default back to BEST_FIT and BALANCED within the overall cluster.
Checking the status of IWO
A cluster must contain three HPE OmniStack hosts to start creating cluster groups and affinity rules in DRS. A one or two-host cluster automatically accesses data efficiently and does not need affinity rules.
When you first deploy an HPE OmniStack host, the IWO setting defaults to enabled. If you deploy an HPE OmniStack host to a cluster that contains other HPE OmniStack hosts, IWO defaults to the setting used by the cluster. For example, if you changed the setting from enabled to disabled, the HPE OmniStack host joining the cluster takes on the disabled IWO setting.
You can also include standard ESXi hosts as long as they share an HPE SimpliVity datastore with another HPE OmniStack host in the cluster.
Use the command “svt-iwo-show” (from any node within the cluster) to show whether a cluster has Intelligent Workload Optimizer enabled or disabled, i.e. whether the feature is active or not.
IWO can be disabled. Unlike the Resource Balancer service, disabling IWO is a cluster-wide operation, not a node-local operation. Disabling IWO will remove all HPE SimpliVity affinity rules from vCenter (or SCVMM).
To disable IWO, issue the command “svt-iwo-disable”.
Why? Again, it is not recommended to disable IWO; however, for specific use cases and architectures it may be beneficial to temporarily disable IWO. This will of course require a more hands-on approach to overall data management. A better alternative may be to set DRS to manual.
IWO can be re-enabled using the command “svt-iwo-enable”. Once re-enabled, DRS rules are automatically re-populated back into vCenter.
As a final note, do not enable vSphere Distributed Power Management (DPM); this ensures that it cannot shut down the HPE OmniStack Virtual Controller on a host after load balancing occurs.
Management of Data as the environment grows
The HPE SimpliVity platform will continue to manage cluster data as the environment grows and evolves. Auto Balancer works in conjunction with the IWO / Resource Balancer service and is focused on the management of VM data containers within the cluster. Auto Balancer’s aim is to migrate data containers to other nodes within the cluster should it be required, i.e. as data grows for large VMs, a particular node may become overloaded. Auto Balancer will look for less utilized nodes within the cluster and migrate data containers to them.
We will discuss the Auto Balancer and manual management of Data in an upcoming post.