How Virtual Machine data is stored and managed within a HPE SimpliVity Cluster – Part 6 – Calculating backup capacity

So far in this series, we have discussed how the HPE SimpliVity platform intelligently places VMs and their associated data containers, and how these data containers are automatically provisioned and distributed across a cluster of nodes.

I have talked about data containers in relation to virtual machines: data containers are constructs that hold all of the data (VM configuration and flat files) associated with a virtual machine, as illustrated below.

data-container-contents


But what about backups? How are they stored and managed? First, let's briefly look at the backup mechanism and types.

HPE SimpliVity Backups

HPE SimpliVity backups are independent snapshots of a virtual machine’s data container (at the time of backup) and are independently stored in the HPE SimpliVity object store.

They are not mounted to the NFS datastore until needed. These backups are stored in HA pairs and are kept on the same nodes as the virtual machine’s primary and secondary data containers to maximize de-duplication of data.

backups

This post will not explain how backups are logically independent; that concept would require a separate post in itself and leads to further questions around capacity management that I am often asked. In my next post I will explain how data containers are logically independent from each other, which will lead us naturally into space reclamation scenarios.

HPE SimpliVity Backup Mechanism

The snapshot process is handled natively by the HPE SimpliVity platform:

  1. A point in time Data Container (Snapshot) is created locally.
  2. This is done by creating a new reference to the data container and incrementing the reference counts of the source data container's objects in the Object Store.
  3. No data writes are incurred, just metadata and index update operations.
  4. The extra reference allows for full de-duplication (a simple sketch of this idea follows below)
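To make the reference-counting idea concrete, here is a minimal illustrative sketch in Python (my own simplification, not HPE SimpliVity code) of how a metadata-only snapshot can be taken by incrementing reference counts rather than copying data:

# Illustrative sketch only: a backup is just a new set of references to
# existing objects, so no data blocks are rewritten.
class ObjectStore:
    def __init__(self):
        self.refcounts = {}  # object_id -> reference count

    def add_ref(self, object_id):
        self.refcounts[object_id] = self.refcounts.get(object_id, 0) + 1

    def release(self, object_id):
        self.refcounts[object_id] -= 1
        if self.refcounts[object_id] == 0:
            del self.refcounts[object_id]  # space becomes reclaimable

class DataContainer:
    def __init__(self, store, object_ids):
        self.store = store
        self.object_ids = list(object_ids)
        for oid in self.object_ids:
            store.add_ref(oid)

    def snapshot(self):
        # Only metadata/index updates: reference the same objects again.
        return DataContainer(self.store, self.object_ids)

store = ObjectStore()
vm = DataContainer(store, ["blk-1", "blk-2", "blk-3"])
backup = vm.snapshot()
print(store.refcounts)  # {'blk-1': 2, 'blk-2': 2, 'blk-3': 2}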

Backup Types

The HPE SimpliVity platform supports three flavors of backup, and regardless of the backup type the end result is always the same: an HPE SimpliVity snapshot is taken.

Crash consistent – The standard HPE SimpliVity backup. This is a snapshot of the Data Container at the time the backup is taken. This backup consists of only metadata and index update operations; no I/O to disk is incurred, making it nearly instantaneous.

Application consistent – This method leverages VMware Tools to quiesce the VM before the HPE SimpliVity snapshot is taken (the default behavior for the Application Consistent option). Select the Application Consistent option when opting for application-consistent backups (when creating or editing a policy rule).

  • First, a VMware snapshot is taken and outstanding I/O is flushed to disk via VMware Tools. The VM is now running off the snapshot.
  • A point-in-time HPE SimpliVity snapshot of the Data Container is then taken (the standard HPE SimpliVity backup).
vCenter tasks highlighting the VMware snapshot process (Note: this process completes before the HPE SimpliVity snapshot is taken)

Note: The restored VM will also contain the VMware snapshot file (as the HPE SimpliVity snapshot is taken after the VMware snapshot). The VM should therefore be reverted prior to powering on to ensure it is fully consistent, as any I/O performed after the VMware snapshot was taken has not been flushed.

Remember to revert any VMware Snapshots when restoring a HPE SimpliVity Application Consistent Backup

Application consistent with Microsoft VSS – Standard HPE SimpliVity Backup with Microsoft Volume Shadow Copy Service (VSS)

MS-VSS uses application-specific writers to freeze an application and its databases, ensuring that all data being transferred from one file to another is captured and consistent before the HPE SimpliVity backup is taken.

SQL Server 2008 R2, 2012, 2014 and 2016 are supported VSS applications, and Windows Server 2008 R2, 2012 R2, and 2016 are supported VSS operating systems.

Guidelines and full steps on how to enable (and restore) VSS backups are outlined in the HPE SimpliVity administration guides

Select the Microsoft VSS option when opting for application-consistent backups (when creating or editing a policy rule)

Save Windows credentials to the virtual machine(s) that use the policy using the SimpliVity “Save Credentials for VSS” option. The credentials you provide should also be added to the local machine’s administrators group.

These credentials are used to copy and run VSS scripts to freeze the application and databases before the HPE SimpliVity backup is taken.

Backup Consumption – Viewing Cluster Capacity

To view cluster capacity navigate to the Cluster object and under the Monitor tab, choose HPE SimpliVity Capacity.

Within the GUI, physical backup consumption is not split out from overall physical consumption, i.e. physical consumption is a combination of virtual machine data, local backup data and remote backup data.

Logical consumption is also listed; you can think of logical consumption as the physical space that would be required to store all backups with no de-duplication.

Within the GUI it is possible to view individual unique backup sizes by selecting any HPE SimpliVity backup and under backup Actions, selecting the Calculate Unique Backup Size option.

Once complete, scroll to the right of the backup screen to the ‘Unique Backup’ column to view the result.

This method can be useful; however, individual node consumption figures are not listed within the vSphere plugin/GUI. If you wish to view physical consumption information per node, you will need to resort to the CLI.

Backup Consumption – viewing individual node utilization

The command ‘dsv-backup-size-estimate’ can be a useful tool if you are looking to break out individual node backup consumption figures. It will list total node on-disk consumption, plus local backup and remote backup on-disk consumption figures for all backups contained on the node.

The command has many options. Using the --node switch will list all backups contained on the node, while using --vm along with --datastore allows you to specify an individual VM if required. The command can be run from any node to interrogate itself or any other node.

Listed below is the output when running the command with the --node switch (I have shortened it to show just two virtual machines).

The bottom of the output will display node usage statistics. For OmniStackVC-10-6-7-225 in this example, total Node on-disk consumed data is 625.05 GB.

This is comprised of Data (on disk VM consumption) of 142.17 GB + Local backup consumption of 268.15 GB + Remote backup consumption of 214.73 GB

root@omnicube-ip7-203:/home/administrator@vsphere# dsv-backup-size-estimate --node OmniStackVC-10-6-7-225

.---------------------------------------------------------------------------------------------------------------------.
| Name                                          | Bytes    | Local   | Offnode | Heuristic | Backup                   |
|                                               | Written  | Size    | Size    |           | Time                     |
+-----------------------------------------------+----------+---------+---------+-----------+--------------------------+
| Windows      (TEST-DS)                        |          |         |         |           |                          |
+-----------------------------------------------+----------+---------+---------+-----------+--------------------------+
| Windows-backup-2019-06-19T11:11:46+01:00      |  12.06GB | 12.06GB |         |    2.24GB | Wed Jun 19 10:14:41 2019 |
| Windows-backup-2019-06-19T11:12:10+01:00      | 360.37KB | 12.06GB |         |   66.85KB | Wed Jun 19 10:15:06 2019 |
| Test                                          |   2.79GB | 12.06GB |         |  529.13MB | Mon Jun 24 05:07:14 2019 |
+-----------------------------------------------+----------+---------+---------+-----------+--------------------------+
| VM Total Backups: 3 of 3                      |  14.84GB | 36.18GB |      0B |    2.75GB |                          |
| = local 3 + offnode 0, unknown 0              |          |         |         |           |                          |
'-----------------------------------------------+----------+---------+---------+-----------+--------------------------'


.----------------------------------------------------------------------------------------------------------------------.
| Name                                          | Bytes    | Local    | Offnode | Heuristic | Backup                   |
|                                               | Written  | Size     | Size    |           | Time                     |
+-----------------------------------------------+----------+----------+---------+-----------+--------------------------+
| ELK_Ubutnu (DR_1TB)                           |          |          |         |           |                          |
+-----------------------------------------------+----------+----------+---------+-----------+--------------------------+
| 2019-06-20T00:00:00+01:00                     | 133.88GB | 133.88GB |         |   24.33GB | Wed Jun 19 23:00:00 2019 |
| 2019-06-21T00:00:00+01:00                     |  98.91GB | 134.27GB |         |   17.97GB | Thu Jun 20 23:00:00 2019 |
+-----------------------------------------------+----------+----------+---------+-----------+--------------------------+
| VM Total Backups: 2 of 2                      | 232.79GB | 268.15GB |      0B |   42.30GB |                          |
| = local 2 + offnode 0, unknown 0              |          |          |         |           |                          |
'-----------------------------------------------+----------+----------+---------+-----------+--------------------------'

For node OmniStackVC-10-6-7-225 (32f62c42-6b8f-8529-c6a5-9db972eee181) in cluster 'DR Cluster':
- Total Node Data:  625.05GB = Data: 142.17GB  + Local backup: 268.15GB + Remote backup: 214.73GB
- Compressed:        113.58GB, Uncompressed: 207.95GB
- Comp  ratio:       0.54617
- Dedup ratio:       0.33269
- Total number of backups: 15
- Number of backups shown: 15 = Local 15 + Offnode 0, unknown 5
- Total backup bytes:      482.88GB = local 482.88GB + offnode 0B
- Gross heuristic estimate: 59.60GB
root@omnicube-ip7-203:/home/administrator@vsphere#

Backup Consumption – viewing individual backup consumption

Focusing on the virtual machine backup consumption output above, I have chosen two very different types of virtual machines for comparison purposes.

Virtual machine ‘Windows‘ contains in-guest data that de-duplicates very well, while virtual machine ‘ELK_Ubutnu‘ contains a database workload that generates a high amount of unique data, so each individual backup will contain a larger amount of unique data. Let's compare the figures.

For each individual virtual machine, four main columns are listed; however, for on-disk consumption purposes we really only care about the Heuristic values.

  • Bytes Written – While I have marked this as ‘Logical Bytes Written’, it is a combination of the unique on-disk virtual machine size and the unique backup size at the time of backup. For the very first backup this captures the unique size of the VM; however, no I/O is actually incurred (hence I marked it as logical!). This can be a useful value for calculating the unique on-disk consumption of a virtual machine.
  • Local Size – Logical backup size (as exposed in the GUI)
  • Off-node Size – Whether the backup is local or remote.
  • Heuristic – An approximation of unique on-disk data associated with this backup and the VM since the time of backup. This value is cumulative from when the backup was first taken.

Virtual Machine – ‘Windows’

Looking at the values for this virtual machine’s backups we can deduce the following.

  1. The approximate unique size of this VM was originally 12 GB. If we were to delete all backups associated with this virtual machine, approx 12 GB of data would still be consumed on this node.
  2. Since the first backup was taken, approx 2.75GB of unique data is associated with the virtual machine and its backups.
  3. Deleting all backups would reclaim no more than 2.75 GB of data; however, as some of that data may still be shared with the virtual machine, less could be reclaimed in reality. To obtain a truer value, locate the backup within the GUI and calculate its unique size as outlined earlier in this post. The short sketch below summarizes this reasoning.
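Purely as a worked illustration (simple arithmetic using the figures from the sample output above, not any HPE SimpliVity tooling):

# Illustrative arithmetic only, using the 'Windows' VM sample figures.
first_backup_bytes_written_gb = 12.06  # ~ unique on-disk size of the VM at first backup
heuristic_total_gb = 2.75              # unique data tied to the VM and its backups since then

# Deleting every backup reclaims at most the heuristic total; some of that data
# may still be shared with the live VM, so real reclamation can be less.
max_reclaimable_gb = heuristic_total_gb
remaining_after_delete_gb = first_backup_bytes_written_gb
print(f"Reclaimable: at most ~{max_reclaimable_gb} GB; "
      f"~{remaining_after_delete_gb} GB remains for the VM itself")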

Virtual Machine – ELK_Ubuntu

Virtual machine ELK_Ubuntu contains much more unique data. This is reflected in its individual and aggregate backup heuristic figures of 24 GB and 17 GB (totaling 42 GB).

  1. The approximate unique size of this VM was originally 133 GB. If we were to delete all backups associated with this virtual machine, approx 133 GB of data would still be consumed on this node.
  2. Since the first backup was taken, approx 42 GB of unique data is associated with the virtual machine and its backups.
  3. Deleting all backups would reclaim no more than 42 GB of data

The command dsv-backup-size-estimate can be a useful tool; however, bear in mind that it is a point-in-time calculation and should be treated as such. As data is written to and deleted from the Object Store, de-duplication rates may grow or shrink, as may the data associated with the original virtual machine.

I find these values useful for gaining an approximation of virtual machine and associated backup consumption, and from there working within the GUI to identify backups that may be starting to grow too large!

How Virtual Machine data is stored and managed within a HPE SimpliVity Cluster – Part 3 – Provisioning Options

In my previous blog posts my aim was to paint a picture of how the data that makes up the contents of a virtual machine is stored within an HPE SimpliVity cluster.

I introduced the concept of both primary and secondary data containers, their creation process, placement, and how we report on physical cluster capacity. Part 2 explained how the HPE SimpliVity Intelligent Workload Optimizer (IWO) works and I detailed how it automatically manages VM data in different scenarios.

This post will cover the interactions between IWO and VMware DRS, and how admins can manage this interaction with regard to data placement if required.


Understanding the finer points of initial placement

Now that I have made the mechanics of initial placement of data clear from the previous posts, let’s explore some of the finer points and interactions between DRS and IWO.

‘Automatic management of data’ and ‘zero touch infrastructure’ are all well and good, I hear you say, but what if you, as the administrator, want more granular control over your infrastructure?

Firstly, VMware DRS has no access to or awareness of underlying storage utilization, nor is it designed with this in mind. Its only concern is keeping the CPU and memory resources of the cluster balanced as a whole. However, the HPE SimpliVity platform must take storage and available IOPS into account when placing data and, perhaps even more importantly (as we covered in Part 2), it looks to place similar workloads together to maximize de-duplication rates.

For those reasons, the HPE SimpliVity platform may place the primary and secondary data containers for a VM on nodes other than the nodes VMware DRS chooses. What options are available to control this behavior? Let’s try to tease this apart.


We know the following to be true: If set to fully automatic, VMware DRS will make a placement decision based on available CPU and memory resources of the cluster. The HPE SimpliVity platform (through the Resource Balancer Service) may or may not honor these placement decisions based on its own placement algorithms of available capacity, available IOPS and VM type – specifically HPE SimpliVity clones, VMware clones (if the VAAI plugin is installed) and VDI. From there IWO creates affinity rules for these VMs and pushes these rules to vCenter. As a result, VMware DRS will be forced to ‘obey’ these new rules by performing v-motions to the new correct host.

These new rules could potentially cause a resource imbalance in the cluster in terms of CPU and memory utilizations, as DRS has chosen an optimum node which could be ignored, and vCenter is aware of that possibility.

VMware DRS evaluates your cluster every 5 minutes. If there’s an imbalance in load, it will reorganize your cluster. If this scenario is indeed encountered, DRS will re-calculate overall cluster resources and may move other VMs (according to their affinity rules) to re-balance cluster load. This process is dynamic both from a HPE SimpliVity and VMware DRS point of view, so while a particular VM created (or migrated) to an HPE SimpliVity cluster may be placed elsewhere, DRS can re-balance the cluster by moving other VMs to redistribute resources. This is often what we see in such scenarios.   

The HPE SimpliVity DRS rules are ‘should’ rules, and thus during an HA event they will be ignored in order to keep the VMs running. DRS will also override ‘should’ rules in scenarios where a single host is extremely over-utilized. For its part, DRS makes a best effort to optimize according to the affinity rules created by IWO, but in some high-load environments DRS will ignore affinity rules populated via IWO. This can result in VMs running on a node other than their primary or secondary storage nodes.

We will discuss the merits of data locality, but first let’s explore the placement algorithms of the Resource Balancer Service.

Resource Balancer placement algorithm

By default, the Resource Balancer Service uses the BALANCED placement algorithm for initial VM placements (existing VMs transferred into the system) and BEST_FIT for provisioning new VMs. BEST_FIT and BALANCED consider the storage capacity and I/O demand of all nodes in the cluster when deciding where to place the primary and secondary data containers. CPU and memory are currently not considered. (I outlined the Resource Balancer placement algorithms for each VM type in my previous post.)

For each replica (the secondary data container of the virtual machine) the following algorithm is applied (a simple sketch follows the two lists below):

  • The node with lightest IOPS workload is chosen for placement, provided it meets the storage utilization constraint
  • If all nodes violate the constraint, it will choose the node that violates the constraint by the smallest amount

For batch VM deployment:

  • If several nodes have the same score, the placement node is chosen at random
  • Data container replicas follow the same algorithm
  • The node on which the VM is deployed does not play a special role
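A minimal sketch of that selection logic, assuming an illustrative 80% storage-utilization constraint (the real service's thresholds and scoring are more involved):

# Illustrative sketch only (not HPE SimpliVity code): prefer the node with the
# lightest IOPS load that meets the storage-utilization constraint; if every
# node violates the constraint, pick the smallest violation. Equal scores are
# broken at random, as in batch deployments.
import random

def place_replica(nodes, max_storage_util=0.8):  # 0.8 is an assumed placeholder value
    # nodes: list of dicts such as {"name": "node1", "iops": 1200, "storage_util": 0.55}
    within = [n for n in nodes if n["storage_util"] <= max_storage_util]
    if within:
        best_score = min(n["iops"] for n in within)
        candidates = [n for n in within if n["iops"] == best_score]
    else:
        smallest = min(n["storage_util"] for n in nodes)
        candidates = [n for n in nodes if n["storage_util"] == smallest]
    return random.choice(candidates)

nodes = [
    {"name": "node1", "iops": 1500, "storage_util": 0.62},
    {"name": "node2", "iops": 900,  "storage_util": 0.71},
    {"name": "node3", "iops": 400,  "storage_util": 0.86},
]
print(place_replica(nodes)["name"])  # node2: lightest IOPS among nodes under the constraint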

Issue the command “dsv-balance-show --status” to view the status of Resource Balancer.

Disabling the Resource Balancer

The Resource Balancer can be disabled on a per node basis if required. When Resource Balancer is disabled, the VM provisioning algorithm will now be set to RANDOM and LOCAL_PRIMARY. This essentially means one of two things:

  • DRS set to manual – If the user selects a node to house a VM from vCenter, the Resource Balancer will not choose a better node to house its data (as it is offline).
  • DRS set to fully automatic – If virtual machines are being created at the cluster level within vCenter, data containers are provisioned on a round robin basis as chosen by DRS. Again, the Resource Balancer will not choose a better node (as it is offline).

Resource Balancer can be disabled on the node using the command “dsv-balance-disable”.

Enabling the Resource Balancer

The Resource Balancer Service can be re-enabled by the command “dsv-balance-enable”. Once re-enabled the Resource Balancer will default back to BEST_FIT and BALANCED for that node.

IWO operations

A cluster must contain three HPE SimpliVity hosts (previously known as HPE OmniStack hosts) to start creating cluster groups and affinity rules in DRS. A one- or two-host cluster automatically accesses data efficiently and does not need affinity rules.

When you first deploy an HPE SimpliVity host, the IWO setting defaults to enabled. If you deploy an HPE SimpliVity host to a cluster that contains other HPE SimpliVity hosts, IWO defaults to the setting used by the cluster. For example, if you changed the setting from enabled to disabled, the HPE SimpliVity host joining the cluster takes on the disabled IWO setting.

You can also include standard ESXi hosts if they share an HPE SimpliVity datastore with another HPE SimpliVity host in the cluster and the HPE SimpliVity VAAI plugin is installed.

Disabling IWO

Unlike the Resource Balancer Service, disabling IWO is a cluster-wide operation, not a node-local operation.

Disabling IWO will remove all HPE SimpliVity affinity rules from vCenter as outlined below.

To disable IWO, issue the command “svt-iwo-disable”.

Illustrated below are the DRS affinity rules both before and after the operation. Note: disabling IWO results in the DRS affinity rules being removed from vCenter. Once IWO is re-enabled, the DRS rules are automatically added back into vCenter.

Enabling IWO

IWO can be re-enabled using the command “svt-iwo-enable”.

Once re-enabled DRS rules are automatically re-populated back into vCenter.

DRS rules are automatically re-populated

Checking the status of IWO

Use the command “svt-iwo-show” (from any node within the cluster) to show whether Intelligent Workload Optimizer is enabled or disabled for the cluster.

Data Locality

Data locality plays an important role in your decisions as to the best approach for your particular data center.

As previously discussed, it is possible and supported to run virtual machines on nodes that do not contain primary or secondary copies of the data. In this scenario, I/O travels from the node over the federation network back to where the primary data container is located. This is called I/O proxying.

By default, and as outlined through the creation of DRS affinity rules, the HPE SimpliVity DVP is configured to avoid virtual machine I/O proxying. You can run virtual machines in this configuration by either disabling IWO or setting DRS to manual. Typically, this will add 1 to 2ms of round-trip time above the baseline I/O latency. This may or may not be a concern depending on the performance requirements of the virtual machine workload.

An HPE SimpliVity alert “VM Data Access Not Optimized” will be generated at the VM object layer within vCenter, however this notification can be suppressed.

Sharing HPE SimpliVity datastores with standard ESXi hosts

You can share HPE SimpliVity datastores with standard ESXi hosts (hosts without HPE OmniStack software). Configuration of access nodes is beyond the scope of this post, however you can find information here.

Essentially, you need to specify the storage data transfer IP address that you want the standard ESXi host to use. To do this, you must know the network IP address of the HPE OmniStack Virtual Controller (OVC) virtual machine you plan to use to share the datastore. For example, the HPE SimpliVity host provides two potential paths:

  • The Storage Network IP address (recommended, as it provides redundancy and 10GbE access)
  • The Management Network IP address (also supported, but provides no redundancy and only 1GbE access)

Your desired network also impacts the IP address. If you use the switched method for the 10 GbE storage network, use the Storage Network IP address of any HPE OVC. It is a best practice to use this network because it provides higher bandwidth and failover capability.

If you use the direct-connect method for the 10 GbE storage network, specify the Management Network IP address of the HPE SimpliVity host. However, this network has no failover capability.

Note: The standard ESXi host can reside in the same cluster as the HPE SimpliVity host and datastore you plan to use, or in another cluster within the same datacentre for greater flexibility.

As illustrated below, Node 5 is a standard ESXi host that has been configured to communicate with the storage IP of the OVC on Node 4 (as the OVC handles the NFS traffic). This means the OVC on Node 4 has exported the desired NFS datastore to Node 5 via the 10GbE network.

Configured access nodes do not have data locality by their very nature. All traffic will flow via the 10GbE or 1GbE network back to the node they are configured to communicate with.

When you allow a standard ESXi host access to an HPE SimpliVity datastore by sharing it with an HPE SimpliVity host, you can:

  • Use vSphere vMotion to migrate virtual machines that run on a standard ESXi host to another host in the federation with no disruption to users.
  • Use vSphere Storage vMotion to migrate virtual machines to a HPE SimpliVity datastore with no disruption to users

Avoiding the ‘double hop’ scenario

Be sure to configure NFS access to the standard ESXi server from the HPE SimpliVity node that contains the local copy of the data you are serving, to avoid a double hop over the network which would add further latency.

The below diagram illustrates the scenario where Node 5 is configured to access Node 4, however Node 4 does not contain a copy of VM-2 data which is running on Node 5.  As a result, one further hop is required over the federation network to access a copy of the VM’s data.

Scenarios

Now that I have laid out the available options, let’s look at some scenarios to illustrate these points.

I’ve dealt with many different customer requests and I always start with the same question: What are your priorities? Capacity efficiency, CPU utilization, memory utilization, performance, overhead (over-commit, under-commit), even data distribution, hot standby? If you paint a picture of the desired end goal, it’s easy to work with the HPE SimpliVity platform to achieve this goal. (Client virtualization scenarios and VDI best practices are beyond the scope of this post. Information on this topic can be found at www.hpe.com/simplivity or in this technical whitepaper.)

Scenario 1 – Zero Touch Infrastructure

I want zero touch infrastructure. I have sufficient cluster resources in my environment (CPU, memory, capacity and performance) for all my VMs; however, achieving the best possible de-duplication ratios is a priority to accommodate future growth. I have a mixed workload type. Storage distribution and control over virtual machine distribution and placement are not a concern, and I want VMware DRS and HPE SimpliVity to work together to manage my cluster resources.

If this is the case, set VMware DRS to fully automatic and let DRS choose the best nodes to provision workload based on available CPU and memory cluster resources. As outlined above, the HPE SimpliVity platform may or may not honor this placement decision based on its own placement algorithms of available capacity, IOPS and VM type. If the placement matches, there is no change. If the nodes do not match, the affinity rules populated to vCenter will force DRS to re-calculate cluster resources. Based on the configured settings, DRS may move other VMs (according to their affinity rules) to re-balance cluster load. This process works for most customers: you are ensuring maximum de-duplication rates while allowing DRS to balance workload within certain constraints.

Scenario 2 – More Granular Control

I want more granular control over the environment. I have 4 nodes in the cluster. This is a new environment, and I want 25% of my VMs on each node, which will ensure each node is correctly utilized in terms of CPU, memory and capacity. I want to control where each VM is placed during provisioning. I have sufficient capacity, and I’m willing to forgo some data efficiency to achieve my goals. Once initial provisioning is complete, I want future workloads to be automatically provisioned.

This scenario might be a bit extreme, but it does give us a chance to explore a few options. Firstly, leaving DRS set to fully automatic would most likely achieve 25% distribution across all nodes, and in most circumstances the HPE SimpliVity platform would follow suit with this storage distribution. However, you stated that you want granular control over which VMs run where. Again, it’s about painting a picture of your goals.

The following should be implemented:

  • Resource Balancer should be set to disabled on each node.
  • IWO should also be set to disabled.
  • DRS should be set to manual.

These settings will allow direct control over where VMs are provisioned. You can now provision VMs manually on each appropriate node to achieve 25% distribution across the cluster. In other words, whatever node is selected within vCenter to house the VM will also be the node that houses the data (the secondary node will be chosen on a round robin basis). No affinity rules will be populated to vCenter (yet).

Note: In this scenario, you stated that deduplication was not a top priority due to sufficient capacity. VMs have simply been provisioned based on the 25% distribution rule. If the Resource Balancer Service had been enabled, it might have chosen nodes with existing data containers to maximize deduplication rates. It is common for cloned VMs to all reside on the same node in order to maximize deduplication rates. This is not the case in this scenario, because we traded possible deduplication gains for desired VM distribution.

Once initial provisioning is complete, Resource Balancer and IWO should be re-enabled to ensure that DRS affinity rules are populated into vCenter Server, and DRS should then be set back to fully automatic. This will ensure future workloads are provisioned automatically.

Scenario 3 – Even VM Load Distribution

I want even VM load across my cluster in terms of CPU and memory. Data locality and I/O performance are not top priorities. Most applications are CPU and memory intensive, and adding 1ms to 2ms to I/O trip times will not impact application performance.

In this scenario, IWO can be disabled, ensuring no DRS affinity rules are populated into vCenter Server. Suppressing the DRS affinity rules allows VMware DRS (or you directly) to distribute VMs across the cluster as desired, so that all VMs are adequately resourced in terms of CPU and memory. The ‘Data Access Not Optimized’ alarm can be suppressed within vCenter Server.


The next post will discuss the Auto Balancer service along with manual management of Data Containers.

How Virtual Machine data is stored and managed within a HPE SimpliVity Cluster – Part 5 – Analyzing Cluster Utilization

In this series we have discussed how the HPE SimpliVity platform intelligently places VMs along with their associated data containers, and how these data containers are automatically provisioned and distributed across a cluster of nodes.

  • Part 1 covers how data of a virtual machine is stored and managed within an HPE SimpliVity Cluster 
  • Part 2 explains how the HPE SimpliVity Intelligent Workload Optimizer (IWO) works, and how it automatically manages VM data in different scenarios
  • Part 3 outlines options available for provisioning VMs
  • Part 4 covers how the HPE SimpliVity platform automatically manages virtual machines and their associated Data Containers after initial provisioning, as virtual machines grow (or shrink) in size

Intelligent Workload Optimizer and Auto Balancer are not magic wands; they work together to balance existing resources, but they cannot create resources in an overloaded datacenter, and in some scenarios manual balancing may be required.

This post explores some of these scenarios, how to analyze resource utilization and manually balance resources within a cluster if required.


Analyzing resource utilization – capacity alarms

Total physical cluster capacity is calculated and displayed as the total aggregate available physical capacity from all nodes within a cluster. As previously stated, we do not report on individual node capacity within the vCenter plugin.

space
Total cluster capacity

While overall cluster aggregate capacity may be low, it is possible that an individual node(s) may become space constrained. In this case an alarm in vCenter server will be triggered for the node in question.

  • If the percentage of occupied capacity is more than 80 and less than 90 percent, the Capacity Alarm service will raise a “WARN” event
  • If the percentage of occupied capacity is more than 90 percent, the Capacity Alarm service will raise an “ERR” event (a small sketch of this check follows)
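As a quick illustration of those thresholds (plain Python, not the actual Capacity Alarm service):

# Illustrative sketch of the 80% / 90% thresholds described above.
def capacity_alarm(used_bytes, total_bytes):
    pct = 100.0 * used_bytes / total_bytes
    if pct > 90:
        return "ERR"   # node more than 90% occupied
    if pct > 80:
        return "WARN"  # node between 80% and 90% occupied
    return "OK"

print(capacity_alarm(1.7e12, 2.0e12))  # 85% occupied -> WARN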

The below screen shot illustrates a node generating an 80% space utilization warning.

capacity alarm-1
Individual node generating an 80% space utilization alarm.

How can overall aggregate capacity within a cluster remain low while an individual node (or nodes) becomes space constrained? Let's look at the table below.

capacity alarm-2

In this simplistic example (the real-world analogy is similar), let's assume a physical capacity of 2TB for each node; the on-disk VM space after de-duplication and compression is listed in the table. Therefore we know the following (a quick worked check follows the list):

  • Total cluster capacity is 8TB
  • Currently, cluster space consumption is 3.4TB
  • IWO / balancing service has distributed load evenly.
  • Total cluster consumption is 42.5% (I’ll spare you the math!)
  • If any one node exceeds 1.6TB utilization in this scenario, an 80% capacity alarm will be triggered for that node.
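Here is the math spelled out, purely as an illustrative sanity check in Python:

# Quick worked check of the numbers above (illustrative arithmetic only).
node_capacity_tb = 2
node_count = 4
total_capacity_tb = node_capacity_tb * node_count        # 8 TB
consumed_tb = 3.4                                         # from the table
utilisation_pct = consumed_tb / total_capacity_tb * 100   # 42.5 %
warn_threshold_tb = 0.8 * node_capacity_tb                # 1.6 TB per node triggers the 80% alarm
print(total_capacity_tb, utilisation_pct, warn_threshold_tb)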

If we were to add more virtual machines to this scenario, or grow the existing virtual machines, we could trigger 80% and 90% utilization alarms for two nodes within this cluster (as consumed on-disk space will scale linearly on both nodes that contain the associated primary and secondary data containers). However, in the real world things can be slightly more nuanced for a number of reasons:

  • Odd-numbered clusters. Odd-numbered clusters may not scale linearly in every scenario. One node in the cluster will be forced to house more primary and secondary copies of virtual machine data. This may trigger a capacity alarm.
  • VMs will have backups of various sizes associated with them (stored on the same nodes as the primary and secondary data containers for de-duplication efficiency), along with backups of VMs from other clusters (remote backups). As remote and local backups grow or shrink in size and eventually expire, different nodes can experience different utilization levels.
  • VMs with highly unique data that does not de-duplicate very well.

Analyzing resource utilization – node & vm utilization

Use the dsv-balance-show command with the --showHiveName --consumption --showNodeIp arguments to view individual node capacity and virtual machine consumption. This command can be issued from any node within the cluster.

Note: You need to elevate to the root user to use this command.

  1. sudo su
  2. source /var/tmp/build/bin/appsetup
  3. dsv-balance-show --showHiveName --consumption --showNodeIp

To obtain the full list of command options, issue the --help switch.

balance show-1

The above output from dsv-balance-show tells us the following:

  • Used and remaining space per node
  • On-disk consumed space of each VM
  • Consumed I/O of each node and VM (Note this is only a point in time snapshot of the node/vm’s when the command was issued)

Analyzing resource utilization – viewing vm data container placement

The dsv-balance-manual command is a powerful tool and can be used to accomplish several tasks:

  1. Gather current guest virtual machine data including data container node residence, ownership, and IOPS statistics.
  2. Gather backup information for each VM (number of backups).
  3. Provide the ability to manually migrate data containers and/or backups from one node to another.
  4. Provide the ability to evacuate all data from a node to other nodes within the cluster automatically.
  5. Re-align backups that may have become scattered.

dsv-balance-manual displays its output to the screen (you will have to scroll back up in your SSH session!) and creates an analogous .csv ‘distribution’ file named /tmp/replica_distribution_file.csv.

If invoked with the --csvfile argument, the script executes the commands to effect any changes made to the .csv file (more on that later).

Let's explore each function.


Listing data container node locations, ownership, on-disk consumed size and current IOPS statistics

Issuing the command dsv-balance-manual without any arguments is shown below. After the command is issued you will be prompted to select the Datacenter/Cluster you wish to query. The following information is output directly to the console:

  • VM ID (Used for CSV file tracking)
  • Owner (which node is currently running the vm)
  • Node columns (whether the node listed contains a primary or secondary data container for the VM)
  • IO-R (vm read IOPS)
  • IO-W (vm write IOPS)
  • SZ(G) (on-disk consumed size of the VM)
  • Name (vm name)
  • IOPS R (aggregate node read IOPS)
  • IOPS W (aggregate node write IOPS)
  • Size G (total consumed node on-disk space)
  • AVAIL G (total remaining node on-disk space)

Note: Consumed on-disk space is a combination of virtual machine, local backup and remote backup consumption. However, only the on-disk VM size is broken out in the SZ(G) column.

Similar to the dsv-balance-show command, individual node figures, per-VM logical values and available capacity are listed.

The following blog post is an excellent way to gather the same information using PowerCLI.


Listing additional backup information for each vm (number of backups)

Issuing the command dsv-balance-manual -q will add additional backup information. Note that only local backups are listed; remote backups (backups of VMs from other clusters) are not listed.

The backup count per VM is listed. If the Native column is 100%, this indicates that all backups are stored along with the parent VM (ensuring maximum de-duplication efficiency).

Backups can become scattered if the node running the parent VM is offline for an extended period of time (the HPE SimpliVity DVP will choose an alternative node to house the backups); however, the Auto Balancer service should automatically re-align the backups.

Note: dsv-balance-manual does not list local or remote backup sizes

dsv-balance-manual-2

The dsv-balance-manual command mainly lists logical information; AVAIL (G) is the exception, as it is a physical value. Before migrating a particular data container to another node it may be useful to understand the physical size of the data container (the unique VM size). That calculation is beyond the scope of this topic; HPE technical support can assist before any such migration is executed.

Manually migrating data containers and its associated backups to another node

  1. Issue the command dsv-balance-manual
  2. Select required Datacenter/Cluster
  3. vi the generated csv file (e.g. vi /tmp/balance/replica_distribution_file_Dallas.csv)
  4. Press ‘i’ for insert mode
  5. Using the arrow keys, move to the column of the virtual machine’s data container you wish to migrate. Note: as of version 3.7.6 you can move either the primary or the secondary data container; in earlier versions you can only move the secondary data container.
  6. Delete the ‘p’ or ‘s’ from that column, then use the arrow keys to move to the column of the node you wish to migrate the data container to and insert a ‘p’ or ‘s’ character in that column.
  7. Once complete, save your changes to the csv file and exit by pressing ‘ESC’ and ‘:wq!’
  8. To migrate the data container, enter the command ‘dsv-balance-manual --csvfile /tmp/balance/replica_distribution_file_Dallas.csv’

The command re-reads the csv file looking for any changes; if any are found, it will automatically migrate the appropriate data container and its associated backups to the desired node.

I have created a short video below outlining this process. In this video I move the primary data container (and associated backups) of the first VM in the generated csv file from node 3 to node 2.  If you have any questions in regards to this process I recommend contacting HPE technical support who can walk you through these steps.

In the next post we will explore how to list and calculate local and remote backup sizes.

How Virtual Machine data is stored and managed within a HPE SimpliVity Cluster – Part 4 – Automatic capacity management

If you have been following this blog series, you know that I’m exploring HPE SimpliVity architecture at a detailed level, to help IT administrators understand VM data storage and management in HPE hyperconverged clusters.

Part 1 covered virtual machine storage and management in an HPE SimpliVity Cluster.  Part 2 covered the Intelligent Workload Optimizer (IWO) and how it automatically manages VM data in different scenarios. Part 3 followed on, outlining options available for provisioning VMs.


This post covers how the HPE SimpliVity platform automatically manages virtual machines and their associated data containers after initial provisioning, as VM’s grow (or shrink) in size.

With regard to the automatic management of Data Containers, part two mainly (but not exclusively) focused on their initial placement. Let's call this the “day one” provisioning of virtual machines within an HPE SimpliVity cluster.

So how does the HPE SimpliVity platform manage virtual machines and their associated Data Containers after day one operations and as virtual machines grow (or shrink) in size?


For a large part, this is handled by the Auto Balancer service. The Auto Balancer service is separate from IWO and the Resource Balancer service; however, its ultimate goal is the same: to keep resources as balanced as possible within the cluster.

At the risk of repeating myself, think of IWO and the Resource Balancer as responsible for the provisioning of workloads (VM and associated Data Container placement), and think of Auto Balancer as responsible for the management of these Data Containers as they evolve in terms of overall node consumption. IWO is aware of any changes Auto Balancer may implement and will update DRS affinity rules accordingly.


How does Auto Balancer work?

I previously talked about how Resource Balancer will migrate Data Containers (for VDI workloads) to balance load across nodes. In its current iteration Auto Balancer takes this one step further and will migrate secondary Data Containers for all other VM types and associated backups to less utilized nodes should a node be running low on physical capacity. Auto Balancer does not operate until a node is at 50% utilization (in terms of capacity). As with IWO and Resource Balancer, Auto Balancer is also designed to be zero touch, i.e. the process is handled automatically.

Low physical capacity on a node can be a result of the growth of one or more VMs or backups, or the provisioning of more virtual machines into an HPE SimpliVity cluster.

auto-2

In the above illustration, VM-3’s on-disk data has grown by an amount that has in turn caused Node 3 to become space constrained. Auto Balancer will take a proactive decision to re-balance data across the cluster in order to try and achieve the optimum distribution of data in terms of space and IOPS. In this simplified example, Auto Balancer has elected to migrate the secondary copy of VM-2 to Node 1 to keep the overall cluster balanced. Again, this process is invisible to the user.
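To summarize the behavior just described, here is a minimal sketch under deliberately simplified assumptions (capacity only, one secondary container moved at a time); the real Auto Balancer algorithm is more sophisticated and also weighs IOPS:

# Illustrative sketch only, not HPE SimpliVity code.
THRESHOLD = 0.50  # Auto Balancer does not act until a node reaches 50% utilization

def rebalance(nodes):
    # nodes: {name: {"capacity_gb": ..., "used_gb": ..., "secondaries": {vm_name: size_gb}}}
    def util(name):
        return nodes[name]["used_gb"] / nodes[name]["capacity_gb"]
    busiest = max(nodes, key=util)
    if util(busiest) < THRESHOLD or not nodes[busiest]["secondaries"]:
        return None
    target = min(nodes, key=util)
    vm, size = max(nodes[busiest]["secondaries"].items(), key=lambda kv: kv[1])
    nodes[busiest]["secondaries"].pop(vm)
    nodes[busiest]["used_gb"] -= size
    nodes[target]["secondaries"][vm] = size
    nodes[target]["used_gb"] += size
    return f"moved secondary copy of {vm} from {busiest} to {target}"

cluster = {
    "node1": {"capacity_gb": 2000, "used_gb": 500,  "secondaries": {"VM-1": 300}},
    "node2": {"capacity_gb": 2000, "used_gb": 900,  "secondaries": {"VM-3": 400}},
    "node3": {"capacity_gb": 2000, "used_gb": 1300, "secondaries": {"VM-2": 500}},
}
print(rebalance(cluster))  # moved secondary copy of VM-2 from node3 to node1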

Let's take a second example to reinforce what we have learned over the two previous posts with regard to DRS, IWO and the auto balancing of resources.

The above illustration does not scale well when representing multiple nodes and virtual machines. It is easier to represent virtual machines and nodes in table format (below); this format will also prove useful in upcoming posts, where we’ll learn how to view data distribution across nodes for individual VMs and backups, and how to manually balance this data if required.

distribution

For the sake of simplicity, the total physical capacity available to each node in the above table is 2TB. The physical OS space consumed after de-duplication and compression for each VM is listed.  For this example we will omit backup consumption. Therefore we know the following.

  • Total cluster capacity is 8TB
  • Total consumed cluster capacity is 2.1TB ((50 + 300 + 400 + 300) x 2, allowing for HA)
  • Node 3 is currently the most utilized node, consuming 1TB of space
  • Currently the cluster is relatively balanced with no nodes space constrained
  • DRS has chosen not to run any workload on Node 4

Now let's imagine the consumed space of one of the VMs with data on Node 3 grows by 200GB, and its CPU and memory consumption also increases.

We now know the following

  • Total cluster capacity is 8TB
  • Total consumed cluster capacity is 2.5TB ((50 + 300 + 400 + 300 + 200) x 2, allowing for HA)
  • Node 3 is currently over utilized, consuming 1.2TB of space.

An automatic redistribution of resources could move data such that it matches the table below.

  • DRS has chosen to run VM-4 on Node 4 due to constrained CPU and memory on Node 3, thus promoting VM-4’s standby Data Container to the primary Data Container.
  • The Auto Balancer service has migrated the secondary copy of VM-3 to node 1 to re-balance cluster resources.
distribution-1

It is worth noting that other scenarios are equally valid; for example, VM-4’s secondary data container could instead have been migrated to Node 1 (after the DRS move), which would have resulted in roughly the same re-distribution of data.

In my previous post we talked about capacity alarms being generated at 80% space consumption on an individual node. In the above scenario, no alarms would have been generated on any node within the cluster; Auto Balancer redistributes workload according to its algorithms to try and avoid that exact alarm.


Monitoring Placement Decisions

auto balancer-1

The Auto Balancer service runs on each node. One node in the cluster is chosen as the leader.

The leader may change over time or as nodes are added/removed from the cluster.

The Auto Balancer leader will submit tasks to other nodes to perform Data Container or Backup migrations.

The command “dsv-balance-show --showNodeIP” shows the current leader.

dsv-balance-show
Here the current leader is OVC 10.20.4.186

The log file balancerstatus.txt shows submitted tasks and their status on the leader node (i.e. this text file is only present on the leader node). Issue the command “cat /var/svtfs/0/log/balancerstatus.txt” to view the status of any active migrations.

balancer status

The output of this file shows the migration of a backup from Node 1 to Node 2.

Historic migrations can be viewed by issuing the following commands:

“dsv-active-tasks-show | grep migrate” should show active backup or hive migrations

“dsv-tasks-show | grep migrate” shows active and completed migration tasks

Currently the Auto Balancer does not support the migration of remote backups

Another handy command is ‘dsv-balance-migration-show --showVMName’, which was introduced in version 3.7.7. This is a cluster-wide command, so it can be run from any node in the cluster. It will list the virtual machine being migrated along with the host it has been migrated from and to.


Closing Thoughts

Intelligent Workload Optimizer and Auto Balancer are not magic wands; they balance existing resources, but cannot create resources in an overloaded datacenter. In some scenarios, manual balancing may be required. The next post will explore some of these scenarios, how to analyze resource utilization, and how to manually balance resources within a cluster if required.


How Virtual Machine data is stored and managed within a HPE SimpliVity Cluster – Part 2 – Automatic Management

In my previous blog post my aim was to paint a picture of how the data that makes up the contents of a virtual machine is stored within an HPE SimpliVity cluster. I introduced the concept of both primary and secondary data containers, their creation process, placement, and how we report on physical cluster capacity.

If you are unfamiliar with any of these concepts, I suggest reading that post here before continuing 🙂

Now that we have a greater understanding of how virtual machine data is stored we can start to dig a little deeper to understand how the DVP automatically manages this data, and as administrators, how we can manage this data as the environment grows.

There is a lot to cover in this topic, and in the interest of keeping this post concise and building on core concepts incrementally, I will concentrate on the automatic management of data via IWO. I will focus on other automatic data management features (via Auto Balancer) and the manual management of data in future posts.

This post has been co-authored with my colleague Scott Harrison, a big thank you to Scott as he has provided and posed some interesting points to consider.


Automatic Management of Data

IWO, a closer look

As previously stated, a core feature of the HPE SimpliVity DVP is IWO (Intelligent Workload Optimizer).

IWO comprises two sub-components: the Resource Balancing Service and the VMware DRS / SCVMM PRO Integration Service. For this post I will focus on the VMware DRS integration service; however, the architecture remains analogous for Hyper-V.

IWO’s aim is twofold:

1. Ensure all resources of a cluster are properly utilized.

This process is handled by the Resource Balancing Service both at initial virtual machine creation and proactively throughout the life cycle of the VM and its associated backups. This is all achieved without end-user intervention.

2. Enforce data locality.

This process is handled by the VMware DRS / SCVMM PRO integration service by pinning (through VMware DRS affinity rules or Hyper-V SCVMM PRO) a virtual machine to nodes that contain a copy of that data. Again this is all achieved without end-user intervention.

Note: the Resource Balancing Service is an always-on service and does not rely on VMware DRS being active and enabled on a VMware cluster; it works independently of the DRS integration service.


Ensuring all resources of a cluster are properly utilized – VM & Data Placement scenarios

The primary goal of IWO (through the underlying Resource Balancer Service, a feature of IWO) is to ensure that no single node within an HPE SimpliVity cluster is over-burdened in terms of CPU, memory, storage capacity, or I/O.

The objective is not to ensure that each node experiences the same utilization across all resources or dimensions (that may be impossible), but instead, to ensure that each node has sufficient headroom to operate. In other words, each node must have enough physical storage capacity to handle expected future demand, and no single node should handle a much larger number of I/O’s relative to its cluster peers.

The Resource Balancer Service will use different optimization criteria for different scenarios. For example, initial VM placement on a new cluster (i.e., migration of existing VMs from a legacy system), best placement of a newly created VM within an existing system, Rapid Clone of an existing VM, and VDI-specific optimizations for handling linked clones.

Let’s explore the different scenarios.



Scenario #1 New VM Creation – VMware DRS enabled and set to Fully Automated

When creating, or storage v-motioning, a virtual machine to an HPE SimpliVity host/datastore, vSphere DRS will automatically place the VM on a node with relatively low CPU and memory utilization according to its own algorithms (default DRS algorithms). No manual selection of a node is necessary.

drs-auto
DRS Set to Fully Automatic – The VMware Cluster is selected as the compute resource – DRS automatically selects an ESXi server to run the VM

In the below diagram, VMware DRS has chosen Nodes 3 and 4 respectively to house VMs 1 and 2. Independently, the DVP has chosen a pair of “least utilized” nodes within the cluster (according to storage capacity and I/O demand) on which to place the data containers of those VMs.

iwo
VM-1 and VM-2 have been placed on nodes 1 & 2 and 3 & 4 respectively via the resource balancer service.

IWO, via the DRS integration service, will pin VM-1 to Nodes 1 & 2 and VM-2 to Nodes 3 & 4 by automatically creating and populating DRS rules into vCenter. Let's look at how that is achieved.

How are DRS Rules Created?

Each DRS rule consists of the following components (a simple model is sketched after the list):

  • A Virtual Machine DRS Group.
  • A Host DRS Group.
  • A Virtual Machines to Hosts rule. This consists of “must run on these hosts”, “should run on these hosts”, “must not run on these hosts”, or “should not run on these hosts”. For HPE SimpliVity we use the “should run on these hosts” rule.
DRS
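As a simple way to visualize the three components, here is an illustrative data model (plain Python, not the vSphere API or PowerCLI) of the rule IWO pushes into vCenter; the host names shown are hypothetical placeholders:

# Illustrative model only: a VM group, a host group of the two data-holding
# nodes, and a "should run on these hosts" rule tying the two together.
from dataclasses import dataclass
from typing import List

@dataclass
class VmsToHostsRule:
    vm_group_name: str        # e.g. SVTV_<hostID2>_<hostID3>
    vm_group: List[str]
    host_group_name: str      # e.g. SVTH_<hostID2>_<hostID3>
    host_group: List[str]
    policy: str = "should run on these hosts"  # HPE SimpliVity uses 'should', not 'must'

rule = VmsToHostsRule(
    vm_group_name="SVTV_<hostID2>_<hostID3>",
    vm_group=["VM-1"],
    host_group_name="SVTH_<hostID2>_<hostID3>",
    host_group=["host-2", "host-3"],  # hypothetical host names
)
print(rule.policy)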

In our example it is not optimal for VM-1 to be running on node 4 as all of the I/O for the VM must be forwarded to either Node 1 or Node 2 in order to be processed.  If the VM can be moved automatically to those nodes, then one hop is eliminated from the I/O path.

First, a Virtual Machine DRS Group is created, as we're looking to make a group of virtual machines that will run optimally on two nodes. In our case the name of the Virtual Machine DRS Group will be SVTV_<hostID2>_<hostID3>, reflecting the two nodes those virtual machines should run on.

Below we can see VM-1 assigned to this VM Group. VM-1 will share this group with other virtual machines that have their data located on the same nodes. Note that the Host ID is a GUID, not an IP address or hostname of the node; while this may appear confusing to the end user, it is the actual GUID of the HPE SimpliVity node. Unfortunately, mapping the GUID to the hostname or IP address of the node is not possible through the GUI and requires the command “dsv-balance-show --showNodeIP” if you do wish to identify the node IP.

drs vm group
DRS VM Group containing VM-1
balance show
dsv-balance-show --showNodeIP – we can map the output of this command (node GUID) to the VM Group

Looking at the VM Group for VM-1, we can deduce that the data is stored on the nodes ending in “aaf” and “329”, which equate to OVCs .185 and .186, which in turn live on the ESXi nodes ending in 81 and 82, as shown below.

Again all of this is handled automatically for you, however for the post to make sense it is important to know where these values come from.

host
Identifying the Node(Host) associated with the OVC VM

You can also deduce that, as the Host DRS Groups are created, they are named SVTH_<hostID2>_<hostID3>. The host groups will only ever contain two nodes, as the virtual machines only hold data on two nodes. Several host groups will be created depending on how many hosts there are in the cluster (one host group for each combination of nodes). Here I have highlighted the host group for hosts 81 and 82, which VM-1 will be tied to.

drs host group

Lastly, a “Virtual Machines to Hosts” rule is made: an HPE SimpliVity rule that consists of an HPE SimpliVity host group and an HPE SimpliVity VM group. This rule directs DRS that VM-1 “should run on” hosts 81 and 82.

DRS affinity rules are ‘should’ rules and not ‘must’ rules. This is an important distinction that we will discuss later in the post.

rule
DRS Rule “Run VMs On Hosts” containing appropriate Host Group and VM Group for VM-1

If set to Fully Automated, VMware DRS will vMotion any VM’s violating these rules to one of the data container holding nodes, thus aligning the VM with its data. In this case, VM-1 was violating the affinity rules by being placed on Node 4 and is automatically vMotioned to Node 2 via DRS.

v-motion-iwo

Scenario #2 New VM Creation  – VMware DRS Disabled

As stated previously, if VMware DRS is not enabled, the Resource Balancer service continues to function and initial placement decisions continue to operate; however, DRS affinity rules will not be populated into vCenter.

drs manual
DRS Disabled – User must select a compute resource within the cluster

In this scenario, when virtual machines are provisioned they may reside on a node where there is no data locality. An HPE SimpliVity alarm “VM Data Access Not Optimized” will be generated at the VM object layer within vCenter, alerting the user.

not optimized
Data Access Not Optimized refers to a virtual machine running on a host where there is no local copy of the VM Data

The HPE SimpliVity platform through interacting with vCenter Tasks and Events will generate an event and remediation steps directing you to which nodes contain a copy of the Virtual Machine data. In the below diagram I have highlighted the “Data access is not optimized” event that directs the user to v-motion the VM to the outlined hosts.

data access not optimized
Data Access not optimized alarm directing the user to v-motion the VM to one of the outlined hosts

Rapid clone of an existing VM

We have shown how the resource balancing service behaves in regard to new VM creation, however the resource balancing service takes a different approach for HPE SimpliVity Clones and VMware Clones of Virtual Machines (VMware clones can also be handled by HPE SimpliVity via the VAAI plugin for faster operation).

In this scenario the Resource Balancer service will leave cloned VM data on the same nodes as the parent as this achieves best possible cluster-wide storage utilization & de-duplication ratios.

clone
Resource Balancer service will place clones on the same nodes as their parents

If I/O demand exceeds node capabilities the DVP will live-migrate data containers in the background to less-loaded node(s).

hive migration
Automatic Migration of Cloned Data Containers, followed by automatic v-motion of VM due to auto update of affinity rules after data container migration (nice!)

Live migration of a data container does not refer to VMware storage v-motion. It refers to the active migration of a VM Data Container to another Node.


VDI specific optimizations for handling linked clones

The scope of VDI and VDI best practices is beyond this post; however, I did want to mention how the HPE SimpliVity platform handles this scenario. More information on this topic can be found at www.hpe.com/simplivity or in this technical whitepaper: HPE Reference Architecture for VMware Horizon on HPE SimpliVity 380 Gen10.  

A single datastore per HPE SimpliVity node within a cluster is required to ensure even storage distribution across cluster members. This is less important in a two node HPE SimpliVity server configuration; however, following this best practice will ensure a smooth transition to a three (or greater) node HPE SimpliVity environment, should the environment grow over time. This best practice has been proven to deliver better storage performance and is highly encouraged for management workloads. It should be noted that this is a requirement for desktop-supporting infrastructure. VDI environments typically clone a VM, or golden image, many times. These clones (replicas) essentially become read-only templates for new desktops.

vdi-1
VDI Setup – Clone Templates from a Golden Image

As VDI desktops are deployed, linked clones are created on random hosts. Linked clones mostly read from the read only templates and write locally which causes proxying and adds extra load to the nodes that host the read only templates.

vdi-2
Deployed VDI VM’s – Mainly reads from cloned golden image causing I/O to be Proxied over network

To mitigate this, the Resource Balancer service will automatically distribute read-only master images across all nodes for even load. This aligns linked clones with their parents to ensure node-local access. It is also worth noting that Resource Balancer may also relocate linked clones.

vdi-3
Linked clones automatically aligned with their parents to ensure node local Read/Writes

How Virtual Machine data is stored and managed within a HPE SimpliVity Cluster – Part 1 – Data creation and storage

Speaking with a customer recently, I was asked how the data that makes up the contents of a virtual machine is stored within an HPE SimpliVity cluster, and how this data affects the overall available capacity within that cluster. This is essentially a two-part question; the first part is relatively straightforward however, answering the […]