How Virtual Machine data is stored and managed within a HPE SimpliVity Cluster – Part 6 – Calculating backup capacity

So far in this series, we have discussed how the HPE SimpliVity platform intelligently places VMs and their associated data containers, and how these data containers are automatically provisioned and distributed across a cluster of nodes.

I have talked about data containers in the context of virtual machines: a data container is the construct that holds all data (VM config and flat files) associated with a virtual machine, as illustrated below

data-container-contents


But what about backups? How are they stored and managed? First, let's look briefly at the backup mechanism and types.

HPE SimpliVity Backups

HPE SimpliVity backups are independent snapshots of a virtual machine’s data container (at the time of backup) and are independently stored in the HPE SimpliVity object store.

They are not mounted to the NFS datastore until needed. These backups are stored in HA pairs and are kept on the same nodes as the virtual machine’s primary and secondary data containers to maximize de-duplication of data.

backups

This post will not explain how backups are logically independent; the concept warrants a separate post of its own and leads to other capacity management questions I am often asked. In my next post I will explain how data containers are logically independent from each other, which will lead us naturally into space reclamation scenarios.

HPE SimpliVity Backup Mechanism

The snapshot process is handled natively by the HPE SimpliVity platform:

  1. A point-in-time Data Container (snapshot) is created locally.
  2. This is done by creating a new reference to the data container and incrementing the reference counts of the source data container's objects in the Object Store.
  3. No data writes are incurred, just metadata and index update operations.
  4. The extra reference allows for full de-duplication (see the sketch below).
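
To make the reference-counting idea a little more concrete, below is a purely illustrative Python sketch of a toy deduplicating object store, where a backup is just a new set of references to existing blocks. All names are hypothetical and no SimpliVity internals are implied.

# Illustrative only: a toy object store where data blocks are stored once
# and reference-counted. A "backup" adds references; it never copies data.
from collections import defaultdict
import hashlib


class ToyObjectStore:
    def __init__(self):
        self.blocks = {}                   # digest -> block payload (stored once)
        self.refcounts = defaultdict(int)  # digest -> number of references
        self.containers = {}               # name -> list of block digests (metadata)

    def write(self, container, data, block_size=4096):
        digests = []
        for i in range(0, len(data), block_size):
            chunk = data[i:i + block_size]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.blocks:  # unique data is written to disk only once
                self.blocks[digest] = chunk
            self.refcounts[digest] += 1
            digests.append(digest)
        self.containers[container] = digests

    def snapshot(self, source, backup_name):
        # Metadata-only: copy the block index and bump refcounts. No payload I/O.
        digests = list(self.containers[source])
        for d in digests:
            self.refcounts[d] += 1
        self.containers[backup_name] = digests


store = ToyObjectStore()
store.write("vm1-data-container", b"A" * 8192 + b"B" * 4096)
store.snapshot("vm1-data-container", "vm1-backup-2019-06-19")

print(len(store.blocks))                   # still only 2 unique blocks stored
print(sorted(store.refcounts.values()))    # each block now referenced by VM and backup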

Backup Types

The HPE SimpliVity platform supports three flavors of snapshots. Regardless of the backup type, the end result is always the same: an HPE SimpliVity snapshot is taken.

Crash consistent – The standard HPE SimpliVity backup. This is a snapshot of the Data Container at the time the backup is taken. This backup consists of only metadata and index update operations. No I/O to disk is incurred, making these backups nearly instantaneous.

Application consistent – This method leverages VMware Tools to first quiesce the VM prior to taking an HPE SimpliVity snapshot (the default for the Application Consistent option). Select the Application Consistent option when creating or editing a policy rule if you want application-consistent backups.

  • First, a VMware Snapshot is taken. Outstanding I/O is flushed to disk via VMware Tools commands. The VM is now running off a snapshot.
  • A point-in-time HPE SimpliVity snapshot is then taken of the Data Container (a standard HPE SimpliVity backup).
vCenter Tasks highlighting the VMware Snapshot process (Note: this process completes before the HPE SimpliVity snapshot is taken)

Note: The restored VM will also contain the VMware Snapshot file (as the HPE SimpliVity snapshot is taken after the VMware Snapshot). The VM should therefore be reverted to that snapshot prior to powering on, to ensure it is fully consistent (any I/O performed after the VMware snapshot was not flushed).

Remember to revert any VMware Snapshots when restoring an HPE SimpliVity Application Consistent backup
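
If you restore application-consistent backups regularly, the revert step can be scripted. Below is a minimal sketch assuming pyVmomi is available; the vCenter address, credentials and VM name are hypothetical, and the script simply reverts the restored VM to its current VMware snapshot while it is still powered off.

# Sketch only: revert a restored VM to its VMware snapshot before powering it on.
# Assumes pyVmomi is installed; host, credentials and VM name are hypothetical.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVim.task import WaitForTask
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.local",
                  user="administrator@vsphere.local",
                  pwd="********",
                  sslContext=ssl._create_unverified_context())
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    vm = next(v for v in view.view if v.name == "Windows-restored")  # hypothetical VM name

    if vm.snapshot is not None:
        # Revert to the snapshot captured during the application-consistent backup,
        # discarding any I/O that occurred after it, and leave the VM powered off.
        WaitForTask(vm.RevertToCurrentSnapshot_Task())
        WaitForTask(vm.RemoveAllSnapshots_Task())  # optional tidy-up
finally:
    Disconnect(si)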

Application consistent with Microsoft VSS – Standard HPE SimpliVity Backup with Microsoft Volume Shadow Copy Service (VSS)

MS-VSS uses application-specific writers to freeze an application and its databases, and ensures that any data being transferred from one file to another is captured and consistent before the HPE SimpliVity backup is taken.

SQL Server 2008 R2, 2012, 2014 and 2016 are supported VSS applications, and Windows Server 2008 R2, 2012 R2, and 2016 are supported VSS operating systems.

Guidelines and full steps on how to enable (and restore) VSS backups are outlined in the HPE SimpliVity administration guides

Select the Microsoft VSS option when opting for application-consistent backups (when creating or editing a policy rule)

Save Windows credentials for the virtual machine(s) that use the policy using the SimpliVity “Save Credentials for VSS” option. The credentials you provide should also be added to the local machine’s Administrators group.

These credentials are used to copy and run VSS scripts to freeze the application and databases before the HPE SimpliVity backup is taken.

Backup Consumption – Viewing Cluster Capacity

To view cluster capacity navigate to the Cluster object and under the Monitor tab, choose HPE SimpliVity Capacity.

Within the GUI, physical backup consumption is not split out from overall physical consumption, i.e. physical consumption is a combination of virtual machine data plus local and remote backup data.

Logical consumption is listed; you can think of logical consumption as the physical space that would be required to store all backups with no de-duplication.

Within the GUI it is possible to view individual unique backup sizes by selecting any HPE SimpliVity backup and, under backup Actions, selecting the Calculate Unique Backup Size option.

Once the calculation completes, scroll to the right of the backup screen to the Unique Backup column to view the result.

This method can be useful; however, individual node consumption figures are not listed within the vSphere plugin / GUI. If you wish to view physical consumption information per node, you will need to resort to the CLI.

Backup Consumption – viewing individual node utilization

The command ‘dsv-backup-size-estimate‘ can be a useful tool if you are looking to break out individual node backup consumption figures. This command will list total node on-disk consumption, local backup and remote backup on-disk consumption figures for all backups contained on the node.

The command has many options: using the --node switch will list all backups contained on the node, and using --vm along with --datastore allows you to specify an individual VM if required. The command can be run from any node to interrogate itself or any other node.

Listed below is the output when running the command with the --node switch (shortened to show just two virtual machines).

The bottom of the output will display node usage statistics. For OmniStackVC-10-6-7-225 in this example, total Node on-disk consumed data is 625.05 GB.

This is made up of Data (on-disk VM consumption) of 142.17 GB + local backup consumption of 268.15 GB + remote backup consumption of 214.73 GB.

root@omnicube-ip7-203:/home/administrator@vsphere# dsv-backup-size-estimate --node OmniStackVC-10-6-7-225

.---------------------------------------------------------------------------------------------------------------------.
| Name                                          | Bytes    | Local   | Offnode | Heuristic | Backup                   |
|                                               | Written  | Size    | Size    |           | Time                     |
+-----------------------------------------------+----------+---------+---------+-----------+--------------------------+
| Windows      (TEST-DS)                        |          |         |         |           |                          |
+-----------------------------------------------+----------+---------+---------+-----------+--------------------------+
| Windows-backup-2019-06-19T11:11:46+01:00      |  12.06GB | 12.06GB |         |    2.24GB | Wed Jun 19 10:14:41 2019 |
| Windows-backup-2019-06-19T11:12:10+01:00      | 360.37KB | 12.06GB |         |   66.85KB | Wed Jun 19 10:15:06 2019 |
| Test                                          |   2.79GB | 12.06GB |         |  529.13MB | Mon Jun 24 05:07:14 2019 |
+-----------------------------------------------+----------+---------+---------+-----------+--------------------------+
| VM Total Backups: 3 of 3                      |  14.84GB | 36.18GB |      0B |    2.75GB |                          |
| = local 3 + offnode 0, unknown 0              |          |         |         |           |                          |
'-----------------------------------------------+----------+---------+---------+-----------+--------------------------'


.----------------------------------------------------------------------------------------------------------------------.
| Name                                          | Bytes    | Local    | Offnode | Heuristic | Backup                   |
|                                               | Written  | Size     | Size    |           | Time                     |
+-----------------------------------------------+----------+----------+---------+-----------+--------------------------+
| ELK_Ubutnu (DR_1TB)                           |          |          |         |           |                          |
+-----------------------------------------------+----------+----------+---------+-----------+--------------------------+
| 2019-06-20T00:00:00+01:00                     | 133.88GB | 133.88GB |         |   24.33GB | Wed Jun 19 23:00:00 2019 |
| 2019-06-21T00:00:00+01:00                     |  98.91GB | 134.27GB |         |   17.97GB | Thu Jun 20 23:00:00 2019 |
+-----------------------------------------------+----------+----------+---------+-----------+--------------------------+
| VM Total Backups: 2 of 2                      | 232.79GB | 268.15GB |      0B |   42.30GB |                          |
| = local 2 + offnode 0, unknown 0              |          |          |         |           |                          |
'-----------------------------------------------+----------+----------+---------+-----------+--------------------------'

For node OmniStackVC-10-6-7-225 (32f62c42-6b8f-8529-c6a5-9db972eee181) in cluster 'DR Cluster':
- Total Node Data:  625.05GB = Data: 142.17GB  + Local backup: 268.15GB + Remote backup: 214.73GB
- Compressed:        113.58GB, Uncompressed: 207.95GB
- Comp  ratio:       0.54617
- Dedup ratio:       0.33269
- Total number of backups: 15
- Number of backups shown: 15 = Local 15 + Offnode 0, unknown 5
- Total backup bytes:      482.88GB = local 482.88GB + offnode 0B
- Gross heuristic estimate: 59.60GB
root@omnicube-ip7-203:/home/administrator@vsphere#
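
As a quick sanity check, the node summary figures reconcile as you would expect. The snippet below (values copied from the output above; my interpretation of the dedup ratio is an assumption) reproduces the totals and ratios.

# Reproduce the node summary arithmetic from the dsv-backup-size-estimate output above.
data_gb          = 142.17   # on-disk VM data
local_backup_gb  = 268.15   # local backup consumption
remote_backup_gb = 214.73   # remote backup consumption

total_node_gb = data_gb + local_backup_gb + remote_backup_gb
print(f"Total Node Data: {total_node_gb:.2f} GB")              # 625.05 GB

compressed_gb   = 113.58
uncompressed_gb = 207.95
print(f"Comp ratio:  {compressed_gb / uncompressed_gb:.5f}")    # ~0.546, matching 0.54617
# Assumption: the dedup ratio appears to be uncompressed unique data over total node data.
print(f"Dedup ratio: {uncompressed_gb / total_node_gb:.5f}")    # ~0.33269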

Backup Consumption – viewing individual backup consumption

Focusing on the virtual machine backup consumption output above, I have chosen two very different types of virtual machines for comparison purposes.

Virtual machine ‘Windows‘ contains in-guest data that de-duplicates very well, while virtual machine ‘ELK_Ubutnu‘ contains a database workload that generates a high amount of unique data, so each individual backup will contain a larger amount of unique data. Let’s compare the figures.

For each individual virtual machine, four main columns are listed. However, for on-disk consumption purposes we only really care about the Heuristic values.

  • Bytes Written – While I have marked this as ‘Logical Bytes Written’, it is a combination of the unique on-disk virtual machine size and the unique backup size at the time of backup. For the very first backup this will capture the unique size of the VM; however, no I/O is actually incurred (hence I marked it as logical). This can be a useful value for estimating the unique on-disk consumption of a virtual machine.
  • Local Size – Logical backup size (as exposed in the GUI)
  • Off-node Size – Whether the backup is local or remote.
  • Heuristic – An approximation of unique on-disk data associated with this backup and the VM since the time of backup. This value is cumulative from when the backup was first taken.

Virtual Machine – ‘Windows’

Looking at the values for this virtual machine’s backups we can deduce the following.

  1. The approximate unique size of this VM was originally 12 GB. If we were to delete all backups associated with this virtual machine, approx 12 GB of data would still be consumed on this node.
  2. Since the first backup was taken, approx 2.75GB of unique data is associated with the virtual machine and its backups.
  3. Deleting all backups would reclaim no more than 2.75 GB of data; however, as that data is shared with the virtual machine since the time of backup, less could be reclaimed in reality. To obtain a truer value, locate the backup within the GUI and calculate its unique size as outlined earlier in this post.

Virtual Machine – ELK_Ubuntu

Virtual Machine ELK_Ubuntu contains much more unique data. This is reflected in its individual backup figures of 24 GB and 17 GB (totaling 42 GB).

  1. The approximate unique size of this VM was originally 133 GB. If we were to delete all backups associated with this virtual machine, approx 133 GB of data would still be consumed on this node.
  2. Since the first backup was taken, approx 42 GB of unique data is associated with the virtual machine and its backups.
  3. Deleting all backups would reclaim no more than 42 GB of data (the sketch below applies this reasoning to both VMs).
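
The same simple reasoning applies to both virtual machines. The sketch below, using figures copied from the command output above, captures it; treat the heuristic total strictly as an upper bound on what is reclaimable.

# Rough reasoning applied above, using figures from the dsv-backup-size-estimate output.
# "first_backup_bytes_written" approximates the unique on-disk size of the VM when it
# was first protected; "total_heuristic" is an upper bound on the space that deleting
# all of the VM's backups could reclaim.
vms = {
    "Windows":    {"first_backup_bytes_written": 12.06,  "total_heuristic": 2.75},   # GB
    "ELK_Ubutnu": {"first_backup_bytes_written": 133.88, "total_heuristic": 42.30},  # GB
}

for name, figures in vms.items():
    remaining = figures["first_backup_bytes_written"]
    reclaimable = figures["total_heuristic"]
    print(f"{name}: ~{remaining:.0f} GB would remain on the node after deleting all "
          f"backups; deleting them reclaims at most ~{reclaimable:.2f} GB")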

The command dsv-backup-size-estimate can be a useful tool; however, bear in mind that it is a point-in-time calculation and should be treated as such. As data is written to and deleted from the Object Store, de-duplication rates may grow or shrink, as may the data associated with the original virtual machine.

I find these values useful for gaining an approximation of virtual machine and associated backup consumption, and from there working within the GUI to identify backups that may be starting to grow too large.

How Virtual Machine data is stored and managed within a HPE SimpliVity Cluster – Part 4 – Automatic capacity management

If you have been following this blog series, you know that I’m exploring HPE SimpliVity architecture at a detailed level, to help IT administrators understand VM data storage and management in HPE hyperconverged clusters.

Part 1 covered virtual machine storage and management in an HPE SimpliVity Cluster.  Part 2 covered the Intelligent Workload Optimizer (IWO) and how it automatically manages VM data in different scenarios. Part 3 followed on, outlining options available for provisioning VMs.


This post covers how the HPE SimpliVity platform automatically manages virtual machines and their associated data containers after initial provisioning, as VMs grow (or shrink) in size.

With regard to the automatic management of Data Containers, part two mainly (but not exclusively) focused on the initial placement of these Data Containers. Let’s call these the “day one” provisioning operations for virtual machines within an HPE SimpliVity cluster.

So how does the HPE SimpliVity platform manage virtual machines and their associated Data Containers after day one operations and as virtual machines grow (or shrink) in size?


For a large part, this is handled by the Auto Balancer service. Auto Balancer is separate from IWO and the Resource Balancer service; however, its ultimate goal is the same: to keep resources as balanced as possible within the cluster.

At the risk of repeating myself, think of IWO and the Resource Balancer as responsible for the provisioning of workloads (VM and associated Data Container placement), and think of Auto Balancer as responsible for managing these Data Containers as they evolve, with regard to overall node consumption. IWO will be aware of any changes Auto Balancer implements and will update DRS affinity rules accordingly.


How does Auto Balancer work?

I previously talked about how Resource Balancer will migrate Data Containers (for VDI workloads) to balance load across nodes. In its current iteration, Auto Balancer takes this one step further and will migrate secondary Data Containers for all other VM types, along with their associated backups, to less utilized nodes should a node be running low on physical capacity. Auto Balancer does not operate until a node reaches 50% utilization (in terms of capacity). As with IWO and Resource Balancer, Auto Balancer is designed to be zero touch, i.e. the process is handled automatically.

Low physical capacity on a node can be a result of the growth of one or more VMs or backups, or the provisioning of additional virtual machines into an HPE SimpliVity cluster.

auto-2

In the above illustration, VM-3’s on-disk data has grown by an amount that has in turn caused Node 3 to become space constrained. Auto Balancer makes a proactive decision to re-balance data across the cluster in order to achieve the optimum distribution of data in terms of space and IOPS. In this simplified example, Auto Balancer has elected to migrate the secondary copy of VM-2 to Node 1 to keep the overall cluster balanced. Again, this process is invisible to the user.
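
Purely to illustrate the kind of decision being made (this is not the actual Auto Balancer algorithm, and all figures are hypothetical), a toy version of the logic might look like this:

# Toy model only: choose a secondary data container to migrate off a space-constrained node.
# This is not the Auto Balancer algorithm, just the general idea described above.
nodes = {"Node-1": 400, "Node-2": 700, "Node-3": 1700, "Node-4": 600}  # GB consumed (hypothetical)
capacity_gb = 2000                                                     # per-node physical capacity

# Secondary data containers that could be moved off each node (hypothetical sizes in GB).
secondaries_on = {"Node-3": [("VM-2 secondary", 300), ("VM-5 secondary", 150)]}

constrained = max(nodes, key=lambda n: nodes[n] / capacity_gb)
if nodes[constrained] / capacity_gb > 0.5 and secondaries_on.get(constrained):
    target = min(nodes, key=nodes.get)                      # least utilized node
    container, size = max(secondaries_on[constrained], key=lambda c: c[1])
    print(f"Migrate {container} ({size} GB) from {constrained} to {target}")
    nodes[constrained] -= size
    nodes[target] += size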

Let’s take a second example to reinforce what we have learned over the two previous posts with regard to DRS, IWO and the auto balancing of resources.

The above illustration does not scale well when representing multiple nodes and virtual machines. It is easier to represent virtual machines and nodes in table format (below); this format will also prove useful in upcoming posts, where we’ll learn how to view data distribution across nodes for individual VMs and backups, and how to manually balance this data if required.

distribution

For the sake of simplicity, the total physical capacity available to each node in the above table is 2TB. The physical OS space consumed after de-duplication and compression for each VM is listed.  For this example we will omit backup consumption. Therefore we know the following.

  • Total cluster capacity is 8TB
  • Total consumed cluster capacity is 2.1TB ((50 + 300 + 400 + 300) x 2, allowing for HA)
  • Node 3 is currently the most utilized node, consuming 1TB of space
  • Currently the cluster is relatively balanced with no nodes space constrained
  • DRS has chosen not to run any workload on Node 4

Now let’s imagine one of the VMs on Node 3 grows by 200 GB in consumed space, and its CPU and memory consumption also increase.

We now know the following

  • Total cluster capacity is 8TB
  • Total consumed cluster capacity is 2.5TB ((50 + 300 + 400 + 300 + 200) x 2, allowing for HA; the arithmetic is sketched below)
  • Node 3 is currently over utilized, consuming 1.2TB of space.
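
A quick sketch of the capacity arithmetic used in this example, assuming 2 TB nodes and two copies of every data container for HA:

# Capacity arithmetic for the worked example above.
node_capacity_gb = 2000
node_count = 4
vm_consumed_gb = [50, 300, 400, 300]   # physical per-VM consumption after dedup and compression
ha_copies = 2                          # each data container has a primary and a secondary copy

total_capacity_tb = node_capacity_gb * node_count / 1000
consumed_tb = sum(vm_consumed_gb) * ha_copies / 1000
print(total_capacity_tb, consumed_tb)               # 8.0 TB total, 2.1 TB consumed

# After one VM grows by 200 GB:
consumed_after_tb = (sum(vm_consumed_gb) + 200) * ha_copies / 1000
print(consumed_after_tb)                            # 2.5 TB consumed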

An automatic redistribution of resources could move data such that it matches the table below.

  • DRS has chosen to run VM-4 on Node 4 due to constrained CPU and Memory on Node 3, thus promoting VM-4’s standby Data Container to the primary Data Container.
  • The Auto Balancer service has migrated the secondary copy of VM-3 to Node 1 to re-balance cluster resources.
distribution-1

It is worth noting that other scenarios are equally valid; for example, VM-4’s secondary data container could instead have been migrated to Node 1 (after the DRS move), which would have resulted in roughly the same redistribution of data.

In my previous post we talked about capacity alarms being generated at 80% space consumption on an individual node. In the above scenario, no alarms would have been generated on any node within the cluster; Auto Balancer redistributes workload according to its algorithms to try to avoid that exact alarm.


Monitoring Placement Decisions

auto balancer-1

The Auto Balancer service runs on each node. One node in the cluster is chosen as the leader.

The leader may change over time or as nodes are added/removed from the cluster.

The Auto Balancer leader will submit tasks to other nodes to perform Data Container or Backup migrations.

The command “dsv-balance-show --shownodeIP” shows the current leader.

dsv-balance-show
Here the current leader is OVC 10.20.4.186

The log file balancerstatus.txt shows submitted tasks and their status on the leader node, i.e. this text file is only present on the leader node. Issue the command “cat /var/svtfs/0/log/balancerstatus.txt” to view the status of any active migrations.

balancer status

The output of this file shows the migration of a backup from Node 1 to Node 2.

Historic migrations can be viewed by issuing the following commands:

“dsv-active-tasks-show | grep migrate” should show active backup or hive migrations

“dsv-tasks-show | grep migrate” shows active and completed migration tasks

Currently the Auto Balancer does not support the migration of remote backups

Another handy command is ‘dsv-balance-migration-show --showVMName’, which was introduced in version 3.7.7. This is a cluster-wide command, so it can be run from any node in the cluster. It will list the virtual machine being migrated along with the hosts it has been migrated from and to.


Closing Thoughts

The Intelligent Workload Optimizer and Auto Balancer are not magic wands; they balance existing resources but cannot create resources in an overloaded data center. In some scenarios, manual balancing may be required. The next post will explore some of these scenarios, how to analyze resource utilization, and how to manually balance resources within a cluster if required.
