Planning for Failure with DRS Groups

Virtual Machine DRS Groups

Following our story as an example, let’s say that we have a typical three-tiered application. There’s a web tier, app tier, and database tier. For simplicity’s sake, each tier has two VMs: an A node and a B node.

Planning for Failure with DRS Groups - Dee Abson - Image03

We’re all used to using VM affinity rules to keep VMs together or apart from each other. In an example like ours, it’s tempting to use VM-VM anti-affinity rules to keep the A nodes away from the B nodes. As our story illustrated, however, this only keeps the VMs on different hosts. It doesn’t guarantee that they will be distributed along our physical row boundaries.
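To make that gap concrete, here is a minimal Python sketch. The host names, row layout, and VM names are all hypothetical and only there for illustration. It shows a placement that satisfies a VM-VM anti-affinity rule (the two nodes are on different hosts) while both nodes still land in the same physical row.

```python
# Hypothetical layout: two hosts per physical row.
HOST_ROW = {"esx01": "row1", "esx02": "row1", "esx03": "row2", "esx04": "row2"}


def satisfies_anti_affinity(placement, vm_a, vm_b):
    """True when the two VMs run on different hosts."""
    return placement[vm_a] != placement[vm_b]


def shares_row(placement, vm_a, vm_b):
    """True when the hosts of both VMs sit in the same physical row."""
    return HOST_ROW[placement[vm_a]] == HOST_ROW[placement[vm_b]]


# A placement that anti-affinity alone would happily allow.
placement = {"web-a": "esx01", "web-b": "esx02"}

print(satisfies_anti_affinity(placement, "web-a", "web-b"))  # True
print(shares_row(placement, "web-a", "web-b"))               # True
```

Both checks come back true: the rule is honoured, yet a single row failure would still take out both web nodes.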

Conceptually, not only are we interested in keeping the A nodes away from the B nodes to allow the application to survive a host failure, but we now need to make sure the application can survive a row failure. Let’s group our VMs along the A and B nodes into two virtual machine DRS groups.

Planning for Failure with DRS Groups - Dee Abson - Image04

Now we have host DRS groups that logically group hosts along our physical row boundaries, and we have virtual machine DRS groups that logically group VMs along our node boundaries. How do we tie them together?

Like Peanut Butter and Chocolate

Turns out that our two different sets of group definitions are very complementary. Let’s choose to associate our VM DRS Group A with Host DRS Group 1, and VM DRS Group B with Host DRS Group 2. Logically it looks something like this.

Planning for Failure with DRS Groups - Dee Abson - Image05

In our DRS settings, we simply create rules that match our choices:

  • VM DRS Group A must run on hosts in Host DRS Group 1.
  • VM DRS Group B must run on hosts in Host DRS Group 2.

DRS will now place the VMs according to our rules.
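As a rough illustration of what those two rules mean, the following Python sketch models the groups and rules as plain data and checks a placement against them. This is not how DRS works internally and it doesn't talk to vCenter. The host and VM names are hypothetical, and only the group names come from the rules above.

```python
# Hypothetical group membership; host and VM names are illustrative only.
HOST_GROUPS = {
    "Host DRS Group 1": {"esx01", "esx02"},
    "Host DRS Group 2": {"esx03", "esx04"},
}
VM_GROUPS = {
    "VM DRS Group A": {"web-a", "app-a", "db-a"},
    "VM DRS Group B": {"web-b", "app-b", "db-b"},
}
# Each rule: VMs in the VM group must run on hosts in the host group.
RULES = [
    ("VM DRS Group A", "Host DRS Group 1"),
    ("VM DRS Group B", "Host DRS Group 2"),
]


def violations(placement):
    """Return the sorted (vm, host) pairs that break a must-run-on rule."""
    bad = []
    for vm_group, host_group in RULES:
        allowed = HOST_GROUPS[host_group]
        for vm in VM_GROUPS[vm_group]:
            if placement[vm] not in allowed:
                bad.append((vm, placement[vm]))
    return sorted(bad)


compliant = {
    "web-a": "esx01", "app-a": "esx02", "db-a": "esx01",
    "web-b": "esx03", "app-b": "esx04", "db-b": "esx03",
}
# Same placement, but one B node has drifted onto a Group 1 host.
drifted = {**compliant, "db-b": "esx02"}

print(violations(compliant))  # []
print(violations(drifted))    # [('db-b', 'esx02')]
```

A check like this can be handy for reasoning about a design on paper before you create the real groups and rules in the cluster.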

What will happen now if we have a row failure? We’ll only lose one node out of each application tier, meaning the application stands a good chance of staying up. How about our original concern if we have a host failure? Because the rules we’ve created keep the nodes on different groups of hosts, they also make sure that the nodes don’t run on the same host at the same time.
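The two failure scenarios can be walked through with a small Python sketch. It assumes a hypothetical four-host, two-row layout and a placement that already follows the rules, and it simply counts how many nodes of each tier are still running after a set of hosts fails. It deliberately ignores any restarts that happen afterwards.

```python
# Hypothetical layout and a placement that follows the two DRS rules.
HOST_ROW = {"esx01": "row1", "esx02": "row1", "esx03": "row2", "esx04": "row2"}
PLACEMENT = {
    "web-a": "esx01", "app-a": "esx02", "db-a": "esx01",
    "web-b": "esx03", "app-b": "esx04", "db-b": "esx03",
}


def tier(vm):
    """Tier name is the part of the VM name before the node suffix."""
    return vm.split("-")[0]


def surviving_nodes(placement, failed_hosts):
    """Count the VMs per tier whose host is not in failed_hosts."""
    counts = {tier(vm): 0 for vm in placement}
    for vm, host in placement.items():
        if host not in failed_hosts:
            counts[tier(vm)] += 1
    return counts


row1_hosts = {host for host, row in HOST_ROW.items() if row == "row1"}

# Row failure: every tier keeps its B node.
print(surviving_nodes(PLACEMENT, row1_hosts))  # {'web': 1, 'app': 1, 'db': 1}
# Single host failure: no tier drops to zero.
print(surviving_nodes(PLACEMENT, {"esx01"}))   # {'web': 1, 'app': 2, 'db': 1}
```

In both cases every tier keeps at least one running node, which is the property we were after.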

Late night alerts are now a lot less likely to cause you grief the next morning.

Featured image photo by Librarian Avenger

Dee Abson

Dee Abson is a technical architect from Alberta, Canada. He's been working in the field of technology for over 20 years and specializes in server and virtualization infrastructure. Working with VMware products since ESX 2, he holds several VMware certifications. He was awarded VMware vExpert for 2014-2019. You can find him on Twitter.
