Customers Matter when Upgrading OpenStack

OpenStack is one of the most effective ways to provide private cloud services to your users, but it also requires that you keep your deployment updated. Because OpenStack releases on a six-month cadence, operators need to plan for upgrades twice a year. Falling behind means missing bug fixes and new features, and it makes each future upgrade more difficult. Therefore, when planning your OpenStack deployment, treat upgrades as a required process and keep the following customer-friendly considerations in mind.

First and foremost, upgrades should not be disruptive to customers. Before planning the upgrade process, consider the tolerance for downtime as reflected in two key Service Level Agreements (SLAs): API uptime/responsiveness and instance uptime/responsiveness. Some applications are more sensitive to API downtime, while most are very sensitive to instance downtime, as few applications behave well when their instances are unreachable. The SLAs and your users' tolerance for downtime will shape your upgrade process, how often you upgrade, and when you do it.
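To make the SLA discussion concrete, here is a rough sketch of how an availability target translates into a downtime budget per upgrade. The function name, the 99.9% figure used in the note below, and the assumption that the entire yearly budget is spent on planned upgrades are illustrative only; a real budget also has to cover unplanned outages.

```python
HOURS_PER_YEAR = 365 * 24  # ignores leap years; fine for a rough estimate


def downtime_budget_hours(sla_percent: float, upgrades_per_year: int = 2) -> float:
    """Return the downtime (in hours) each upgrade may consume.

    Assumption: the whole yearly downtime allowance is split evenly
    across planned upgrades, which is optimistic in practice.
    """
    if not 0 <= sla_percent <= 100:
        raise ValueError("sla_percent must be between 0 and 100")
    if upgrades_per_year <= 0:
        raise ValueError("upgrades_per_year must be positive")
    allowed_per_year = HOURS_PER_YEAR * (1 - sla_percent / 100)
    return allowed_per_year / upgrades_per_year
```

With a 99.9% uptime target and two upgrades a year, this works out to roughly 4.38 hours per upgrade. A stricter target shrinks that window quickly, which is one reason the SLA drives how and when you upgrade.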

One way to reduce the disruption to customers is to automate the upgrade process as much as possible. Although automation takes more engineering resources up front, it pays dividends every time you use it. Automation can make upgrades quicker and reduce the error rate, especially during late-night upgrade windows when everyone is tired. In addition to automation, create a playbook of the process. This document describes the steps taken during each phase of the automation and, more critically, how to recover from failure at different points. Failure recovery is not something to figure out in the middle of an OpenStack upgrade.
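As a minimal sketch of the idea above, the following pairs every automated phase with its own recovery step, so the recovery path is written down before the upgrade starts. The `Phase` and `run_upgrade` names are hypothetical and not part of OpenStack or any deployment tool; real automation would call your own tooling inside `apply` and `recover`.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Phase:
    """One step of the upgrade, with its documented recovery action."""
    name: str
    apply: Callable[[], None]
    recover: Callable[[], None]


def run_upgrade(phases: List[Phase]) -> List[str]:
    """Run phases in order and return a log of what happened.

    If a phase raises, recover that phase first, then every phase
    that already completed, in reverse order, and stop.
    """
    log: List[str] = []
    completed: List[Phase] = []
    for phase in phases:
        try:
            phase.apply()
        except Exception as exc:
            log.append(f"failed: {phase.name} ({exc})")
            for done in [phase] + completed[::-1]:
                done.recover()
                log.append(f"recovered: {done.name}")
            return log
        completed.append(phase)
        log.append(f"done: {phase.name}")
    return log
```

Keeping the recovery action next to each step means the playbook and the automation cannot drift apart unnoticed, and the returned log gives a tired operator a clear record of where the upgrade stopped.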

Developing the automation, playbooks, and failure recovery process requires practice and testing. Test environments in which to rehearse upgrades and practice recovering from errors are a necessity. OpenStack can change a lot between releases, so the automation and playbooks will require maintenance and improvement. Rebuildable test environments provide a test bed for these necessary changes. They also allow you to test data migrations. Since OpenStack data migrations are the cause of most problems during upgrades, ensuring that production data migrations will work is key.

Finally, when planning for OpenStack upgrades, don’t forget the supporting components. Software that OpenStack relies on, such as databases and message queue systems, will also need to be upgraded. Since these upgrades may be disruptive as well, plan them outside your OpenStack upgrade when possible. As with OpenStack itself, automating and testing these upgrades will reduce risk to customers.

At IBB, our Cloud and Software Transformation group has real-world experience upgrading OpenStack clouds at a number of companies, automating the process and reducing risk to your customers. Let us know how we can help you.


