In every system, there is at least one limiting factor, as outlined in the methodology known as “the Theory of Constraints.” Software development pipelines are no exception. When we look holistically at the various domains of development, it is clear that every step is interlinked, which means the constraints of each department are also shared.
Given the rise of DevOps, the importance of CI/CD, where we transition from Dev to Ops, has to be acknowledged. Constraints at this stage are disruptive to the entire development process, capable of causing issues almost anywhere. Despite often being overlooked, change management has proven to be an effective step in the development process, and it is one of the areas that can be impacted the most.
This piece will show how flaky tests, which already cause other constraints in CI/CD, also serve to undermine change management. We’ll also show you how to uncork this bottleneck and improve product development overall.
Learning to Embrace Change Management
First, some quick definitions. Incident management is the process by which teams mitigate the impact of disruptions in a production environment. By comparison, change management is about creating, scheduling, verifying, and pushing system changes to production. Here’s how it typically works: A change request is made, then reviewed, and eventually scheduled once the risks of the proposed change have been fully assessed. At that point, it is deployed and then monitored for any adverse effects.
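To make that lifecycle concrete, here is a minimal sketch of it as a small state machine. The class and stage names (ChangeRequest, Stage, risk_assessed) are hypothetical and chosen for illustration only; real change management tools model this in their own ways.

```python
from enum import Enum


class Stage(Enum):
    REQUESTED = "requested"
    REVIEWED = "reviewed"
    SCHEDULED = "scheduled"
    DEPLOYED = "deployed"
    MONITORED = "monitored"


# Each stage may only advance to the one that follows it.
NEXT_STAGE = {
    Stage.REQUESTED: Stage.REVIEWED,
    Stage.REVIEWED: Stage.SCHEDULED,
    Stage.SCHEDULED: Stage.DEPLOYED,
    Stage.DEPLOYED: Stage.MONITORED,
}


class ChangeRequest:
    """Hypothetical change request that moves through the stages in order."""

    def __init__(self, title):
        self.title = title
        self.stage = Stage.REQUESTED
        self.risk_assessed = False

    def advance(self):
        if self.stage not in NEXT_STAGE:
            raise ValueError("change is already in its final stage")
        target = NEXT_STAGE[self.stage]
        # Scheduling is blocked until the risks have been assessed.
        if target is Stage.SCHEDULED and not self.risk_assessed:
            raise ValueError("risks must be assessed before scheduling")
        self.stage = target
        return self.stage
```

The one guard worth noticing is the risk check: a request can be reviewed at any time, but it cannot be scheduled until someone has marked the risks as assessed.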
Some organizations overlook change management because it is seen as a barrier to velocity during deployment. It can also be cut back to its most basic elements in an attempt to keep everything moving fast, without accounting for how that may impact other parts of the process. How big an impact? Just look at the configuration change by Akamai Technologies that caused outages at HSBC and British Airways.
It makes sense that companies are hesitant to implement strong change management practices. Doing so can entail a manual process that requires time and input from multiple stakeholders. For some teams, it’s enough to rely on pull request (PR) approvals and then leave incident management to deal with whatever disruptions might occur.
This approach, however, is counterproductive, especially considering the new services on the market that can solve change management pain points.
CI/CD Constraints and Change Management
There are solutions available that try to improve the change management process, including Atlassian’s Jira Service Management (JSM) and ServiceNow, but a look at the state of the industry shows that some businesses would rather just keep approving PRs. Keep in mind that CI/CD is an important element of change requests.
Ideally, developers can simply create a PR that is approved and merged into the master codebase. If we can stick to this straightforward format, which conditions are essential for a PR to be approved and executed?
Approvers tend to look at code sanity, with special attention paid to whether the change touches direct services or dependent services. Of course, there are other aspects to consider, including programming formats, potential stylistic errors, business impact, rollback effectiveness, and more.
The most important checks for code compliance and disruptions, however, can be built into the CI phase, where we can automate most of the core functions of change management by running linting checks, rigorous unit tests, integration tests, and end-to-end tests on every change. This is similar to GitOps, where the repository is the source of truth and changes reach production through automated, reviewable commits.
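As a rough illustration of folding approval criteria into CI, the sketch below treats a change as approvable only when every required check has reported a pass. The check names in REQUIRED_CHECKS and the gate function are assumptions made for this example, not the interface of any particular CI system.

```python
from dataclasses import dataclass

# Assumed set of checks a change must pass; adjust to your own pipeline.
REQUIRED_CHECKS = ("lint", "unit", "integration", "e2e")


@dataclass
class CheckResult:
    name: str
    passed: bool


def gate(results):
    """Return (approved, problems) for a list of CheckResult objects."""
    by_name = {r.name: r.passed for r in results}
    problems = []
    for name in REQUIRED_CHECKS:
        if name not in by_name:
            # A check that never ran is treated the same as a failure.
            problems.append(f"{name}: missing")
        elif not by_name[name]:
            problems.append(f"{name}: failed")
    return (not problems, problems)
```

Treating a missing check as a blocker is a deliberate choice here: a gate that silently passes when a test stage is skipped gives approvers false confidence.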
There are cultural barriers to consider as well, including how tests are written and what level of confidence teams have in CI tests and builds. Lower confidence in what the CI phase produces might require more manual change management, for example.
Often, it comes down to confidence in how tests are written and run. When it’s unclear why a test succeeds or fails, confidence drops. This is the problem with flaky tests: actual failures tend to get dismissed as simply flaky, and confidence in CI results erodes. Teams then fall back on the change approvers, which makes the entire change management process tedious and time-consuming. That is exactly what we were trying to avoid.
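One simple way to separate flaky tests from genuine failures is to look for tests that have both passed and failed against the same commit, since the code did not change between those runs. The sketch below assumes run history is available as (test_name, commit_sha, passed) tuples; that record shape and the function name are hypothetical, and this is only one heuristic among several.

```python
from collections import defaultdict


def find_flaky_tests(runs):
    """Return sorted names of tests with mixed outcomes on a single commit.

    runs: iterable of (test_name, commit_sha, passed) tuples.
    """
    outcomes = defaultdict(set)
    for test_name, commit_sha, passed in runs:
        outcomes[(test_name, commit_sha)].add(bool(passed))
    # Both True and False seen for the same test on the same commit.
    flaky = {name for (name, _sha), seen in outcomes.items() if len(seen) == 2}
    return sorted(flaky)
```

A test that fails on every run of a commit is not reported, because a consistent failure points to a real defect rather than flakiness.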
That is where CI observability comes in.
Change Management Needs CI Observability
As we discussed above, confidence in test performance is critical. CI observability offers insights into how and why tests give the outcomes they do, and by improving CI using time-based and quality metrics, traces, and logs, we can:
- Create resilience in CI/CD and the DevOps pipeline
- Gain insights into the causes of failed and flaky tests
- Lower the risk of disruptions and incidents in production environments
- Build trust in the CI/CD stage using metrics that increase understanding and visibility
With CI observability practices, teams can boost confidence in the reliability of the CI/CD stage. This allows organizations to implement the core benefits of change management into their CI/CD processes and see improvements across the overall software development lifecycle.
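As a small example of the time-based and quality metrics mentioned above, the following sketch summarizes a set of pipeline runs. It assumes each run is a dict with duration_s and passed keys; those field names and the summarize function are illustrative, not taken from any specific tool.

```python
from statistics import median


def summarize(runs):
    """Summarize pipeline runs given as dicts with 'duration_s' and 'passed'."""
    if not runs:
        raise ValueError("at least one run is required")
    durations = [r["duration_s"] for r in runs]
    failures = sum(1 for r in runs if not r["passed"])
    return {
        "runs": len(runs),
        "failure_rate": failures / len(runs),
        "median_duration_s": median(durations),
        "max_duration_s": max(durations),
    }
```

Tracking these numbers over time shows whether a pipeline is getting slower or less reliable, which is the kind of visibility that helps teams trust what CI reports.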
Traditional change management is often dismissed as slow, abstract, and limited in its usefulness. However, it should not be overlooked. If we can automate the most important elements of change management into CI/CD, the benefits will be felt across the organization.
Furthermore, CI observability makes it possible to eliminate the issue of flaky tests, an essential part of achieving effective change management. Tools like Foresight make it easy to observe CI pipelines.