Posted Feb 2021 in Debugging

Ebbs and Flows Of DevOps Debugging PART 1

Written by Sarjeel Yusuf


Product Manager @Atlassian

Ever since Patrick Debois coined the word DevOps back in 2009, teams and organizations have been clamoring to adopt the relevant practices, tools, and culture in a bid to increase velocity while maintaining stability. However, this race to incorporate “DevOps” into software development has resulted in a perversion of the concept. This does not mean that no teams have adopted DevOps practices successfully, but the word overall has become a buzzword. As per the DORA 2019 State of DevOps report, team managers are more likely than the actual frontline engineers and developers to proclaim that their teams are practicing DevOps.

Therefore, this piece aims to realign the meaning of DevOps and to highlight the need to consider debugging a core element of the practices and culture that enable DevOps for teams. The argument for debugging as a core component of the DevOps pipeline follows from the evident need for a left-shift in the way we build and release software, empowering developers to adhere to the intrinsic principle of “you build it, you run it”.

DevOps Minus the Bullshit

Definitions of this ever so ubiquitous term are scattered across the internet, each colored by the perspective of whoever provides it. However, I believe that the best definition comes from Jez Humble, who stresses that DevOps is not a set of tools or a certification of some sort but rather a set of practices and a cultural ideology of how software is ideated, built, deployed, and maintained.

As much as the definition above is uncorrupted, it leaves abundant room for interpretation. This is intentional: no organization, or even a single team, is identical to another, and hence the definition is wide enough to cover all forms of practice. Nevertheless, whatever practices and cultural changes a team adopts, the north star remains constant: to increase velocity while maintaining stability.

In achieving this overall objective, two principles have come to the fore. The first is the principle of “you build it, you run it”. The second is to automate everything. The first has led to what has popularly been termed the breaking down of silos, whereas the second has led to innovative solutions that aid in moving through the DevOps pipeline.

The image above depicts the general DevOps pipeline. The stages in the pipeline usually lie within segregated silos where different teams with different roles take responsibility for these silos.

With the advent of DevOps, we now see these silos being merged, with the benefits being well documented and talked about. Many of these benefits have been achieved with automation solutions and practices that specifically tackle CI/CD and incident management processes. With better-incorporated incident management, monitoring, and deployment tests coupled with change management, teams have witnessed an undeniable improvement in the availability of their services while seeing an increase in their throughput: going faster while maintaining stability.

However, as teams continue to improve their incident management and deployment capabilities and practices, there eventually comes a point when they begin experiencing diminishing returns on their efforts. This is because incidents are bound to happen, a hard fact that the software industry has learned over years of pushing out revolutionary products and the code behind them. However, in the spirit of left-shift, there is a segment of the DevOps pipeline that we are yet to tap, and that is debugging.

Incident Management Only Goes So Far

With the rise of DevOps to maintain stability while increasing velocity, the need to envelop incident management into the DevOps pipeline became critical. Werner Vogels, Amazon VP and CTO, famously once said, “Everything fails all the time”. Accepting this reality, it is easy to see why incident management becomes an integral component of a team’s DevOps practice. However, experiencing incidents has negative side effects on the DevOps flow, regardless of how well the incident management is orchestrated.

Maintaining high availability is a prime goal, achieved by reducing the time customers encounter disruptions in the experience provided. However, here is where an intrinsic problem lies. By considering availability only from the customer’s perspective, it is easy to miss the ripple effect of incidents on the entire flow of going from ideation to production, which is the velocity aspect of DevOps.

When an incident occurs, the primary goal of responders is to resolve the incident in the quickest time possible. The general consensus is that once the disruption has been resolved and the customer begins to see normal functioning of the system once again, availability is maintained.

Nevertheless, in the hurry to restore availability, shortcuts may be taken to do whatever is possible to resume normal functioning. The actual repairs can therefore continue well after the system’s normal functionality is restored. That means responders, who as per DevOps principles are members of development teams, are caught up in repairing the disruptions. As a result, tickets are created, the backlog grows, deployment queues may be held back, feature flags may be toggled, and important releases may be held up.

If development teams are spending all their time putting out fires, when will they have time to implement new features? Consequently, entire roadmaps may be delayed as resources are held up in repairing disrupted services and system components.
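The gap between availability as customers see it and the cost to the team can be made concrete with some simple arithmetic. The sketch below is only an illustration under stated assumptions: the `Incident` fields, the three incidents, and the team size are all hypothetical numbers, not data from any report or real team.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    # Hypothetical fields, chosen only for this illustration.
    minutes_to_restore: float        # customer-facing disruption
    engineer_hours_on_repair: float  # follow-up repair work after restoration


def availability(incidents, period_minutes):
    """Fraction of the period during which customers saw normal service."""
    downtime = sum(i.minutes_to_restore for i in incidents)
    return 1 - downtime / period_minutes


def capacity_lost(incidents, team_hours):
    """Fraction of the team's working hours consumed by post-incident repairs."""
    repair = sum(i.engineer_hours_on_repair for i in incidents)
    return repair / team_hours


# Assumed scenario: a 30-day month, 5 engineers with 160 working hours each.
incidents = [Incident(20, 40), Incident(15, 60), Incident(8, 20)]
period_minutes = 30 * 24 * 60
team_hours = 5 * 160

print(f"availability:  {availability(incidents, period_minutes):.2%}")
print(f"capacity lost: {capacity_lost(incidents, team_hours):.1%}")
```

With these made-up numbers, availability stays at about 99.9% while roughly 15% of the team's capacity for the month goes to repairs, which is the cost that a customer-facing availability figure does not show.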

It can be seen that the more incidents we incur after releasing to production, the higher the risk to the velocity of the team. The entire development cycle is susceptible. This is more true for monolithic systems, where interdependency between services and components is high, than for microservices.

As Mr. Vogels stated, incidents will happen, but he did not state how many. This is where some hope of salvaging the development pipeline remains. The answer lies in the debugging strategies of the team, because the more bugs, errors, and faults that can be captured in the development stage, the fewer incidents we can expect afterward. The need for a “left-shift” therefore becomes apparent.

Conclusion

As this ‘left-shift’ emerges in the form of improved debugging strategies, it also becomes paramount to examine what is meant by debugging. This is especially pertinent when developing cloud applications, because developing for the cloud differs from developing server applications.

This contrast becomes clearer when thinking of microservices and distributed systems, where local development environments have no access to the services or resources they depend on. Additionally, it must be acknowledged that cloud application development is increasingly becoming the norm in software.

Therefore, with the goal of improving the flow from ideation to release through the DevOps pipeline, it is imperative to seek out debugging strategies suited to cloud development. These strategies and practices will be discussed in part 2 of this piece.