Posted Feb 2021 in Debugging

Ebbs and Flows Of DevOps Debugging PART 2

Written by Sarjeel Yusuf

Product Manager @Atlassian


This piece is the second part of a two-part series on how focusing on cloud debugging practices can accelerate our DevOps intentions. In part 1, we explored how the growing move to the cloud pushed the adoption of DevOps practices. However, over the past decade or so of practicing DevOps alongside the cloud, we have hit a glass ceiling. We have already optimized our incident management capabilities and almost fully automated our CI/CD processes, so further gains have to come from somewhere else.

As a result, it is now time to look at other areas of the development pipeline, mainly debugging. As seen in part 1 of this series, incidents can disrupt the entire development process, from ideation through release to monitoring and maintenance. Reducing that disruption means rethinking traditional debugging practices to better fit cloud development.

Therefore, this second part aims to list some of the crucial practices and strategies that cloud development teams should consider when beginning their journey in the realm of DevOps.

Leverage Observability

As the popularity of the cloud increases, one of the major issues developers face is that they are dealing with black-box environments, where how opaque the box is depends on the paradigm they opt for. One of the advantages of cloud computing is that much of the responsibility for the underlying infrastructure is abstracted away to the cloud vendor. However, as more of this responsibility is abstracted away, so is the ability to know what is actually going on under the hood. As a result, it becomes challenging to identify the root cause of disruptions.

Additionally, with microservices and distributed systems, having to track the flow of control through various decoupled services adds to the pain points of cloud development. This is where the term observability comes into play.

The term's origins can be attributed in part to Cory G. Watson, who was at Twitter at the time and wrote the piece “Observability at Twitter”. In its current form, observability refers to three core ‘pillars’ which, when orchestrated successfully, provide the necessary insights into the running of cloud applications. The three pillars are:

  • Logs — A record of discrete events.
  • Metrics — Statistical numerical data collected and processed within time intervals.
  • Traces — A series of events mapping the path of logic taken.

These three forms of insight show how the actual state of the system compares with its intended state after deployment. This covers all facets of the system, including the intended UI, configurations, architecture, behavior, and resources. It is therefore crucial to refer to these three pillars constantly when developing applications, and to monitor these insights in the development environment before releasing to production.
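To make the three pillars concrete, here is a minimal sketch using only the Python standard library. It is an illustration under stated assumptions, not a real observability stack: the in-memory `logs`, `metrics`, and `spans` stores, the `span` helper, and the `handle_order` workflow are all hypothetical names invented for this example. The point is that every log entry and every span carries the same trace ID, so one request can be followed across steps.

```python
import time
import uuid
from collections import defaultdict
from contextlib import contextmanager

# Hypothetical in-memory stores; a real system would ship these to a backend.
logs = []                     # Logs: records of discrete events
metrics = defaultdict(list)   # Metrics: numeric samples grouped by name
spans = []                    # Traces: timed steps that share a trace ID


def log(trace_id, message):
    """Record one discrete event, tagged with the request's trace ID."""
    logs.append({"trace_id": trace_id, "message": message})


@contextmanager
def span(trace_id, name):
    """Time one step of a request and record it as a span and a metric sample."""
    start = time.perf_counter()
    try:
        yield
    finally:
        duration = time.perf_counter() - start
        spans.append({"trace_id": trace_id, "name": name, "duration_s": duration})
        metrics[f"{name}.duration_s"].append(duration)


def handle_order(order_id):
    """Hypothetical request handler instrumented with all three pillars."""
    trace_id = uuid.uuid4().hex
    with span(trace_id, "handle_order"):
        log(trace_id, f"order {order_id} received")
        with span(trace_id, "charge_payment"):
            log(trace_id, "payment charged")
    return trace_id


trace_id = handle_order(42)
```

Spans are recorded when they finish, so the inner `charge_payment` span is stored before the outer `handle_order` span. Filtering `logs` and `spans` by `trace_id` reconstructs the path a single request took.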

Fall Back on Traditional TDD

If the objective of the left-shift is to capture all possible disruptions during development, then traditional Test Driven Development (TDD) can work wonders. However, developing for the cloud differs from developing server applications where all dependent resources are available.

The contrast becomes clearer when thinking of microservices and distributed systems, where local development environments have no access to the services or resources they depend on. As a result, writing tests for cases that depend on these unavailable resources becomes difficult. This is especially true of the architectures that developers gravitate towards when developing for the cloud, which can include combinations of hybrid monoliths on the cloud, event-driven serverless configurations, active/active multi-region set-ups, and many more.

We must therefore consider various techniques that mitigate this pain point. Some of these techniques are listed below:

  • Leveraging local server plugins - Some libraries create embedded servers of your tools inside the testing environment. For example, Maven boasts many such plugins, like the one available for Redis under the Ozimov group.
  • Relying on inbuilt libraries - Some tools in your cloud stack offer their own development libraries. Hadoop, for example, has the MRUnit library, which can be leveraged when testing Hadoop MapReduce jobs. This extends the idea of using libraries, but here the library comes from the tool itself, which mitigates fears about how far the test results can be trusted.
  • Mocking resources - This involves replicating layers of services that would otherwise be available in production. This can be done in JUnit tests, typically with a mocking library such as Mockito, where the expected results from lower-level interfaces are defined. A word of caution: when we define the responses of our mocked interfaces, we can unintentionally encode our own expectation that the tests will pass.
  • Setting up local resources - This is a higher-maintenance and more cost-intensive approach than the others listed above. It involves having an entire local environment with replicas of the production environment's resources. The method is effective but hard to maintain, especially given the potential drift between the local and production environments. There are ways to ease the pain, mainly Infrastructure as Code (IaC), but at the end of the day the approach fails to scale and leaves the entire process susceptible to incidents, which is exactly what DevOps is meant to prevent.
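The mocking technique in the list above can be sketched in a few lines. The article mentions JUnit; this example swaps in Python's standard `unittest.mock` to keep it self-contained. The `cached_greeting` function and the cache client's `get`/`set` interface are hypothetical and stand in for any remote resource, such as a Redis instance, that is unavailable in a local environment.

```python
from unittest.mock import Mock


def cached_greeting(cache, user_id):
    """Return a greeting, reading the cache first and filling it on a miss.

    `cache` is any object with get(key) and set(key, value); in production it
    would be a client for a remote store (hypothetical interface).
    """
    cached = cache.get(user_id)
    if cached is not None:
        return cached
    greeting = f"Hello, user {user_id}"
    cache.set(user_id, greeting)
    return greeting


def test_cache_miss_fills_cache():
    cache = Mock()
    cache.get.return_value = None           # simulate a cache miss
    assert cached_greeting(cache, 7) == "Hello, user 7"
    cache.set.assert_called_once_with(7, "Hello, user 7")


def test_cache_hit_skips_write():
    cache = Mock()
    cache.get.return_value = "Hello from cache"   # simulate a cache hit
    assert cached_greeting(cache, 7) == "Hello from cache"
    cache.set.assert_not_called()
```

Note that both tests pass only because the mock returns exactly what the test author told it to. If the real cache client behaves differently, for example by returning bytes rather than strings, these tests stay green while production breaks. That is the risk the caution in the list describes.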

Rely on Third-Party Cloud Support Tools

As cloud development increases in popularity, so does the ecosystem around supporting cloud developers. Cloud vendors provide services aimed at accelerating cloud development, but they inadvertently leave gaps in the development experience, an experience that determines how well a team or organization developing on the cloud can adopt DevOps practices.

As expected, the market is growing weary of the burden of debugging cloud applications, which is hampering development teams' velocity. Tools such as Thundra are responding to these demands by providing cloud debugging and observability solutions in a bid to accelerate the cloud developer’s experience.

For example, Thundra recently released its debugging capability, aptly named Thundra Sidekick. The new capability leverages the concept of observability mentioned earlier, using non-intrusive debugging strategies such as non-breaking breakpoints and IDE integration support. Overall, the feature allows you to test your cloud apps in both pre-production and production environments by surfacing the required insights without posing a risk to your actual code base. It’s an effective tool because it promotes the much-needed left-shift culture.
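To illustrate the general idea of a non-breaking breakpoint, here is a small local sketch. It is not how Thundra Sidekick is implemented and does not use any Thundra API. It relies on Python's standard `sys.settrace` hook, and the `tracepoint` helper and `apply_discount` function are hypothetical names for this example. Instead of pausing execution at a line, it copies the local variables at that line and lets the program continue.

```python
import sys
from contextlib import contextmanager


@contextmanager
def tracepoint(func, line_offset):
    """Snapshot func's locals each time execution reaches the given line.

    line_offset counts lines from the `def` line (offset 0). The function is
    never paused; values are copied just before the target line executes.
    """
    snapshots = []
    target_code = func.__code__
    target_line = target_code.co_firstlineno + line_offset

    def tracer(frame, event, arg):
        if frame.f_code is not target_code:
            return None                       # do not trace unrelated functions
        if event == "line" and frame.f_lineno == target_line:
            snapshots.append(dict(frame.f_locals))
        return tracer                         # keep receiving line events

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        yield snapshots
    finally:
        sys.settrace(previous)                # restore whatever was active before


def apply_discount(price, percent):
    discount = price * percent / 100
    total = price - discount
    return total


# Capture state at the `return total` line (3 lines after `def`).
with tracepoint(apply_discount, line_offset=3) as snapshots:
    result = apply_discount(200, 10)
```

After the `with` block, `snapshots` holds one dictionary with the values of `price`, `percent`, `discount`, and `total` as they were at the `return` line, and `result` is the normal return value. Tracing every line adds overhead, so this is only meant to show the concept on a local machine.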

Thundra is simply one example of a growing DevOps market ecosystem. According to a report by Global Market Insights, “DevOps Market size exceeded USD 4 billion in 2019 and is poised to grow at over 20% CAGR between 2020 and 2026.”

This growth illustrates that there is currently a market gap in how developers carry out their DevOps practices, and that the market is responding to it.


Much of the era of DevOps has been focused on automating CI/CD practices to achieve greater velocity while improving incident management capabilities to increase stability. However, limitations are now becoming apparent, and we must look at other areas of our software development pipeline. We therefore need to rethink our debugging strategy to finally break through the glass ceiling and reap the benefits promised by DevOps in the cloud.