As DevOps takes hold of the software industry, an inevitable shift left is taking place under the philosophy of “you build it, you run it”. We now see this philosophy emerging in the domains that sit at the right end of the software development spectrum, such as monitoring and alerting.
The push towards DevOps is breaking down separate silos, and development teams are becoming more proactively involved in every area of the pipeline. As a result, practices and tools are emerging that enable these teams to take control of the operations of their application systems within the environments that they are familiar with.
One such concept is Dashboards as Code (DaC), which extends the concept of Infrastructure as Code (IaC) and transforms the way monitoring and observability are instrumented. Building monitoring dashboards, which surface the state of the system and infrastructure at a comprehensible level, has traditionally been done through GUIs provided by the monitoring service provider.
However, the push towards automation, fueled by the rise of DevOps, is transforming the way monitoring and alerting tools are incorporated into the system. Instead of using a user interface with all the frills of drag-and-drop and single-click integrations, we are seeing an increase in code-defined configurations. “Why, is that not more complex?” you may ask. Well, apparently not.
It is this question that this piece aims to explore: why the currently popular approach to setting up monitoring and alerting dashboards no longer meets our changing needs, what the new way of doing things in the realm of DevOps looks like, and what that means for monitoring and alerting.
The Basics of Observability
Observability essentially enables development teams to understand the state of their systems. This concept is more relevant now than ever, mainly because of the cloud movement, in which major sections of the underlying architecture are delegated to cloud providers so that development teams can focus primarily on the business logic.
Abstracting the underlying architecture definitely has its benefits, but it also has a major drawback: a lack of observability. As you give up control of your compute services, it becomes harder to understand the state of your system. The more you give up, the greater the black-box effect.
FaaS (Function as a Service) offerings are the most susceptible to this effect since they involve the greatest level of abstraction.
Observability addresses the black-box problem by providing insights in three forms, as defined below:
- Logs - A record of discrete events.
- Metrics - Statistical numerical data collected and processed within time intervals.
- Traces - A series of events mapping the path of logic taken.
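To make the three forms concrete, here is a minimal sketch of how they might be represented as plain data. The class and field names (`LogRecord`, `MetricPoint`, `Span`, `trace_id`, and so on) are assumptions made for illustration and do not correspond to any particular vendor's schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LogRecord:
    """A discrete event that happened at one point in time."""
    timestamp: float
    level: str
    message: str


@dataclass
class MetricPoint:
    """A numeric value aggregated over a time interval."""
    name: str
    interval_start: float
    interval_seconds: int
    value: float


@dataclass
class Span:
    """One step of a request; spans sharing a trace_id form a trace."""
    trace_id: str
    name: str
    start: float
    end: float


def trace_path(spans: List[Span], trace_id: str) -> List[str]:
    """Return the names of one trace's spans in the order they started."""
    own = [s for s in spans if s.trace_id == trace_id]
    return [s.name for s in sorted(own, key=lambda s: s.start)]


def average_duration(
    spans: List[Span], interval_start: float, interval_seconds: int
) -> Optional[MetricPoint]:
    """Derive a metric from traces: mean span duration within one interval."""
    end = interval_start + interval_seconds
    in_window = [s for s in spans if interval_start <= s.start < end]
    if not in_window:
        return None
    mean = sum(s.end - s.start for s in in_window) / len(in_window)
    return MetricPoint("span.duration.avg", interval_start, interval_seconds, mean)
```

The two helper functions hint at how the forms relate: a trace is an ordered path of events, and a metric is a numeric summary computed over an interval.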
These three forms of insight are visualized and acted upon through the monitoring and alerting tools incorporated into the system. Today, these tools are rarely built in-house; they are usually third-party SaaS products.
These tools provide a massive range of integration opportunities. For example, the popular alerting and incident management tool Opsgenie has over 200 integrations, which allows for a wide range of use cases that satisfy the needs of consumers when it comes to alert consolidation, routing, and incident management.
Similarly, these monitoring and alerting tools have focused on ease of integration and on the setup of monitoring dashboards to visualize insights. Thundra itself has one of the easiest configurations, allowing you to instrument monitoring capabilities for your cloud infrastructure.
Third-party SaaS tools therefore appear to offer the flexibility, power, and ease of use needed to rely on them entirely. This raises the question of what is lacking, and why we are now seeing a shift towards an IaC approach.
IaC (Infrastructure as Code) is the practice of provisioning resources via programmatically defined scripts, as opposed to using management consoles to create environments of resources manually.
These scripts are machine-readable and allow for automatic deployment of the resources, along with the required connections and interaction configurations between them. IaC tools are therefore expected to handle networks, virtual machines, load balancers, and connection topology. Moreover, every time an IaC script is applied, it results in the same environment as described in the script.
The benefits here become apparent. IaC is a common practice in DevOps because the goal of DevOps is to automate the software production process, and IaC lets us automate the building of our infrastructure. This is even more crucial in cloud environments. Even though cloud environments abstract a lot of the underlying architecture from developers, they require tedious configuration of individual cloud resources under the constraints of the resource vendor. IaC services such as AWS CloudFormation have therefore always provided some respite from the need for repetitive configuration.
Further benefits include idempotency and templating. Since IaC allows us to model our infrastructure in a script-based format, we can define the desired state of our cloud infrastructure. If our infrastructure then deviates too far from the desired state, we can automate its recovery by reapplying the IaC template.
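The desired-state idea can be sketched in a few lines. The following is a toy model, not how any real IaC tool is implemented: resources are plain dictionaries, and the names `plan` and `apply` are assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple

Resources = Dict[str, dict]


def plan(desired: Resources, actual: Resources) -> List[Tuple[str, str]]:
    """List the (action, resource name) steps needed to reach the desired state."""
    steps = []
    for name in sorted(desired):
        if name not in actual:
            steps.append(("create", name))
        elif actual[name] != desired[name]:
            steps.append(("update", name))
    for name in sorted(actual):
        if name not in desired:
            steps.append(("delete", name))
    return steps


def apply(desired: Resources, actual: Resources) -> Resources:
    """Return the environment after executing the plan; inputs are not mutated."""
    result = {name: dict(spec) for name, spec in actual.items()}
    for action, name in plan(desired, actual):
        if action == "delete":
            del result[name]
        else:
            result[name] = dict(desired[name])
    return result


# Hypothetical example: the live environment has drifted from the template.
desired = {"queue": {"retention_days": 4}, "function": {"memory_mb": 256}}
drifted = {"function": {"memory_mb": 128}, "debug-bucket": {}}

recovered = apply(desired, drifted)
```

Applying the template to the drifted environment restores the desired state, and applying it a second time produces an empty plan, which is the idempotency property in practice.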
Similarly, we can use the same template to replicate the desired state in multiple environments. This is extremely advantageous for testing, where we would like to mimic real-life scenarios. Instead of having to arduously configure each component to mirror the infrastructure to be tested, we can simply automate the provisioning of an identical infrastructure, followed by automated testing facilitated by CI/CD tools.
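As a rough illustration of templating, the sketch below instantiates one template for several environments. The template contents, the `render_environment` function, and the naming scheme are all hypothetical; real IaC tools have their own parameter mechanisms.

```python
import copy
from typing import Dict, Optional

# Hypothetical template: one shape shared by every environment.
TEMPLATE = {
    "function": {"runtime": "python3.12", "memory_mb": 256},
    "queue": {"retention_days": 4},
}


def render_environment(
    environment: str, overrides: Optional[Dict[str, dict]] = None
) -> Dict[str, dict]:
    """Instantiate the template for one environment, prefixing resource names."""
    resources = copy.deepcopy(TEMPLATE)  # never modify the shared template
    for name, changes in (overrides or {}).items():
        resources[name].update(changes)
    return {f"{environment}-{name}": spec for name, spec in resources.items()}


staging = render_environment("staging")
production = render_environment("production", {"function": {"memory_mb": 1024}})
```

Staging and production share every setting except the ones explicitly overridden, so a test environment mirrors production by construction rather than by manual effort.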
Benefits of IaC for Dashboards
Considering the benefits of IaC, it is understandable why industry practices are slowly shifting the configuration of monitoring and alerting tools towards programmatically defined statements. The same issues that apply to setting up compute resources and other facets of the system also apply to the domain of monitoring and alerting.
By configuring the dashboards required to visualize all necessary metrics, logs, and traces in the form of code, we can achieve more robust monitoring and alerting infrastructures. These configurations can be edited and redeployed whenever required, ensuring that monitoring and alerting evolve with the services on which observability is enabled.
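What a dashboard defined in code might look like is sketched below. The `Widget` and `Dashboard` classes, the widget kinds, and the query strings are assumptions made for illustration; they are not the schema or query language of any particular monitoring product.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Widget:
    title: str
    kind: str   # e.g. "timeseries", "log_stream", "trace_list"
    query: str  # placeholder query string, not a real query language


@dataclass
class Dashboard:
    name: str
    widgets: List[Widget] = field(default_factory=list)


def render_dashboard(dashboard: Dashboard) -> str:
    """Serialize deterministically so that diffs in version control stay small."""
    return json.dumps(asdict(dashboard), indent=2, sort_keys=True)


# Hypothetical dashboard covering one metric, one log view, and one trace view.
orders_dashboard = Dashboard(
    name="orders-service",
    widgets=[
        Widget("Invocation errors", "timeseries", "metric:orders.errors"),
        Widget("Recent error logs", "log_stream", "level:ERROR service:orders"),
        Widget("Slowest traces", "trace_list", "service:orders sort:duration"),
    ],
)

print(render_dashboard(orders_dashboard))
```

Because the definition is ordinary source code, it can live in the same repository as the service, be reviewed in the same pull request, and be deployed by the same pipeline.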
Versioning monitoring dashboards is more convenient when it happens in tandem with the versioning of the code base, and hence with the needs of the services. Traditionally, these dashboards are updated manually after each build, according to the new metrics that need to be measured and the manner in which they have to be visualized.
This manual process increases the risk that monitoring and alerting drift away from the system as changes accumulate. Given the number of changes now achievable thanks to microservices and automated CI/CD practices under the umbrella of DevOps, organizations are all the more susceptible to this drift. Of course, teams can stay aware of changes and how they impact their monitoring and alerting needs, but this would require substantial operations effort, which runs counter to the benefits that DevOps promises.
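Once dashboards live next to the code, this kind of drift can be checked automatically. The sketch below assumes we can list the metric names a service emits and the metric names its dashboards chart; the function name `find_drift` and the sample metric names are hypothetical.

```python
from typing import Dict, Iterable, List


def find_drift(emitted: Iterable[str], charted: Iterable[str]) -> Dict[str, List[str]]:
    """Compare the metrics a service emits with the metrics its dashboards chart."""
    emitted_set, charted_set = set(emitted), set(charted)
    return {
        # Emitted by the service but missing from every dashboard.
        "uncharted": sorted(emitted_set - charted_set),
        # Still on a dashboard but no longer emitted by the service.
        "stale": sorted(charted_set - emitted_set),
    }


report = find_drift(
    emitted=["orders.errors", "orders.latency", "orders.refunds"],
    charted=["orders.errors", "orders.latency", "orders.cart_size"],
)
```

A CI step could fail the build whenever either list in the report is non-empty, which moves the awareness of drift from manual operations effort into the pipeline itself.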
Therefore, by practicing dashboards as code, teams can incorporate the operation of these dashboards into their development practices, allowing for a more consolidated approach to building production-grade software. This also allows developers to operate in more familiar environments, defining dashboards within programmable scripts, and enables development teams to follow the philosophy of “you build it, you run it”.
As software moves towards breaking down the silos that exist along production pipelines, the industry is looking to DevOps practices and tools for the ideal solution. In our endeavor to lock in Dev + Ops, we see a shift left that brings operations closer to the development end of the pipeline.
As a result, one of the most prominent concepts to employ is IaC, which gives developers a way to work on operations in the environment they are best acquainted with. When looking at IaC as a solution to breaking down the silos, all sections of the pipeline should therefore be considered, including monitoring and alerting.
Acknowledging that monitoring and alerting capabilities are enabled through rich dashboards that surface the pillars of observability, it is easy to see why dashboards need to be managed in the same way as IaC. Hence dashboards as code, where the dashboards themselves, along with all underlying monitoring and alerting infrastructure, are configured via programmable scripts.
As a result, we not only bring dashboards closer to development through the shift left, but also allow these dashboards to be versioned and audited. Software teams can thus apply the benefits of IaC, which they have conventionally enjoyed in system architecture, to the realm of observability.