As the famous saying goes, “reach for the stars”, and who better to embrace this ideology than the tech industry that for the past few decades has redefined innovation. We may not have reached the stars yet, but we definitely do see a shift to the cloud.
Considering the cloud’s humble beginnings, it is striking that the question today is no longer “should we migrate to the cloud?” but “how?” This trend is reflected in various reports, community discussions, and literature on the matter. One such example is the Flexera 2020 State of the Cloud Report, which provides comprehensive insights into technological trends and reflects the growing adoption of cloud platforms and services.
Moreover, the same report shows that AWS was the most popular cloud vendor. So it must be acknowledged that tech is moving to the cloud, and AWS is leading this new wave.
The transition may be inevitable, but it is not always smooth. Along the way, there are high barriers that may hamper success. One of the biggest is the loss of full awareness of what is going on with your systems in the cloud, and that is where the need for observability becomes apparent.
Therefore, this article will describe what observability in the cloud means and how it eases the transition to the cloud. Additionally, considering that AWS is the most popular vendor of choice, we will go over some of the native solutions that AWS provides, the limitations of these offerings, and how third-party tools such as Thundra can help.
Darkness in the Clouds
Moving to the cloud definitely has its benefits. Since the cloud movement began, these benefits have exhaustively been documented and echoed throughout the industry, thus fuelling the intentions to make the move.
Unfortunately, these benefits are accompanied by a great disadvantage: the loss of awareness of how your systems are running. When running systems on local servers was the norm, owning the entire platform also meant being familiar with every corner of the servers and local platforms. How the system’s computational resources performed in different cases was well known, as were the security risks and application performance. Moving to the cloud, by contrast, meant entrusting compute resources to the vendor, thereby abstracting away the underlying infrastructure. This is where the term Platform as a Service (PaaS) comes from. It literally refers to the fact that you are now adopting the underlying architecture, the platform on which your software application runs, as a service from a vendor.
As a result, the desire to reap the well-known benefits of the cloud also meant conceding awareness of the system state, that is, of what is actually going on under the hood of the PaaS model you opted for. The result was a black-box environment for everyone building in the cloud, and a reality check on the dream that was chased by trusting cloud vendors and their PaaS offerings.
The Serverless Conundrum
Given that delegating architecture to cloud vendors leads to a black box, the issue is further exacerbated in the case of serverless. Serverless is the model that abstracts away the most of the underlying architecture, and it does so by design.
After all, the greatest benefit of serverless is that we no longer have to worry about running our software applications and can focus primarily on business logic. This advantage exists only because serverless is fully managed, scales automatically, and follows the pay-as-you-go model. In comparison, containers provide limited automated scalability at best, require complex orchestration, and are billed for as long as they run, regardless of use.
Acknowledging these issues is what raised, and still validates today, the need for observability. Today the concept is well defined, and almost all APM tools on the market have based their offerings on the basic principles of observability. Emrah Samdan, VP of Thundra, is one voice in the serverless world who has talked extensively on the topic.
In a nutshell, observability is the light in the black-box environment described above. Adopting observability means tracking metrics, sifting through traces, and recording log data. These three concepts, considered the defining pillars of observability, bring much-needed insight into the dreaded black-box environment inherent to the cloud. They improve how you develop your applications and make the shift much smoother. So what does AWS, the most popular cloud vendor, have to offer in this regard?
AWS Serverless Observability, Its Might, and Its Limitations
Cloud vendors are not oblivious to the perils of the cloud. Observability is a concern that all vendors are addressing, and AWS, being the most popular vendor, is not lagging behind by any means.
AWS provides two primary tools that give its users observability. The first, and by far the more popular, is Amazon CloudWatch. The second, relatively newer, service is AWS X-Ray. Both tools are powerful within their scopes and solve the fundamental problems related to the lack of observability in a cloud environment. However, both tools, even when used in conjunction with one another, have limitations, which often result in a void of information, especially for mature adopters of the cloud.
Amazon CloudWatch is a log-based monitoring service that, according to its product description, aims to capture “data and actionable insights for AWS, hybrid, and on-premises applications and infrastructure resources”. For serverless applications built around AWS Lambda, the service does a good job of capturing the following metrics:
- Invocations
- Errors
- Throttles
- Duration
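As a rough illustration of how these metrics could be pulled programmatically, the sketch below builds the keyword arguments for CloudWatch’s GetMetricStatistics API. CloudWatch publishes the Lambda metrics in the `AWS/Lambda` namespace under the names `Invocations`, `Errors`, `Throttles`, and `Duration`. The helper `lambda_metric_query`, the function name `my-function`, and the one-hour window are assumptions made for this example, and the actual API call is left as a comment because it needs boto3 and AWS credentials.

```python
from datetime import datetime, timedelta, timezone

# CloudWatch metric names in the AWS/Lambda namespace, each paired with the
# statistic that is usually the most informative for it.
LAMBDA_METRICS = {
    "Invocations": "Sum",
    "Errors": "Sum",
    "Throttles": "Sum",
    "Duration": "Average",
}


def lambda_metric_query(function_name, metric_name, hours=1, period=300, now=None):
    """Build keyword arguments for GetMetricStatistics (hypothetical helper)."""
    if metric_name not in LAMBDA_METRICS:
        raise ValueError(f"unsupported metric: {metric_name}")
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/Lambda",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": period,  # length of each datapoint, in seconds
        "Statistics": [LAMBDA_METRICS[metric_name]],
    }


if __name__ == "__main__":
    for name in LAMBDA_METRICS:
        query = lambda_metric_query("my-function", name)  # placeholder function name
        print(query["MetricName"], query["Statistics"])
        # With boto3 installed and credentials configured, the query could be sent as:
        #   import boto3
        #   boto3.client("cloudwatch").get_metric_statistics(**query)
```

Keeping the query construction separate from the API call makes the parameters easy to inspect and test without touching an AWS account.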
Nevertheless, in the domain of serverless applications, metrics alone do not provide meaningful clarity. The rise of serverless has also shifted how software is built, pushing it towards distributed systems. That shift calls for a new way of looking at monitoring: distributed traces.
This is where AWS X-Ray comes into the picture, providing a native distributed trace solution. The service captures data pertaining to your application’s architecture and allows you to visualize the overall system within a service map.
Both AWS services thus provide some form of observability, easing the pressures of building in the cloud. However, both also have their limitations, the first being scalability, mainly because of the way AWS X-Ray and CloudWatch are priced.
Additionally, Amazon CloudWatch is log-based and provides great power in terms of logging, but not in terms of metric collection. The out-of-the-box metrics recorded are limited to the ones listed above. If the user would like to monitor metrics around costs and cold starts, for example, they would have to set up additional, complex metric filter configurations.
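To give a feel for what such a configuration has to extract, here is a minimal Python sketch that parses the `REPORT` line Lambda writes to CloudWatch Logs at the end of each invocation. It assumes the current plain-text layout of that line, in which `Init Duration` appears only on cold starts; that layout is not a guaranteed contract. The helper names `parse_report` and `summarize` and the sample log lines are made up for illustration.

```python
import re

# Layout of the REPORT line as Lambda currently writes it (an assumption of this
# sketch). The trailing "Init Duration" field is present only on cold starts.
REPORT_RE = re.compile(
    r"REPORT RequestId: (?P<request_id>\S+)\s+"
    r"Duration: (?P<duration_ms>[\d.]+) ms\s+"
    r"Billed Duration: (?P<billed_ms>[\d.]+) ms\s+"
    r"Memory Size: (?P<memory_mb>\d+) MB\s+"
    r"Max Memory Used: (?P<max_memory_mb>\d+) MB"
    r"(?:\s+Init Duration: (?P<init_ms>[\d.]+) ms)?"
)


def parse_report(line):
    """Return the fields of a REPORT line as a dict, or None for other lines."""
    match = REPORT_RE.search(line)
    if match is None:
        return None
    init = match.group("init_ms")
    return {
        "request_id": match.group("request_id"),
        "duration_ms": float(match.group("duration_ms")),
        "billed_ms": float(match.group("billed_ms")),
        "memory_mb": int(match.group("memory_mb")),
        "max_memory_mb": int(match.group("max_memory_mb")),
        "cold_start": init is not None,
        "init_ms": float(init) if init is not None else None,
    }


def summarize(lines):
    """Count invocations and cold starts and total the billed milliseconds."""
    reports = [r for r in map(parse_report, lines) if r is not None]
    return {
        "invocations": len(reports),
        "cold_starts": sum(r["cold_start"] for r in reports),
        "billed_ms": sum(r["billed_ms"] for r in reports),
    }


# Made-up sample lines in the assumed format.
SAMPLE_LOGS = [
    "START RequestId: a1 Version: $LATEST",
    "REPORT RequestId: a1\tDuration: 12.50 ms\tBilled Duration: 13 ms\t"
    "Memory Size: 128 MB\tMax Memory Used: 45 MB\tInit Duration: 150.25 ms",
    "REPORT RequestId: b2\tDuration: 3.10 ms\tBilled Duration: 4 ms\t"
    "Memory Size: 128 MB\tMax Memory Used: 45 MB",
]

if __name__ == "__main__":
    print(summarize(SAMPLE_LOGS))
```

In CloudWatch itself, the same extraction has to be expressed as filter patterns and custom metrics for each function, which is where the configuration effort comes from.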
Similarly, AWS X-Ray is trace-centric and therefore cannot provide an overall view of CPU and memory metrics. It also does not provide alerting capabilities, and it is not entirely possible to group the information it records. Furthermore, within the domain of traces, the service is limited in that it cannot monitor asynchronous distributed traces.
One thing that becomes apparent when considering the limitations of both services is that they complement each other. To gain greater observability, both AWS solutions should be used in tandem. That, however, opens up new issues, such as increased cost, since both solutions are being operated, and a disjointed experience as we constantly switch between the two. To make matters worse, it is difficult to correlate data: logs in Amazon CloudWatch cannot be mapped exactly to traces in AWS X-Ray.
It is thus evident that both Amazon CloudWatch and AWS X-Ray are powerful tools, but they fall short of meeting the full needs of observability. The lights have been turned on within the black box that is the cloud, but unfortunately, the light burns dim. This calls for third-party monitoring tools such as Thundra to add the required observability capabilities, completely lighting up the cloud.
Where the Light Doesn’t Reach, Thundra Is There
AWS is a powerful cloud vendor, and its observability solutions are a blessing. However, there are limitations and cases that these solutions do not cover, and this is where Thundra can be utilized.
Providing a Birds-Eye View
One of the most basic and important requirements when building your cloud architecture is understanding how everything fits together. This is what an architecture view provides, and what AWS X-Ray aims to achieve with its service map. However, AWS X-Ray is limited to distributed traces within the AWS ecosystem, and the information displayed is minimal. In fact, even within the ecosystem itself there are limitations, such as the inability to span over API Gateway. This is where Thundra’s architecture view can provide further insights.
Thundra’s architecture view provides a full overview of the system built from its powerful distributed trace monitoring capability. Furthermore, from the trace itself, we can dive into relevant metrics and invocations. This provides the necessary deep dive from the highest point of visibility to the most granular and specific information.
Alerting is Key
As noted, AWS X-Ray does not provide alerting capabilities. Amazon CloudWatch, on the other hand, does. However, its alerting capability is limited, and when configuring alerts for metrics other than the four out-of-the-box metrics mentioned above, things get very complex very fast.
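For the out-of-the-box metrics, a CloudWatch alarm is comparatively simple. The sketch below builds the parameters for CloudWatch’s PutMetricAlarm API for an alarm on the Lambda `Errors` metric. The helper `error_alarm_params`, the function name, and the SNS topic ARN are placeholders invented for this example, and the API call itself is left as a comment because it needs boto3 and AWS credentials.

```python
def error_alarm_params(function_name, sns_topic_arn, threshold=1, period=300):
    """Build keyword arguments for PutMetricAlarm (hypothetical helper)."""
    return {
        "AlarmName": f"{function_name}-errors",
        "Namespace": "AWS/Lambda",
        "MetricName": "Errors",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": period,  # evaluation window, in seconds
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": "notBreaching",  # no datapoints means no errors
        "AlarmActions": [sns_topic_arn],  # notified when the alarm fires
    }


if __name__ == "__main__":
    params = error_alarm_params(
        "my-function",  # placeholder function name
        "arn:aws:sns:us-east-1:123456789012:alerts",  # placeholder topic ARN
    )
    print(params["AlarmName"], params["ComparisonOperator"])
    # With boto3 installed and credentials configured, the alarm could be created as:
    #   import boto3
    #   boto3.client("cloudwatch").put_metric_alarm(**params)
```

Alerting on anything that is not already a CloudWatch metric, such as cold starts or cost, first requires publishing that value as a custom metric, which is where the complexity described above comes in.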
Thundra provides an easy-to-use yet sophisticated alerting mechanism. You can build any query using Thundra’s query builder and set it as an alert that triggers when your Lambda function’s behavior meets the query conditions.
Moreover, Thundra alerts can be sent through various channels, from email all the way to Slack and incident management tools such as Atlassian’s Opsgenie.
Debugging on the Spot
Both AWS native solutions provide the basic pillars of observability when used together. However, Thundra allows us to go one step further by actually peering under the hood of your serverless system. This is achieved through Thundra’s debugging capabilities, available in both development mode and production mode.
One aspect we would like to explore is the internal workings of the functions we write: how data is passed from one function to another and how it is processed within each function. CloudWatch and Thundra both allow you to see the payloads being passed between functions. Thundra, however, also allows you to check the control flow of your function code as it runs in production.
Additionally, you can leverage Thundra in your IDE as a debugging tool to accompany you in the development process of your serverless applications. This is a domain that has seen several improvements, especially since the release of AWS SAM. However, Thundra empowers developers to a whole new level, providing all the observability required to remove all barriers to development.
Account for Costs
When developers building serverless functions look towards observability, the major objective is to shine a light on performance issues. However, another integral issue that observability can solve is tracking the costs of your cloud operations.
One of the promises of serverless is the pay-as-you-go model, meaning we pay only for the resources we actually use. This potentially means lower costs, unless increased traffic to your Lambda function leads to an excessive number of invocations. So it becomes necessary to track the expenses of your Lambda functions to avoid runaway costs.
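The pay-as-you-go arithmetic can be sketched in a few lines of Python. Lambda bills a per-request fee plus a compute fee measured in GB-seconds (billed duration multiplied by configured memory). The two default prices below are assumptions based on commonly published on-demand rates and should be checked against the current AWS pricing page for your region and architecture. The free tier is ignored, and `estimate_lambda_cost` is a hypothetical helper.

```python
# Assumed on-demand prices in USD; verify against the current AWS pricing page.
PRICE_PER_MILLION_REQUESTS = 0.20
PRICE_PER_GB_SECOND = 0.0000166667


def estimate_lambda_cost(
    invocations,
    avg_billed_ms,
    memory_mb,
    price_per_million_requests=PRICE_PER_MILLION_REQUESTS,
    price_per_gb_second=PRICE_PER_GB_SECOND,
):
    """Estimate the cost of a Lambda function in USD, ignoring the free tier."""
    # Compute usage: seconds of billed time multiplied by memory in GB.
    gb_seconds = invocations * (avg_billed_ms / 1000) * (memory_mb / 1024)
    request_cost = invocations / 1_000_000 * price_per_million_requests
    compute_cost = gb_seconds * price_per_gb_second
    return request_cost + compute_cost


if __name__ == "__main__":
    # The same function under normal traffic and under a tenfold spike.
    normal = estimate_lambda_cost(1_000_000, avg_billed_ms=100, memory_mb=128)
    spike = estimate_lambda_cost(10_000_000, avg_billed_ms=100, memory_mb=128)
    print(f"normal: ${normal:.2f}, spike: ${spike:.2f}")
```

Because both terms scale linearly with the number of invocations, a traffic spike translates directly into a proportional increase in the bill, which is why tracking invocation counts and billed duration matters.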
Even though Amazon CloudWatch can monitor crucial metrics out of the box, it cannot measure cost-related metrics without the complex configurations mentioned above. This is where Thundra has the edge, by monitoring the costs of your invocations directly.
Conclusion
The transition to the cloud can no longer be avoided. Moreover, considering the latest trends in cloud development, serverless is slowly taking center stage. Unfortunately, the lack of observability in serverless environments has put developers between a rock and a hard place.
Cloud vendors such as AWS are aware of the situation and are providing solutions to mitigate these concerns. However, these solutions are not ideal considering the pillars of observability and the needs of serverless developers. This is where Thundra comes into play, by providing powerful capabilities solving all the woes around observability, breaking down barriers to serverless adoption and cloud migration.
It must be noted that the capabilities listed above are just a small part of what Thundra offers. From unique business traces and anomaly detection to sophisticated querying and log investigation, Thundra provides the full flexibility developers need to build successfully in the cloud. After all, Thundra is fully committed to making the cloud possible.