Posted January 2020 in Distributed Tracing

Why Is End-to-End Tracing Important For Serverless

Written by Emrah Samdan


VP of Product @Thundra

Serverless has become extremely popular with software companies and is one of the fastest-growing cloud service models. According to market forecasts, its market will reach $22 billion by 2025. While serverless has ushered in a new era of technological advancement, it has also brought new challenges: application debugging and monitoring, for example, have become much more complicated.

From Monolithic Architecture to Serverless

To get a better sense of the challenges developers face every day, let’s first compare monolithic, microservice, and serverless architectures, and see why debugging has become so tricky.

We call an application monolithic when it is built entirely as a single system with a single codebase. Code can be divided into modules, but they usually depend on each other and are designed to run together. With a monolithic approach, the application puts all its functionality into a single process.

Microservices is another architecture pattern in which application functionality is split into independent, highly maintainable services that usually communicate with each other via a messaging system or event bus. The great benefit of microservice architecture is the ability to change, deploy, and scale particular services without affecting the rest of the system.

Serverless is quite similar to microservices, but the application is broken into even smaller pieces, often individual functions. The key point about serverless is that the cloud provider fully manages the infrastructure, which lets developers focus only on code and application logic.

The leading and best-known serverless provider today is AWS, whose Lambda service was a real breakthrough in cloud computing when it launched in late 2014. In this model, a single function runs in response to a particular event, and you are billed per invocation and for execution time rather than for idle servers.
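To make the event-driven model concrete, here is a minimal Python sketch of a Lambda handler. Lambda calls the configured handler with two arguments, the triggering event and a context object; the handler name `lambda_handler` and the API Gateway-style event shape are assumptions for illustration.

```python
import json


def lambda_handler(event, context):
    """Hypothetical HTTP-triggered function (e.g., behind API Gateway)."""
    # The event shape depends on the trigger; this assumes an API Gateway
    # proxy-style event, where queryStringParameters may be null.
    params = event.get("queryStringParameters") or {}
    user = params.get("user", "anonymous")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"hello, {user}"}),
    }


if __name__ == "__main__":
    # Local invocation with a hand-written event; no AWS account needed.
    print(lambda_handler({"queryStringParameters": {"user": "emrah"}}, None))
```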

Traditional Tracing

Tracing is the common developer practice of profiling and analyzing application code to identify the source of an error, find ways to improve performance, or simply understand how data flows through the application. If something goes wrong in a monolithic application, the cause is usually easy to pinpoint: Because the whole application runs as a single process, the developer only needs to follow the stack trace to figure out what happened. Checking the logs is easy too, since a traditional monolithic application usually keeps them in one place.
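The following Python sketch shows why a stack trace is enough in a single process: the hypothetical functions `main`, `handle_upload`, and `parse_upload` all run in one interpreter, so one traceback records the full call chain down to the failing line.

```python
import traceback


def parse_upload(data):
    return int(data["size"])  # raises KeyError if "size" is missing


def handle_upload(data):
    return parse_upload(data)


def main(data):
    try:
        return handle_upload(data)
    except Exception:
        # The traceback lists every frame from main() down to parse_upload().
        return traceback.format_exc()


report = main({"name": "cat.png"})
print(report)
```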

But what if the system is built on a microservice architecture and has a few dozen parts that are actually small, separate applications themselves? In this case, traditional tracing will not work: a stack trace ends at the boundary of a single service, so it is almost impossible to trace the whole chain from the input data to the failure.

Now imagine the same situation with a serverless app, potentially consisting of hundreds of Lambda functions connected to different cloud services (authentication, database, mail, messaging, storage, etc.). Tracking a particular request correctly through all of these pieces requires dedicated tools. Fortunately, they do exist.

Debugging Serverless Applications

Let’s assume there’s an application that lets users log in, view and upload images, and receive email notifications when someone else uploads an image; it also generates a thumbnail for every newly uploaded image. This may sound like a very simple app, but it would actually consist of at least four Lambda functions: authentication, image listing (HTTP-triggered), thumbnail generation (triggered by S3), and notifications (triggered by S3 or by a separate SNS message).
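As a sketch of one piece of this app, here is what the S3-triggered thumbnail function might look like in Python. The event layout (`Records[].s3.bucket.name`, `Records[].s3.object.key`) follows the S3 event notification format; the `thumbnails/` prefix and the returned job list are hypothetical, and the actual image download, resize, and upload are left out.

```python
from urllib.parse import unquote_plus

THUMBNAIL_PREFIX = "thumbnails/"  # hypothetical output prefix


def lambda_handler(event, context):
    """Hypothetical thumbnail function: plans one job per uploaded object."""
    jobs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # S3 URL-encodes object keys in event notifications (spaces become '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        if key.startswith(THUMBNAIL_PREFIX):
            continue  # ignore our own output so thumbnails don't retrigger us
        jobs.append({"bucket": bucket, "source": key, "thumbnail": THUMBNAIL_PREFIX + key})
    # A real function would download, resize, and upload each image here.
    return jobs
```

Skipping keys under the output prefix matters if thumbnails are written back to the same bucket that triggers the function; otherwise each thumbnail would trigger another invocation.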

Imagine that after releasing a new version, you notice that users don’t always receive notifications, even though new images definitely appear in the system. Ideally, you would solve this by finding the exact point in the flow where the issue occurs. Distributed tracing is a powerful instrument for this, because it shows the whole picture of how the application handles a request.

Distributed Tracing

This kind of tracing is called “distributed” because microservice and serverless applications are distributed by nature. While tracing a monolithic app means following the chain of modules and methods called on one host, distributed tracing means following the chain of calls between the event trigger, Lambda functions, and cloud services.
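The core mechanism behind distributed tracing can be sketched in a few lines of Python: each function records a span and passes a trace context (a shared trace ID plus its own span ID) to the next hop, so the backend can stitch the hops into one chain. This is a simplified illustration, not Thundra’s implementation; the `trace` field in each event and the in-memory `SPANS` list are assumptions standing in for real context propagation and a tracing backend.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    name: str


SPANS = []  # in-memory stand-in for a tracing backend


def start_span(name, parent_ctx=None):
    """Record a span and return the context to propagate to the next hop."""
    trace_id = parent_ctx["trace_id"] if parent_ctx else uuid.uuid4().hex
    parent_id = parent_ctx["span_id"] if parent_ctx else None
    span = Span(trace_id, uuid.uuid4().hex, parent_id, name)
    SPANS.append(span)
    return {"trace_id": span.trace_id, "span_id": span.span_id}


# Hypothetical functions; the "trace" field stands in for real propagation
# (e.g., object metadata or message attributes).
def notify(event):
    start_span("notify", event["trace"])


def make_thumbnail(event):
    ctx = start_span("make_thumbnail", event["trace"])
    notify({"trace": ctx})


def upload_api(event):
    ctx = start_span("upload_api")  # entry point starts a new trace
    make_thumbnail({"trace": ctx})


def call_chain(spans):
    """Rebuild span names from parent links (assumes a linear chain)."""
    child_of = {s.parent_id: s for s in spans}
    names, current = [], child_of.get(None)
    while current is not None:
        names.append(current.name)
        current = child_of.get(current.span_id)
    return names


upload_api({})
print(call_chain(SPANS))  # ['upload_api', 'make_thumbnail', 'notify']
```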

Thundra’s distributed tracing functionality makes it possible to build detailed visual trace maps, track message exchanges between Lambda functions and cloud services, and analyze overall app performance. In the example above, with distributed tracing it is easy to detect which part of the image app doesn’t work properly: The developer will see any failing invocations in the functions list and visual map.

To take another example, let’s say the developer notices that the Lambda function responsible for thumbnails crashes from time to time and, as a result, fails to create preview files on S3, which leads to missed email notifications. Distributed tracing lets the developer search the traces and narrow the issue down to a single function. That’s great, but what about debugging the function itself?
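Once spans are collected, narrowing a problem down to one function amounts to walking from each failed span back to the root of its trace. The span records below are hypothetical; real tracing backends store richer data, but the lookup idea is the same.

```python
# Hypothetical span records for one request, as a tracing backend might store them.
SPANS = [
    {"id": "a", "parent": None, "name": "upload_api", "error": None},
    {"id": "b", "parent": "a", "name": "s3:PutObject", "error": None},
    {"id": "c", "parent": "b", "name": "make_thumbnail", "error": "process crashed"},
    {"id": "d", "parent": "c", "name": "s3:GetObject", "error": None},
]


def find_failures(spans):
    """Return (path from root, error) for every span that recorded an error."""
    by_id = {s["id"]: s for s in spans}
    failures = []
    for span in spans:
        if not span["error"]:
            continue
        path, current = [], span
        while current is not None:
            path.append(current["name"])
            current = by_id.get(current["parent"])  # None at the root
        failures.append((" -> ".join(reversed(path)), span["error"]))
    return failures


print(find_failures(SPANS))
# [('upload_api -> s3:PutObject -> make_thumbnail', 'process crashed')]
```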

Local Tracing

Local tracing means applying the traditional tracing methods described earlier within the scope of a single Lambda function. Analyzing execution line by line, from method to method, is a crucial part of the bug-fixing process. Fortunately, Thundra supports this feature too, so as soon as it is clear which Lambda is causing the failure, you can immediately start investigating the method calls inside that function and following its execution line by line.
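Python’s built-in `sys.settrace` hook gives a rough idea of what local tracing records: each function call and each executed line inside the handler. The sketch below is only an illustration with hypothetical `handler` and `resize` functions; it is not how Thundra instruments code, and `sys.settrace` adds significant overhead, so it suits debugging rather than production.

```python
import sys


def resize(size, factor):
    width, height = size
    return (width // factor, height // factor)


def handler(event, context):
    size = (event["width"], event["height"])
    return resize(size, event.get("factor", 4))


def trace_locally(func, *args):
    """Run func, recording call/line/return events from func's own source file."""
    target_file = func.__code__.co_filename
    events = []

    def tracer(frame, event, arg):
        if frame.f_code.co_filename != target_file:
            return None  # skip frames from other files (e.g., libraries)
        if event in ("call", "line", "return"):
            events.append((event, frame.f_code.co_name, frame.f_lineno))
        return tracer  # keep receiving line events for this frame

    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)  # always remove the hook
    return result, events


result, events = trace_locally(handler, {"width": 800, "height": 600}, None)
print(result)  # (200, 150)
for kind, name, line in events:
    print(f"{kind:<6} {name}:{line}")
```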

Life Without Observability Tools

Developing and maintaining complex serverless applications without tracing tools is hard. First of all, functions linked to cloud services cannot be properly tested locally, because many of those services cannot be faithfully emulated on a developer machine. In other words, it is much better to test the app in a real serverless environment on AWS, though this can be a real headache for developers, since by default there is no common, convenient way to track how the application behaves.

As a developer, you would have to dig through the AWS console and the mess of AWS CloudWatch logs, hunt for particular invocations with known or unknown errors, and take other slow and ineffective steps.
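Without a tracing tool, one mitigation is to make CloudWatch logs searchable by writing one JSON object per line, tagged with the invocation’s request ID. The sketch below assumes the standard Lambda context attributes `aws_request_id` and `function_name`; the log field names are arbitrary choices, and the fake context only exists so the example runs locally.

```python
import json
import time
from types import SimpleNamespace


def log(context, level, message, **fields):
    """Print one JSON log line tagged with the invocation's request ID."""
    record = {
        "timestamp": time.time(),
        "level": level,
        "request_id": context.aws_request_id,
        "function": context.function_name,
        "message": message,
        **fields,
    }
    line = json.dumps(record)
    print(line)  # Lambda forwards stdout to CloudWatch Logs
    return line


def lambda_handler(event, context):
    log(context, "INFO", "thumbnail requested", key=event.get("key"))
    return {"ok": True}


# Fake context so the sketch runs outside Lambda; the values are made up.
fake_context = SimpleNamespace(aws_request_id="req-123", function_name="make-thumbnail")
line = log(fake_context, "ERROR", "resize failed", key="cats/1.png")
```

Because Lambda sends stdout to CloudWatch Logs, lines like these can then be filtered by `request_id` (for example, in CloudWatch Logs Insights), which is far quicker than scrolling through raw log streams.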

But there is an alternative: You can quickly set up Thundra monitoring, do it only once per project, and gain full observability for your application in a matter of minutes.

Thundra Monitoring

Enabling Thundra monitoring in a project is simple and straightforward if you follow the Quick Start Guide. Create a Thundra account, then click the “Connect” button to link to your AWS account. To complete setup, Thundra will create a new AWS CloudFormation stack in your AWS environment.

The stack is granted very limited permissions, which keeps Thundra’s access to your account narrowly scoped. The last step is to instrument your functions to obtain an even deeper level of observability. This can be done automatically or manually, depending on your requirements.
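Conceptually, manual instrumentation means wrapping each handler so that every invocation is measured and reported. The decorator below is a generic, hypothetical sketch of that idea in Python; it is not Thundra’s agent API, and the in-memory `RECORDS` list stands in for data an agent would send to its backend.

```python
import functools
import time

RECORDS = []  # stand-in for data a monitoring agent would ship to its backend


def instrument(handler):
    """Hypothetical wrapper that records duration and errors per invocation."""
    @functools.wraps(handler)
    def wrapper(event, context):
        start = time.perf_counter()
        error = None
        try:
            return handler(event, context)
        except Exception as exc:
            error = repr(exc)
            raise  # re-raise so Lambda still reports the failure
        finally:
            RECORDS.append({
                "handler": handler.__name__,
                "duration_ms": (time.perf_counter() - start) * 1000,
                "error": error,
            })
    return wrapper


@instrument
def lambda_handler(event, context):
    if "key" not in event:
        raise ValueError("missing key")
    return {"ok": True}
```

Using a decorator keeps the instrumentation out of the business logic: the handler body stays unchanged, and removing monitoring means removing one line.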

Full Serverless Observability

Thundra is the first and only monitoring service providing full serverless observability for projects built with AWS. It combines local tracing capabilities with distributed tracing integrated with AWS Lambda, API Gateway, S3, DynamoDB, Kinesis, Firehose, SNS, and SQS. The Thundra console provides full visibility into your system, making it easy to detect unhealthy or overloaded parts of an application. If you’d like to check its console and dashboard capabilities first without registering, Thundra also provides a demo environment.

Business Flows

The term “distributed tracing” usually refers to the chain of directly related invocations, as in the example above: Users upload images to S3, S3 then triggers the AWS Lambda function to create thumbnails, thumbnails are saved back to S3, and another function (responsible for emails) is triggered via SNS.
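For a chain like this to appear as one trace, the trace context has to travel with each message. One way to do that over SNS is a message attribute, sketched below in Python. `TopicArn`, `Message`, and `MessageAttributes` are real parameters of boto3’s `sns.publish`, and SNS-triggered Lambda events expose attributes under `Records[].Sns.MessageAttributes`; the topic ARN and the `traceId` attribute name are hypothetical.

```python
import json

TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:image-uploaded"  # hypothetical topic


def build_publish_request(image_key, trace_id):
    """Keyword arguments for boto3's sns.publish, carrying the trace ID as an attribute."""
    return {
        "TopicArn": TOPIC_ARN,
        "Message": json.dumps({"image": image_key}),
        "MessageAttributes": {
            "traceId": {"DataType": "String", "StringValue": trace_id},
        },
    }


def trace_ids_from_sns_event(event):
    """In the SNS-triggered notification function, read the trace ID from each record."""
    ids = []
    for record in event.get("Records", []):
        attributes = record["Sns"].get("MessageAttributes", {})
        if "traceId" in attributes:
            ids.append(attributes["traceId"]["Value"])
    return ids


# Publisher side (requires AWS credentials, so not executed here):
#   boto3.client("sns").publish(**build_publish_request("cats/1.png", "abc123"))

# Subscriber side: a trimmed SNS event as Lambda would deliver it.
sample_event = {
    "Records": [
        {"Sns": {"Message": "{\"image\": \"cats/1.png\"}",
                 "MessageAttributes": {"traceId": {"Type": "String", "Value": "abc123"}}}}
    ]
}
print(trace_ids_from_sns_event(sample_event))  # ['abc123']
```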

But what if there is another scenario for this app? What if an administrator can start a process that deletes inappropriate images, for example? This will execute a different chain of functions. Both scenarios can involve the same images and the same user. However, they are not directly related, and neither chain invokes anything in the other; they are connected only logically.

Thundra’s powerful Business Flows feature allows developers to link such logically related scenarios and see the whole picture while monitoring and analyzing apps.
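A business flow can be pictured as grouping otherwise unrelated traces by a shared business identifier, such as an image ID. The Python sketch below illustrates that idea with hypothetical trace records and tag names; it is not a description of how Thundra’s Business Flows feature is implemented.

```python
from collections import defaultdict

# Hypothetical finished traces from two unrelated scenarios.
TRACES = [
    {"trace_id": "t1", "entry": "upload_api", "tags": {"image_id": "img-42", "user": "u7"}},
    {"trace_id": "t2", "entry": "admin_delete", "tags": {"image_id": "img-42"}},
    {"trace_id": "t3", "entry": "upload_api", "tags": {"image_id": "img-99", "user": "u7"}},
]


def business_flows(traces, key):
    """Group traces sharing a business identifier, even with no call link between them."""
    flows = defaultdict(list)
    for trace in traces:
        value = trace["tags"].get(key)
        if value is not None:
            flows[value].append(trace["trace_id"])
    return dict(flows)


print(business_flows(TRACES, "image_id"))  # {'img-42': ['t1', 't2'], 'img-99': ['t3']}
print(business_flows(TRACES, "user"))      # {'u7': ['t1', 't3']}
```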

Observability Made Easy

The rapid growth of serverless development in the last few years has made it a “hot technology” and a very popular architecture for modern applications. Nevertheless, serverless is still software development, so it requires mechanisms for effective testing, debugging, and monitoring.

Just as traditional tracing is for monolithic apps, distributed tracing is an essential part of developing and maintaining serverless apps effectively. It lets developers quickly navigate application flows, fix bugs, and detect bottlenecks and unhealthy parts of the system.

Thundra is a veritable Swiss Army knife, combining distributed tracing and local tracing with monitoring, alerting, and visualizing capabilities for serverless projects. 

Enabling full observability for serverless apps is now easier than ever. Get started today with Thundra.