The world is fighting COVID-19 relentlessly, and these days have shown me that pandemics will keep causing problems even after we beat this one. The lesson from these hard days is that software plays, and will keep playing, a crucial role in combating the next major problem humanity faces. The serverless paradigm is our hope for delivering modern, fast, reliable, and robust applications, for reasons such as its scalability and the ease of building something from scratch.
Serverless is not the ultimate paradise, though. At least not yet. Application teams still face some problems. First and foremost, developers have lost access to the underlying infrastructure, which has changed how we debug and troubleshoot applications compared with what we could do before serverless. Second, our apps now make many calls to third-party services that we don’t control, and these can add latency. The easy cure is to avoid synchronous communication as much as possible, but that is not always feasible. Finally, it’s now harder to reproduce production issues locally, and so harder to understand how applications behave under the hood.
These problems can all be summed up under one roof: understanding application behavior and sustaining application health. We believe application health rests on three pillars: observability, security/compliance, and debugging/testing. Thundra is known for its observability features, but we recently started investing in security and debugging capabilities. In this blog post, I’ll try to cover how Thundra is helping many organizations with this and is willing to help many others for free. To keep it short, I’ll only talk about observability and debugging and how they play together in Thundra.
How do observability and debugging play together in Thundra?
Serverless turned computing into a black box whose inner workings we are not supposed to care about. When we do care, the only resource we have is logging, and we are doomed to search for a needle in a haystack. However, logs are never enough, and tracing is the key to understanding how distributed systems behave. Yes, the built-in tracing solutions are improving, and there are great community-driven efforts such as OpenTelemetry, but developers are still looking for a plug-and-play way to understand application behavior with tracing.
This is what we are trying to achieve at Thundra. We believe in the idea behind serverless: “Developers should focus on the value of delivering first-class software, not anything else.” We firmly believe that observability and debugging of serverless applications should be no exception. Developers need a sufficient amount of aggregated and correlated data without polluting their code. That’s why Thundra provides an automated way of instrumenting applications with no code changes: you just add the Thundra libraries to your application using AWS SAM, the Serverless Framework, or the Thundra console with a few clicks. With Thundra’s automated instrumentation, application teams can discover anomalies on their architectural diagram, catch bottlenecks with detailed performance analysis, and fix issues using automated tracing that covers in-app communication down to the line level of the code.
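To make the idea of instrumentation that stays out of the business logic a bit more concrete, here is a minimal Python sketch. It is not Thundra’s implementation or API: the names `traced`, `SPANS`, and `handler` are hypothetical, and the spans are kept in an in-memory list rather than sent to any backend. The wrapping is done with a decorator only for brevity; the point of automated instrumentation is that this wrapping is applied for you.

```python
import functools
import time

# Hypothetical in-memory span store; a real tracer would export these somewhere.
SPANS = []


def traced(func):
    """Record the duration and outcome of each call without editing func's body."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        span = {"name": func.__name__, "error": None}
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            span["error"] = repr(exc)  # keep the failure on the span, then re-raise
            raise
        finally:
            span["duration_ms"] = (time.perf_counter() - start) * 1000
            SPANS.append(span)
    return wrapper


@traced
def handler(event, context=None):
    # Business logic stays free of tracing code.
    return {"statusCode": 200, "body": event.get("title", "")}
```

Calling `handler({"title": "hello"})` returns the normal response and leaves one span in `SPANS` carrying the function name, the elapsed time, and any error.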
Debugging is another headache for serverless practitioners. The only resources available are log files, which provide a limited debugging experience. It’s still possible to debug serverless applications to some extent if you have prepared by logging the required information. Most of the time, however, developers forget to add enough log statements and are left clueless about the application’s behavior. Thundra provides two different solutions to the serverless debugging problem:
- Offline Debugging: Sometimes your code doesn’t behave the way you expect and you notice it late. In such cases, you need a tool that lets you step through the function execution line by line and see how the variables changed in an invocation that finished minutes or even days ago.
- Online Debugging: This brand-new addition to Thundra lets you pause the invocation of a function while it runs natively in the cloud. This way, you can debug the Lambda invocation as you would debug local code on your computer. Thundra achieves this by setting up a secure bridge between the AWS cloud environment and well-known IDEs such as VSCode and IntelliJ IDEA. Note that this does not throttle or block the other instances of the Lambda function; we debug only a single instance, with the data that we bring for debugging. This means that you can debug your production Lambdas safely with Thundra’s new online debugging.
Thundra combines observability and debugging to equip application teams with the best possible ways of understanding application behavior. Once you understand how a function behaves through offline and online debugging, the focus shifts to how the system behaves as a whole. For this purpose, you can always jump from the debugging context to the observability context. Thundra aggregates debugging data with distributed tracing so natively that you can jump to the messages exchanged between services right after tracking the value of a local variable in a particular function. You can then see the aggregated system health as an architectural diagram and spot the problematic parts of your application at a glance. Let’s see how it works with a real example.
A real-life example
To see how easy Thundra makes it to understand issues, we’ll walk through a real-life example. We’ll use a blog site application that allows authors to submit their blog posts. These posts are published once editors approve them. We’ll walk through posting a blog post. Here are the asynchronous steps taken by the application:
- The blog application should immediately display a message explaining that the post is saved and will be reviewed by editors.
- The new blog post is ingested into an SQS queue.
- A consumer Lambda consumes the submitted blog post from SQS and publishes a message to SNS to notify the editors.
- The same Lambda saves the blog post to DynamoDB so that it can be reviewed by editors.
- The DynamoDB record triggers another Lambda, which writes the content to Elasticsearch with the necessary indexes so that the blog post is searchable among millions of other posts.
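The consumer step in the list above can be sketched in Python as follows. This is an illustration under stated assumptions, not the demo application’s code: the function name `process_blog_posts`, the injected callables `notify_editors` and `save_post` (standing in for an SNS publish and a database write), and the message fields `id`, `title`, and `content` are all hypothetical. The only AWS-specific detail relied on is that an SQS-triggered Lambda receives a `Records` list whose items carry the message in a string `body`.

```python
import json


def process_blog_posts(event, notify_editors, save_post):
    """Handle one SQS batch: notify editors, then persist each post.

    `notify_editors` and `save_post` are injected callables so the sketch
    runs without AWS; they are assumptions, not real client calls.
    Returns the ids of the posts that were processed.
    """
    processed = []
    for record in event["Records"]:
        post = json.loads(record["body"])  # SQS delivers the message body as a string
        notify_editors(f"New post awaiting review: {post['title']}")
        save_post(post)
        processed.append(post["id"])
    return processed


# Local usage with in-memory stand-ins instead of real AWS clients.
notifications, table = [], {}
event = {
    "Records": [
        {"body": json.dumps({"id": "42", "title": "Hello", "content": "First draft"})}
    ]
}
result = process_blog_posts(
    event,
    notify_editors=notifications.append,
    save_post=lambda post: table.update({post["id"]: post}),
)
```

Injecting the two side effects keeps the handler logic testable on a laptop, which is exactly the kind of local reproduction that gets harder once the real managed services are involved.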
While saving a blog post, we noticed that it couldn’t be saved. To understand the issue, let’s check the blog post flow and find where the problem was.
As the image above shows, the flow couldn’t complete all of its steps because it was interrupted in the blogPostProcessor function. The image also shows that it made three retries and still couldn’t continue the flow. Let’s take a closer look at what happened in this function with offline debugging. As seen in the image below, the blog post exceeded the character limit for a blog post (which was kept very low for demo purposes).
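This kind of failure can be reproduced in a few lines. The sketch below is an assumption about what such a check might look like, not the actual blogPostProcessor code: `MAX_POST_LENGTH`, `PostTooLongError`, `validate_post`, and `deliver_with_retries` are hypothetical names, and the retry loop only imitates the same message being handed to the function again after each failed attempt.

```python
MAX_POST_LENGTH = 20  # deliberately low, mirroring the demo setting


class PostTooLongError(ValueError):
    """Raised when a post exceeds the configured character limit."""


def validate_post(content, limit=MAX_POST_LENGTH):
    if len(content) > limit:
        raise PostTooLongError(
            f"post has {len(content)} characters, limit is {limit}"
        )
    return content


def deliver_with_retries(content, retries=3, limit=MAX_POST_LENGTH):
    """Try once, then retry up to `retries` times.

    Returns a tuple (attempts_made, succeeded).
    """
    attempts = 0
    for _ in range(1 + retries):
        attempts += 1
        try:
            validate_post(content, limit)
            return attempts, True
        except PostTooLongError:
            continue  # the same input fails the same way on every retry
    return attempts, False
```

Because the input never changes between attempts, retrying cannot help here; only raising the limit or shortening the post lets the flow continue.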
At this point, we can increase the character limit to a more meaningful number and use online debugging on the blogPostProcessor function to check that everything works and that the event-driven blog posting flow completes. You can check this blog post on how to debug a serverless Node.js application online using VSCode.
Once we’re sure that we’ve fixed the problem, we can disable online debugging and let our robust system continue working. As you can see, we didn’t pollute the code and we didn’t need to train anyone to use Thundra, yet we intuitively troubleshot a problem that prevented our customers from saving a blog post.
Conclusion
Serverless technologies and managed services are the future of software development and delivery. Existing solutions that depend on access to the underlying infrastructure cannot solve the problems of these distributed, event-driven architectures. We need more lightweight solutions that automatically generate distributed tracing and offline debugging data and let you run online debugging from your IDE. We can proudly say that Thundra is the only tool in the market that can do all of this. From that perspective, we are happy to call Thundra a tool that brings tool relief. To strengthen this position, we recently introduced security features that let you define guardrails for your applications. This way, you can set up whitelisting/blacklisting policies to control the outbound traffic of your applications. We’ll sharpen this knife by letting developers control inbound traffic as well and by recommending IAM policies with least-privilege access. If you want to try Thundra, you can start your journey here.
One last note: we are passing through extraordinary times, and we want to help the companies that need us. If you’re a developer, DevOps engineer, or sysadmin working in a health institution or any institution combating COVID-19 and you need any help from us with serverless, just ping us. I should also remind you that Thundra is free for you, at your service at any scale.
If you and your loved ones are healthy, be careful and try to stay at home with your family. We wish you safe and happy serverless-ing!