From the beginning, the Thundra team has worked hard to help software teams troubleshoot serverless applications. You can already search your Lambda functions and their invocations thoroughly with queries, and save and share these queries within your team or keep them private. Many of our customers use these query capabilities, and we value customer feedback a lot, so we worked hard to bring the same query capabilities to traces. As we promised when announcing the Full Tracing feature, today we are happy to announce that you can dive deep into your traces with Thundra's advanced query capabilities.
Let’s recall what we call a trace once again. A trace is a chain of serverless function invocations that make up a transaction fulfilling a real-life requirement. For example, a Lambda function gets triggered by a request coming to API Gateway. After transforming the request, it notifies one or more other functions through SNS to process the message. These Lambda functions can run different business logic on the message, write the result to a DynamoDB table, and make a call to a third-party API. Monitoring only the function that is triggered by SNS doesn’t give you the full context of the transaction itself. You need to see what actually happened across the async events in the transaction. To satisfy this requirement, Thundra provides automated distributed tracing for Node.js, Python, Java, Golang, and .NET, along with local tracing, which helps customers see what happened inside a function. When you plug Thundra into your functions, you will start to see your serverless transactions on the “Traces” page.
If you are familiar with our query capabilities on the function and invocation pages, trace queries work similarly. Previously, we listed traces from most recent to oldest, but our filtering capabilities were limited when we first announced Full Tracing. With this new tracing capability, you will be able to conduct really deep investigations.
For example, in the query above, we filtered for traces in which at least one of your serverless transactions interacts with DynamoDB and the interaction’s duration is greater than 1000 ms. After you find the related traces, clicking on a row opens the trace map, where you can get more details about the DynamoDB call. As you can see, the DynamoDB interaction of the ‘team-get-lambda-java-lab’ function took 3020 ms.
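Such a filter can be sketched in query form like this; the `resource.DYNAMODB.duration` field name is an assumption, modeled on the `resource.POSTGRESQL.duration` field used in the PostgreSQL example later in this post:

```
resource.DYNAMODB.duration > 1000 ORDER BY StartTime DESC
```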
If you have multiple DynamoDB tables, you can also specify the table name in the query to filter your traces. The query below brings the traces that had an operation on the “team-lambda-java-layer” DynamoDB table with a duration higher than 1000 ms.
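A sketch of such a table-scoped query, assuming a `resource.DYNAMODB.name` field analogous to the duration field (the exact field name is an assumption, not confirmed syntax):

```
resource.DYNAMODB.name="team-lambda-java-layer" AND resource.DYNAMODB.duration > 1000 ORDER BY StartTime DESC
```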
Our agents integrate with some popular libraries and many AWS services, including Lambda, SNS, SQS, Kinesis, etc. With trace query capabilities, you can search all interactions with these resources using flexible conditions. Our trace search capabilities are not limited to the resources used by your serverless applications. You can also search for the following aspects of your traces:
- Traces that have specific application names
- Traces that have errors
- Traces that have invocations with cold start
- Traces that have specific error types
- Traces that are triggered by an external service (SNS, SQS, API Gateway, etc.)
- Traces that belong to a specific project
- Traces that have specific invocation or function tags
There are more options you can use while searching your traces. Please refer to our documentation for more details.
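To illustrate, a few of the aspects listed above could be expressed as queries along these lines; the field names `HasError` and `ColdStart` are hypothetical placeholders for illustration only, and the real syntax may differ (the `Name` and `ORDER BY` pieces follow the examples shown later in this post):

```
HasError=true ORDER BY StartTime DESC
ColdStart=true AND Name=user-get-lambda-java-lab
```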
Resolving customer complaints with Trace queries
Let’s walk through a real-life scenario to see what we can achieve with this new capability. Suppose a customer opens a ticket saying that they are experiencing slow responses from your system. If you’re sending user IDs to Thundra with our custom tag support, you can filter down to this customer’s traces for the service they’re complaining about with the example query below.
Name=user-get-lambda-java-lab AND tags.user.id="7" ORDER BY StartTime DESC
At first glance, you see that the customer is right to complain: the duration is a lot higher than normal. Then you remember that the last time this happened, PostgreSQL was the cause. You can check whether it’s the same culprit again with the following query.
Name=user-get-lambda-java-lab AND tags.user.id="7" AND resource.POSTGRESQL.duration > 10000 ORDER BY StartTime DESC
Bingo! It’s the same again. You can go and check what might be going wrong with PostgreSQL. You don’t even need to debug your code: the problem isn’t in your code, it’s a misconfiguration or poorly designed indexes on PostgreSQL. That’s the problem you need to solve.
Collaborating with your team over Trace Queries
As you can see, you can dive as deep as you want with our queries. We have more good news about these queries: you can share them with your teammates and make everyone aware of the health of your system. Like function and invocation queries, trace queries can be saved as private or as public within your organization.
What is Next?
We’re happy to ship advanced trace search capabilities today. As you may know, we announced our flexible alerting feature a couple of weeks ago. The next milestone for trace search is that you will be able to save your trace queries as Alert Policies, so you get notified whenever a trace violates your query. We’d be happy to hear your feedback as you use queries to explore your traces. Please submit your requests to firstname.lastname@example.org, or join our Slack and let’s chat. You can sign up for Thundra or watch a live demo to see Thundra in action.