From day one, Thundra has aimed to help software teams troubleshoot serverless applications seamlessly. I must say I'm fascinated by the complex, event-driven serverless architectures our customers run. Just as they take different approaches to delivering software with serverless, they also troubleshoot in different ways.
Although Thundra already provides distributed and local tracing, metrics, and logs for single invocations, many people still dig through the logs to discover problems. Today, we're happy to announce our Log Search feature, which enables software teams to search the logs of all functions for specific patterns.
Thanks to this feature, you can now search for log messages that contain exact words or expressions with wildcards. For example, typing `*undr*` brings up logs that contain "hundreds". You can also filter logs by their log level and log context.
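As a rough sketch of the wildcard semantics, here is how `*` behaves in Python's standard `fnmatch` module; this is a stand-in to illustrate the pattern style, not Thundra's actual matching engine:

```python
from fnmatch import fnmatchcase

# "*" matches any run of characters, so "*undr*" matches any
# message that contains the substring "undr" anywhere.
messages = [
    "Retrieved hundreds of records",
    "User login succeeded",
    "Thundra agent initialized",
]

hits = [m for m in messages if fnmatchcase(m, "*undr*")]
print(hits)  # ['Retrieved hundreds of records', 'Thundra agent initialized']
```

Both "hundreds" and "Thundra" contain "undr", so both messages match the filter.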
However, retrieving the desired log from the stack of logs is only half the battle. You still need to discover more about the issue. For this reason, we provide a gateway from this point to the existing Thundra features. After your search returns a log item, you can navigate to the details of the invocation in which the log item was printed. Once you reach the relevant code segment responsible for producing the log, you can check metrics such as duration and health, along with any interactions with other functions.
Let’s look at a scenario showing how you can take advantage of this feature.
Suppose you have an application that retrieves a user's books and builds a content-based profile to make book recommendations to that user. On a sunny afternoon, one of your customers creates a support ticket saying that they cannot see their recommendations. If you print the identifier of a user when you first retrieve that user from the API, it is very easy to discover what happened with Thundra. (Note that printing such debug logs in production is not a best practice and can create an unpleasant cost item due to CloudWatch pricing.)
The log that you print after you retrieve the user is in the format "User with id %d has been retrieved." If the function fails to get the user from the database, the printed log is in the format "User with id %d could not be found for retrieval." To understand the problem:
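To make the scenario concrete, here is a minimal sketch of what such a handler might look like. The names (`handler`, `fetch_user`, `build_recommendations`) and the stubbed data store are hypothetical placeholders, not Thundra APIs or the customer's actual code:

```python
import logging

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger("recommendations")

# Stub data store standing in for the real database.
USERS = {42: {"id": 42, "books": ["Dune", "Neuromancer"]}}

def fetch_user(user_id):
    return USERS.get(user_id)

def build_recommendations(user):
    return ["More like " + title for title in user["books"]]

def handler(event, context=None):
    user_id = event["userId"]
    user = fetch_user(user_id)
    if user is None:
        # This is the line a "User with id*" search surfaces at the Warn level.
        logger.warning("User with id %d could not be found for retrieval", user_id)
        return {"recommendations": []}
    logger.debug("User with id %d has been retrieved", user_id)
    return {"recommendations": build_recommendations(user)}
```

Both branches start with the same "User with id" prefix, which is what makes the single wildcard filter below catch the success and failure cases at once.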
- Filter the logs with log message that contains `User with id*`
- See the log item whose log level is "Warn" in the following screenshot.
We can see from the log statement that we failed to retrieve the user from the database. But why? We can explore the problem further by clicking the log's "Go To Invocation" action button to navigate to the invocation and see what actually happened.
As you can see from the following screenshot, the function has failed to get the user from the cache.
We discovered that the problem was related to Redis not responding for some reason. You should now check whether this problem affected any other customers and whether it still persists.
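One common way to keep such a cache outage from taking down user-facing requests is a read-through lookup that falls back to the database when Redis is unreachable. This is a generic resilience sketch, not Thundra functionality; `cache` and `db` are hypothetical objects exposing `get`/`set`:

```python
def get_user_with_fallback(user_id, cache, db):
    """Try the cache first; fall back to the database when the
    cache errors out (e.g. Redis not responding) or misses."""
    try:
        cached = cache.get(user_id)
        if cached is not None:
            return cached
    except ConnectionError:
        # A non-responding Redis lands here; the request is served
        # from the database instead of failing outright.
        pass
    user = db.get(user_id)
    if user is not None:
        try:
            cache.set(user_id, user)  # repopulate for later reads
        except ConnectionError:
            pass
    return user
```

With a pattern like this, the scenario above would have degraded to slower database reads instead of missing recommendations, and the Warn-level log would still flag the cache trouble for Log Search to find.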
Today, we're very proud to announce this immensely useful feature, which makes troubleshooting serverless applications far more convenient. We'll continue to support our customers and the serverless community in the days ahead. Right now, we are working on alerting for serverless applications: you'll be able to save your log searches as alerts and build your own bespoke alert system that way. Please let us know if you'd like to see other ways of exploring your serverless architecture.
You can sign up for our web console and start experimenting. Don't forget to explore our demo environment - no sign-up needed. We are very curious about your comments and feedback. Join our Slack channel, send us a tweet, or contact us through our website!