Serverless Observability Fundamentals: Breaking down your options when collecting data from AWS Lambda

Jan 22, 2019


Many AWS Lambda users we talk to aren’t sure what data they *can* collect or *should* collect from their serverless environments. Many of them are still digging through CloudWatch logs. Others have built their own alerting systems that they update and modify as their needs grow. But most aren’t doing anything at all when it comes to monitoring.

In previous blogs, we’ve discussed in depth why you should collect data to observe the behavior of your environment. Today, let’s assume you know you need to collect your data. So, let’s go over the different depths of AWS Lambda data collection and how you extract that information from your environment.


Categories of data you can collect from AWS Lambda environments

Simply put, we break down data collection into four categories:

1. Basic monitoring data - By default, AWS Lambda generates basic information in the form of CloudWatch logs. These are text files that include notifications about function errors and basic usage data. You should definitely take advantage of the information included in your CloudWatch logs to debug faster, identify issues, and solve problems.

However, we highly recommend that you use a visualization platform with some amount of built-in intelligence to crawl through the logs and extract their information for organization and easy analysis. Otherwise, you’ll be stuck with the time-consuming task of sifting through the logs yourself if you want to understand what’s truly going on in your environment. Although some people build their own tools for analyzing their CloudWatch logs, don’t waste your time. Use a free service like Thundra’s or some other vendor’s to do it for you.
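To give a sense of what that intelligence is doing for you: every Lambda invocation ends with a REPORT line in CloudWatch Logs that carries duration and memory figures. Here is a minimal sketch of pulling those numbers out yourself (the sample log line is made up, but follows the standard REPORT format):

```python
import re

# Each Lambda invocation writes a REPORT line like this to CloudWatch Logs.
REPORT_PATTERN = re.compile(
    r"Duration: (?P<duration>[\d.]+) ms\s+"
    r"Billed Duration: (?P<billed>\d+) ms\s+"
    r"Memory Size: (?P<memory>\d+) MB\s+"
    r"Max Memory Used: (?P<used>\d+) MB"
)

def parse_report(line):
    """Extract timing and memory figures from a Lambda REPORT log line."""
    match = REPORT_PATTERN.search(line)
    if not match:
        return None
    return {
        "duration_ms": float(match.group("duration")),
        "billed_ms": int(match.group("billed")),
        "memory_mb": int(match.group("memory")),
        "max_used_mb": int(match.group("used")),
    }

line = ("REPORT RequestId: 3f8a-example Duration: 102.25 ms "
        "Billed Duration: 200 ms Memory Size: 128 MB Max Memory Used: 35 MB")
print(parse_report(line))
```

Multiply this little parser by every metric, every function, and every log format change, and you can see why we suggest letting a platform do the crawling for you.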


2. Basic tracing data - This is data collected by default if you are using AWS X-Ray, an add-on service for analyzing end-to-end requests in distributed applications. High-level AWS X-Ray tracing data tells you how long a function takes to execute (round-trip time), from when the request kicks off to when the response returns. It includes response times from services such as DynamoDB, SQS, and more.
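If you want to try it, turning on basic X-Ray tracing is largely a configuration switch. A sketch using the AWS CLI (the function name is a placeholder, and the function’s execution role also needs X-Ray write permissions):

```shell
# Enable active X-Ray tracing for a function (name is a placeholder).
aws lambda update-function-configuration \
  --function-name my-function \
  --tracing-config Mode=Active
```

With active tracing on, Lambda sends the round-trip segment for each sampled invocation to X-Ray without any code changes.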


3. Detailed tracing data - This data adds detail about what’s going on inside your functions. For example, detailed tracing data shows you the actual database calls your functions are making. This lets you easily spot inefficiencies that may be caused by your code, or perhaps by a specific database request.
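To make that concrete, here’s a hypothetical sketch of what per-call tracing boils down to: the instrumentation wraps each database or service call in a timed span. (The span recorder and the fake queries below are illustrative stand-ins, not any vendor’s actual API.)

```python
import time
from contextlib import contextmanager

spans = []  # collected trace spans for one invocation

@contextmanager
def traced(name):
    """Record how long the wrapped operation takes, as a named span."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        spans.append({"name": name, "duration_ms": elapsed_ms})

def handler(event, context):
    with traced("dynamodb.query"):
        time.sleep(0.01)   # stand-in for a real DynamoDB query
    with traced("s3.get_object"):
        time.sleep(0.005)  # stand-in for a real S3 read
    return {"statusCode": 200}

handler({}, None)
print(spans)
```

Seeing that the `dynamodb.query` span dominates the invocation time is exactly the kind of insight the high-level round-trip number can’t give you.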



4. Detailed argument data - This data can include the arguments passed into your functions and methods, how long each call takes to execute, any errors that arise, and return values. When combined with detailed tracing data, it gives you a full picture of what’s going on in your AWS Lambda environment.
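A rough sketch of what argument-level capture looks like under the hood (the names here are made up for illustration): wrap a function so that its arguments, return value, errors, and duration are recorded on every call.

```python
import functools
import time

captured = []  # one record per instrumented call

def capture_args(fn):
    """Record arguments, return value, errors, and duration for each call."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        record = {"function": fn.__name__, "args": args, "kwargs": kwargs}
        start = time.perf_counter()
        try:
            result = fn(*args, **kwargs)
            record["return"] = result
            return result
        except Exception as exc:
            record["error"] = repr(exc)
            raise
        finally:
            record["duration_ms"] = (time.perf_counter() - start) * 1000
            captured.append(record)
    return wrapper

@capture_args
def lookup_user(user_id):
    return {"id": user_id, "name": "example"}

lookup_user(42)
print(captured[0]["function"], captured[0]["return"])
```

Because the wrapper sees both inputs and outputs, an error record comes with the exact arguments that triggered it, which is what makes this category so useful for debugging.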


How to collect your monitoring data

How do you extract this data for viewing in your visualization platform of choice?  Here are your various options:

Do nothing. Find a platform to collect and organize what you get from AWS Lambda by default. With some vendors, you don’t need to do anything to your AWS Lambda environment. They simply take what you’re already getting from AWS and visualize it in a useful manner. This is a great approach if the generic logs are good enough. But, often, you’ll want more data for your specific use case.

Manually add log printouts to your functions. With this approach, you essentially add printouts everywhere you think you need them to understand what’s going on in your code. It’s up to you to figure out what you want printed, how often, and where. This assumes you already know what problems your code is experiencing and what you need to keep track of. But what about identifying unknown, unpredictable issues in your application?
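If you do go the manual route, structured (JSON) log lines are much easier for any platform to parse later than free-form prints. A minimal sketch, with made-up event fields:

```python
import json
import logging

logging.basicConfig(level=logging.INFO, format="%(message)s")
logger = logging.getLogger(__name__)

def log_event(stage, **fields):
    """Emit one JSON log line; CloudWatch stores it as-is for later parsing."""
    logger.info(json.dumps({"stage": stage, **fields}))

def handler(event, context):
    log_event("received", order_id=event.get("order_id"))
    total = event.get("quantity", 0) * event.get("unit_price", 0)
    log_event("computed_total", order_id=event.get("order_id"), total=total)
    return {"total": total}

print(handler({"order_id": "o-123", "quantity": 3, "unit_price": 5}, None))
```

Even so, every one of these lines is a guess about where trouble will show up, which is the limitation this approach can’t escape.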

Instrument your functions by installing an agent (on your own or via Lambda Layers) that wraps or updates a handler, or by adding annotations to your code. For deeper (and more useful) data collection, you will want to instrument your code to get as much useful information as possible with the smallest amount of effort. Many vendors offer an automated way to instrument your functions, with varying levels of data collection capabilities. However, if you would like even more precise data collection, you’ll probably need to add some manual instrumentation to your code as well.

Why use an automated versus manual approach for instrumenting data?

Automated instrumentation approaches are easy.  There’s no digging into your code to extract the information you want. That’s great if you get what you need from the automated approach.

A manual approach is good for covering areas of your code that default logging and automated instrumentation don’t cover. For example, there may be a particular relational database you want to watch closely that automated instrumentation doesn’t support, or other use cases specific to your application’s goals that it doesn’t reach.

When deciding whether to use an automated approach, a manual approach, or some combination of both, consider a couple of things:

  • Vendor capabilities. Every vendor takes a different approach to observability and monitoring. For example, Thundra was born out of a need to observe Java applications, which means that Thundra provided automated instrumentation for Java functions out of the gate. (We now support automated instrumentation for Node.js and Python as well!) However, if you’re writing applications in Go, Thundra doesn’t currently provide automated instrumentation support for that language...yet.

Also, each vendor will collect different types of data, and at different levels of detail. Perhaps you get *most* of the information you want from automated instrumentation but not *all* of it. Which brings us to the second thing you need to think about when considering your instrumentation choices….

  • Your specific observability goals. You may get everything you need using whatever automated capabilities a vendor provides out-of-the-box. But, commonly, you will have specific observability goals and specific areas of the application that you want to observe more closely. For example, perhaps your top priority right now is debugging your application as you develop. Or maybe it’s cost management. Perhaps you’re already in production and you just need to optimize or generally keep an eye on things. Maybe there is one specific part of your application that you need to watch at a much deeper level than others. For this, you may need to go beyond whatever automated instrumentation provides and apply a manual approach.

In general, I always recommend starting off with your observability and monitoring goals (what are you trying to accomplish) and then use those to determine what you need to monitor.  Taking an “observability-driven design” approach is useful here. Grab everything you can using an automated approach to wrapping and instrumentation.  Then, go in and selectively add manual data collection where you need it. Remember, the more manual work you do, the more you need to keep track of, maintain, debug (it sucks to debug the thing you are using to debug your apps!), and adjust if you make changes to your main app!

Anyway, that concludes our overview of the types of data you can collect from your AWS Lambda environment and your options for how you collect that data.  

Ready to start peering into your serverless environment? Go get your free account and use Thundra to automatically wrap and instrument your functions. You’ll be analyzing your CloudWatch logs in minutes!