5 minute read

POSTED Apr 2020 · IN Distributed Tracing

Optimizing AWS Lambda Performance with Thundra

Written by Suna Tarıyan

Product Manager @Thundra


AWS Lambda is revolutionizing the way software is developed and deployed. Lambda enables companies to think in terms of focused business operations and to ship these operations with low friction. But this ease of deployment can lead to “function sprawl,” in which teams launch many functions but lack a uniform way to monitor and observe them.

Historically, performance tuning in this environment has been difficult. Thundra was created to provide teams with end-to-end Lambda application visibility, enabling them to stay on top of application performance and quickly debug any issues. Here we will show you how to use Thundra to identify and address performance issues, starting from the architecture level and drilling all the way down to individual transactions.

Methodology

The cornerstone of performance engineering is identifying and eliminating unnecessary work, which most commonly surfaces as latency and errors. Thundra helps you optimize AWS Lambda performance by providing visibility at a number of levels:

  • System: What are the major components? How are they connected? What are a given Lambda’s dependencies? Where does it sit within the system?
  • Aggregate: What does “normal” look like? What is the current baseline? How many operations are there? How long are they taking and are they successful?
  • Outliers: What attributes do performance outliers have in common? Are they affected by cold starts? Are they the result of errors?
  • Transaction: Where does a single Lambda invocation spend its time? Which operation is taking the longest time for any given Lambda invocation?

Each level provides increasingly detailed information about a Lambda’s performance. Now we will take a look at each level, showing you how to use Thundra to optimize Lambda performance.

Context Is Key: System View

The first step in performance optimization is developing an understanding of the current system. This is often referred to as “system mapping,” and Thundra does this automatically using its auto-instrumentation and Architecture Page. This mapping is essential to building context on why an operation is occurring and where it sits within an architecture.

  • Be aware that it is risky to skip this step and proceed without good context. Trying to change a system you don’t fully understand can result in misdirected optimization, wasted effort, suboptimal results, or, in the worst case, a performance regression. This context is traditionally maintained through documentation, tribal knowledge, or hands-on code inspection.

Before optimizing a Lambda, you need to ask what its resources and dependencies are and what inputs and outputs it has.

Thundra can auto-instrument Lambdas and detect AWS Lambda resources and application-level resources. Figure 1, from Thundra’s Architecture Page, shows an example map of all Lambda resources and dependencies:

Figure 1: Example of Thundra’s Lambda architecture diagram
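
As a point of reference, enabling this auto-instrumentation on a Python Lambda is essentially a one-line change. Here is a minimal sketch following the Thundra Python agent’s documented decorator pattern; the exact import path and configuration options may vary between agent versions:

```python
# Minimal sketch of enabling Thundra auto-instrumentation on a
# Python Lambda. Import path and api_key setup follow the Thundra
# Python agent's decorator pattern; details may vary by version.
from thundra.thundra_agent import Thundra

thundra = Thundra(api_key='<your-thundra-api-key>')

@thundra
def handler(event, context):
    # Business logic goes here; Thundra traces the invocation and
    # calls to supported AWS services and HTTP endpoints automatically.
    return {'statusCode': 200}
```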

In Figure 2, we use a simple example to illustrate the performance-optimization methodology:

Figure 2: Test Lambda architecture diagram showing upstream and downstream dependencies

Without knowing anything about the test Lambda, we can see that it makes GET requests to Google, writes to an AWS resource (identified by its ARN), and invokes another Lambda (thundra-test-2). In many organizations, the only way to figure this out is to ask senior people or grok the code! The figure also shows that thundra-test is driven by CloudWatch Events schedule expressions. Thundra eliminates architectural documentation drift and the need to invest time in maintaining flowchart diagrams: architecture diagrams are generated dynamically, in real time, and include both infrastructure and dataflows.
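
To make the diagram concrete, a handler with the dependencies shown in Figure 2 might look roughly like the sketch below. This is hypothetical; the SQS write in particular is an illustrative stand-in for the unspecified AWS resource in the diagram:

```python
# Hypothetical handler matching the dependencies in Figure 2: a GET
# to Google, a write to an AWS resource, and an invocation of a
# second Lambda (thundra-test-2). Resource names are illustrative.
import json

import boto3
import requests

lambda_client = boto3.client('lambda')
sqs = boto3.client('sqs')

def handler(event, context):
    # Outbound HTTP dependency (the GET request to Google).
    response = requests.get('https://www.google.com')

    # Write to an AWS resource (SQS chosen here for illustration).
    sqs.send_message(QueueUrl='<queue-url>', MessageBody='payload')

    # Downstream Lambda dependency (thundra-test-2 in the diagram).
    lambda_client.invoke(
        FunctionName='thundra-test-2',
        InvocationType='Event',  # asynchronous, fire-and-forget
        Payload=json.dumps({'status': response.status_code}),
    )
```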

Aggregates: Defining Normal

The next step in performance optimization is developing an understanding of the baseline, or current level of performance. This involves looking at “coarse-grained” aggregate data. Aggregates characterize the base performance of a service across all clients; they are not a per-customer view. They work by modeling rates over a few coarse-grained dimensions, such as environment, service, and operation. It’s important to develop a baseline in order to judge the efficacy of any optimization efforts, and this must be done in a production-like environment.

Baselines require learning:

  • How much work the Lambda is doing (i.e., how many requests per second).
  • How long it takes.
  • What the result of the work is (i.e., success or error).

These are three of what Google refers to as the four golden signals (latency, traffic, and errors; the fourth is saturation). Figure 3 shows that just by clicking on a Lambda, you can see Thundra’s summaries and counts alongside the architecture view:

Figure 3: Thundra architecture diagram with Lambda invocation counts and durations

This image shows the aggregates for any resource or connection on the architecture graph. You can compare the effects of optimizations to this aggregate: Have efforts resulted in faster processing? Have errors decreased? Have latency percentiles decreased? Aggregates are needed to define the “shape” of the performance over time.
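
If you wanted to reproduce these aggregates from raw invocation records, the arithmetic is simple. A minimal sketch (the invocation records below are made-up sample data, not Thundra output):

```python
# Minimal sketch: computing baseline aggregates (traffic, latency
# percentiles, error rate) from raw invocation records. The sample
# data is made up for illustration.
from statistics import quantiles

invocations = [
    {'duration_ms': 120, 'error': False},
    {'duration_ms': 95,  'error': False},
    {'duration_ms': 940, 'error': True},
    # ... one record per invocation in the baseline window
]
window_seconds = 3600  # e.g., a one-hour baseline window

durations = [inv['duration_ms'] for inv in invocations]
rate = len(invocations) / window_seconds                       # traffic
error_rate = sum(inv['error'] for inv in invocations) / len(invocations)
q = quantiles(durations, n=100)                                # latency
p50, p95, p99 = q[49], q[94], q[98]

print(f'{rate:.4f} req/s, p50={p50:.0f}ms p95={p95:.0f}ms '
      f'p99={p99:.0f}ms, errors={error_rate:.1%}')
```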

Understanding Outliers

The next step is to dig into outliers to identify operations that are good candidates for optimization. Aggregates summarize latency distributions, commonly as the average, 50th percentile, 95th percentile, 99th percentile, and maximum. Outliers live in the tails of those distributions: the slowest 1%, 0.1%, or 0.01% of requests. Focusing on outliers is a powerful performance-optimization heuristic because it targets tail latency.

According to LinkedIn engineering, “A 99th percentile latency of 30ms means that every 1 in 100 requests experience 30ms of delay. For a high traffic website like LinkedIn, this could mean that for a page with 1 million page views per day, 10,000 of those page views experience (noticeable) delay.”
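
The arithmetic behind that claim is worth internalizing, because it applies to any service:

```python
# Worked example: translating a latency percentile into an absolute
# number of affected requests, as in the LinkedIn quote above.
daily_page_views = 1_000_000
percentile = 99

tail_fraction = 1 - percentile / 100   # p99 leaves 1% in the tail
affected = int(daily_page_views * tail_fraction)
print(f'{affected:,} page views/day exceed the p{percentile} latency')
# -> 10,000 page views/day exceed the p99 latency
```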

Thundra surfaces tail latency through the Lambda Performance Analysis Tab, shown in Figure 4:

Figure 4: Performance Analysis Tab

By recording each individual invocation event (rather than only aggregating), Thundra makes digging into outliers easy: all you need to do is select the outlier range. Important information, like cold start time, execution latency, and errors, is also available as a time series, as shown in the bottom right-hand corner of Figure 5.

Figure 5: Performance Analysis Tab with outliers selected

A common performance-optimization heuristic is identifying what, if anything, the slowest operations have in common. Thundra’s Performance Analysis Tab surfaces that information so you can easily query it.
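
Conceptually, this is a group-by over the outlier set. A sketch of the kind of query the tab runs for you (the invocation fields here are illustrative, not Thundra’s actual schema):

```python
# Sketch of the "what do the slowest invocations have in common?"
# heuristic: group the outlier set by candidate attributes and count.
# Field names are illustrative, not Thundra's schema.
from collections import Counter

outliers = [
    {'cold_start': True,  'errored': False, 'region': 'us-east-1'},
    {'cold_start': True,  'errored': False, 'region': 'us-east-1'},
    {'cold_start': False, 'errored': True,  'region': 'eu-west-1'},
    # ... the slowest 1% of invocations
]

for attribute in ('cold_start', 'errored', 'region'):
    counts = Counter(inv[attribute] for inv in outliers)
    print(attribute, dict(counts))
# If most outliers share cold_start=True, cold starts are the likely
# culprit and a good optimization target.
```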

Invocations: Digging Into Individual Transactions

The final step in performance optimization is digging into an individual operation to identify candidates for change. This is commonly done by “profiling,” which lets you see a breakdown of where an application is spending its time. Another common heuristic in performance tuning is to find the longest operation and reduce the time it spends executing, which shifts the performance profile of the application. You then repeat this process until performance is acceptable.

The next evolution of profiling is distributed tracing: the dynamic, real-time profiling of applications at the logical request/transaction level. Distributed tracing tracks individual events (called spans) and the events that cause them, producing a directed acyclic graph (DAG) that shows the overarching transaction and breaks down its important parts.
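
In code terms, a trace is just a collection of timed spans linked by parent references. A minimal, library-agnostic sketch of that model (not Thundra’s internal representation):

```python
# Minimal, library-agnostic sketch of the span model behind
# distributed tracing: each span records an operation, its timing,
# and a reference to the span that caused it, forming a DAG.
import time
import uuid
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Span:
    operation: str
    trace_id: str
    parent_id: Optional[str] = None
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    start: float = field(default_factory=time.time)
    end: Optional[float] = None

    def finish(self):
        self.end = time.time()

# One transaction (trace) broken into parent/child spans:
trace_id = uuid.uuid4().hex
root = Span('lambda-invocation', trace_id)
child = Span('prepare_request', trace_id, parent_id=root.span_id)
child.finish()
root.finish()
```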

Traces are usually very focused, showing how an individual transaction breaks down. Thundra invocations show distributed traces associated with individual outliers. Invocations are available on the Performance Analysis Tab and in a number of other locations, as shown in Figure 6:

Figure 6: Performance Analysis Tab showing individual invocations

To see a breakdown of where an individual transaction is spending its time, click on an invocation containing a trace, as shown in Figure 7:

Figure 7: Invocation detail trace view

In Figure 7, the total Lambda invocation took 11.3 seconds, and there were a number of different operations within it. The Google request took 451ms/11330ms = 4% of the total transaction, while prepare_request took 10010ms/11330ms = 88% of it!

As noted above, a common heuristic in performance tuning is to find the longest operation and reduce its execution time. If prepare_request is regularly the largest contributor to Lambda latency, it is a good candidate for tuning. Thundra offers client libraries and APM agents for the most popular programming languages; the examples in this post were generated using the Python library. Thundra’s rich auto-instrumentation enables you to generate these metrics in a couple of minutes.
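
If auto-instrumentation doesn’t break a slow handler down far enough, you can also time a suspect code path yourself while iterating on a fix. A sketch using a plain context manager (time_block is a hypothetical helper, not a Thundra API):

```python
# Hypothetical timing helper (not a Thundra API) for checking a
# suspect code path like prepare_request before and after tuning.
import time
from contextlib import contextmanager

@contextmanager
def time_block(name):
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000
        print(f'{name} took {elapsed_ms:.1f}ms')

def prepare_request(event):
    # Stand-in for the slow operation identified in the trace above.
    time.sleep(0.1)
    return event

def handler(event, context):
    with time_block('prepare_request'):
        request = prepare_request(event)
    return request
```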

Thundra Provides the Tools, You Do the Tuning

To optimize performance, you need hard data so you can form hypotheses, identify where tuning should take place, and then measure the effect of that tuning. Tuning without context on the larger architecture can result in misguided efforts, wasted time, and a failure to achieve measurable impact.

Thundra provides you with the tools to identify and progressively home in on Lambda performance issues: 

  • Architecture gives you the high-level, bird’s-eye view, as well as the aggregates that help you spot trends and define “normal.”
  • Root Cause enables you to drill down into outliers.
  • Invocations (with tracing enabled) give you a view into the individual legs of a transaction.

Happy Tuning!