
Caching with AWS Serverless Applications

Jun 2, 2020



Improve Serverless Performance with Caching

What's a phrase that encompasses all the metrics and measurements that make or break your site or app? Put succinctly, it’s "customer experience." Sites that focus on delivering a great customer experience will thrive, while those that don't will fade into irrelevance. 

How do you deliver a great experience? An excellent place to start is ensuring that your web pages are fast, smooth, and responsive. In the past, you could do this by adding more resources to your web servers or purchasing more bandwidth. Modern sites, however, might not be using servers at all, since serverless web solutions have gained immense popularity in the past couple of years. 

The simplicity of serverless has freed developers from the tediousness and pain of maintaining traditional hosting infrastructure. You can deliver simple, cost-effective, scalable solutions without administering a fleet of servers. That doesn't mean, though, that serverless is immune to performance concerns. Far from it.

Serverless applications require careful design and monitoring to ensure a smooth and responsive customer experience. Let's take a look at caching, an important concept you can implement right now to improve the customer experience for your serverless app.

What Is Caching and How Does It Work?

Since we’ll focus here primarily on an AWS-based solution, let’s take a look at the AWS definition of caching:

"In computing, a cache is a high-speed data storage layer which stores a subset of data, typically transient in nature, so that future requests for that data are served up faster than is possible by accessing the data’s primary storage location. Caching allows you to efficiently reuse previously retrieved or computed data." 

Note the phrase "served up faster." That's the key point of caching when you're discussing web applications. Faster means shorter page load times, faster query responses, and faster content delivery, and the list goes on. Load times can directly impact your bottom line.

We've established that a cache is basically high-speed data storage, designed to enable quick access to data, particularly data that has been requested previously. What about the different kinds of caching? 

Caching implementations usually fall into two categories: in-memory and application. In-memory caching typically utilizes commonly known software engines like Redis or Memcached, installed on a server configured with a large pool of very fast RAM for cache storage. Application caching is much more dependent on the underlying part of the stack where the cache implementation is taking place.

AWS Lambda functions offer their own in-memory cache. DynamoDB has its own caching feature, called DAX. A content delivery network (CDN) like CloudFront can cache media and content in close geographical proximity to users, offering quick delivery of content that might otherwise have long loading times. 

With all this in mind, we'll look next at some quantifiable data on the kind of impact caching can have on a serverless architecture.

Serverless Performance Implications

Though serverless architecture promises simplicity and freedom from managing traditional infrastructure, it comes with some downsides.

An article called Caching Techniques to Improve Latency in Serverless Architectures evaluates the impact of caching on a hypothetical serverless application stack. The first thing you'll likely notice is the additional latency overhead that serverless infrastructure can impose. However, by using basic application caching at the Lambda function level, the authors were able to achieve a ~50 percent drop in response time. 

As we noted previously, load and response times can have a direct impact on the bottom line, so the benefits of implementing a cache should be obvious. Next, we'll lay out a hypothetical serverless application stack we can use to explore the different caching implementations.

Example Serverless Application

Serverless architecture implementations in AWS, or elsewhere, will vary in scope and design based on the use case. Nevertheless, it's fairly easy to identify some common, reusable design patterns. Brian LeRoux, creator of the Architect serverless framework, identifies a common set of components in this SE Daily episode:

  • API Gateway 
  • Lambda 
  • CloudFront 
  • S3 
  • DynamoDB 

That gives us a great starting point. Since this is an AWS-based architecture, we'll add Route 53 to the mix as our DNS layer. Visually, the flow should look like this:


Figure 1: Basic AWS serverless stack

This will provide a good base architecture to work from, and it should be fairly typical of serverless apps with any kind of persistent data or RESTful operations. Now we can look at the different ways caching can speed up your serverless app!

Serverless Caching

The first question we need to answer is this: Where do we implement caching? Caching has potential applications at every layer of the stack.

The Client

That's right. The client is technically just another layer in the stack, albeit one that you don't really have much control over. However, that doesn't mean you can't use the client to improve app performance and potentially save on bandwidth costs. 

Using certain header directives, you can instruct the client browser to cache the response. If the client can fetch content or response data locally, you’ll see an immediate and obvious improvement in performance, and as a bonus, the app won’t consume additional network bandwidth until a request for new data is made.
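
As a minimal sketch of what such header directives might look like, the handler below returns a Cache-Control header from a Lambda function behind an API Gateway proxy integration. The handler name, the placeholder payload, and the 300-second lifetime are our own illustrative assumptions, so adjust them to how often your data actually changes.

```python
import json

# Assumed cache lifetime in seconds; tune this to how often the data changes.
CACHE_MAX_AGE_SECONDS = 300


def cached_response_handler(event, context):
    """Hypothetical handler that returns a response the browser may cache."""
    body = {"message": "hello"}  # placeholder payload for illustration
    return {
        "statusCode": 200,
        "headers": {
            "Content-Type": "application/json",
            # "public" allows shared caches to store the response;
            # "max-age" is how long the response counts as fresh.
            "Cache-Control": f"public, max-age={CACHE_MAX_AGE_SECONDS}",
        },
        "body": json.dumps(body),
    }


response = cached_response_handler({}, None)
```

For per-user or sensitive responses, a directive such as "private" or "no-store" is usually the safer choice.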


The Database

Passing requests between your Lambda functions and a backend database can add non-trivial latency to your app’s overall response time. A cache, or a database-specific accelerator like Amazon's DAX, can measurably improve response times.


Figure 2: Serverless stack with DAX

If your database backend isn't DynamoDB, then you’ll need to implement ElastiCache or manage your own caching solution on EC2.


Figure 3: Serverless stack with ElastiCache

Managing your own solution does mean more administrative overhead, but it gives you more granular control over configuration and instance sizing. If you want high-performance in-memory caching, you should ideally choose a memory-optimized EC2 instance and use Redis to handle the caching.


Figure 4: Serverless stack with EC2 caching
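
Whichever cache you put in front of the database, the read path usually follows the same "cache-aside" pattern: check the cache, fall back to the database on a miss, then store the result. Here's a rough sketch of that flow. DictCache is a hypothetical in-memory stand-in for a real Redis or Memcached client, and fake_db_lookup stands in for your actual database query; neither comes from a real library.

```python
class DictCache:
    """In-memory stand-in for a Redis or Memcached client (illustration only)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


def get_with_cache(cache, key, load_from_db):
    """Cache-aside read: try the cache first, fall back to the database."""
    value = cache.get(key)
    if value is not None:
        return value  # cache hit, no database round trip
    value = load_from_db(key)  # cache miss, query the backend
    if value is not None:
        cache.set(key, value)
    return value


# Usage with a fake database loader that records every call it receives
calls = []


def fake_db_lookup(key):
    calls.append(key)
    return {"id": key, "name": "example"}


cache = DictCache()
first = get_with_cache(cache, "42", fake_db_lookup)
second = get_with_cache(cache, "42", fake_db_lookup)
```

The second lookup is served from the cache, so the fake database is queried only once. With a real Redis client you would also need to serialize values and set an expiry, which this sketch leaves out.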

Lambda In-App Caching

Lambda functions provide their own caching mechanism, described here in the AWS documentation. You can store static assets in the /tmp directory and reuse objects like database connections, as long as you create them outside the handler function. Consider these two Python examples:

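import boto3

TABLE_NAME = 'example-table'  # hypothetical table name, not from the original

def func_handler1(event, context):
	"""all logic inside the handler"""
	# the client is re-created on every invocation
	client = boto3.client('dynamodb')
	response = client.get_item(
		TableName=TABLE_NAME,
		# assumes 'id' and 'name' are string (S) key attributes
		Key={
			'id': {'S': event['id']},
			'name': {'S': event['name']}
		}
	)
	return response['Item']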

Figure 5: Unoptimized Python code

In the first example, all of the logic occurs within the handler function. This means that for each invocation of the function, the database connection will have to be re-instantiated, and any data or assets that are fetched will have to be retrieved from the database again. 

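import boto3
TABLE_NAME = 'example-table'  # hypothetical table name
#database client instantiated outside the function
client = boto3.client('dynamodb')
#naive cache: maps (id, name) to a previously fetched item
items = {}
def update_cache(key, item):
	items[key] = item
def func_handler2(event, context):
	key = (event['id'], event['name'])
	#cache hit: return the stored item without a database call
	if key in items:
		return items[key]
	#cache miss: assumes 'id' and 'name' are string (S) key attributes
	response = client.get_item(
		TableName=TABLE_NAME,
		Key={
			'id': {'S': event['id']},
			'name': {'S': event['name']}
		}
	)
	item = response['Item']
	update_cache(key, item)
	return item

Figure 6: Optimized Python code

In the second example, the database client is instantiated outside the handler, so warm invocations can reuse it. We also create a very naive cache using a Python dictionary keyed by the requested id and name. If the requested item is already in the dictionary, it is returned as response data without another database call; otherwise, it is fetched once and stored. This is a bare-bones example (entries never expire and are never evicted), but more carefully designed logic could cut down the number of database calls the app would need to make.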

There are some caveats to this type of caching. Invocations of Lambda functions occur in an "execution context," which is an abstraction that represents a virtualized "container." There is no guarantee that this environment will be consistent from one invocation to the next.

The biggest benefit comes from rapid, repeat invocations of the same function. If your function is invoked only infrequently, the execution context is likely to be recycled between invocations, and you'll rarely be able to use a cached context.
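
Because a warm execution context can also outlive the data it holds, it often makes sense to give cached entries an expiry. The sketch below is one possible way to do that with a time-to-live (TTL) at module scope. The 60-second TTL and the names get_cached and loader are assumptions for illustration, and the injectable clock exists only so the behavior can be demonstrated without waiting.

```python
import time

CACHE_TTL_SECONDS = 60  # assumed freshness window

# key -> (expires_at, value); lives only as long as the execution context
_cache = {}


def get_cached(key, loader, now=time.time):
    """Return a cached value if it is still fresh, otherwise reload it."""
    current = now()
    entry = _cache.get(key)
    if entry is not None and entry[0] > current:
        return entry[1]  # fresh entry, skip the loader
    value = loader(key)  # missing or expired, fetch again
    _cache[key] = (current + CACHE_TTL_SECONDS, value)
    return value


# Usage with a fake clock and a loader that records every call
fake_time = [0.0]
loads = []


def load(key):
    loads.append(key)
    return f"value-for-{key}"


def clock():
    return fake_time[0]


a = get_cached("x", load, now=clock)  # miss: calls the loader
b = get_cached("x", load, now=clock)  # hit: served from the cache
fake_time[0] = 61.0
c = get_cached("x", load, now=clock)  # expired: calls the loader again
```

In a real function, loader would wrap the database call, and the handler would simply call get_cached on each invocation.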


CloudFront

CloudFront is a CDN provided by Amazon. Caching static content like image files and CSS at the CloudFront layer will greatly reduce bandwidth consumed and response time. Those who visit your site or use your app won't be waiting for round-trip API calls or Lambda invocations in order to be presented with a functional user experience. Viewing of static content will be fast and efficient.


Figure 7: Serverless stack with CloudFront caching of static assets
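
How long CloudFront keeps an object typically depends on the Cache-Control header the origin sends, within the limits of the distribution's cache settings. If your static assets live in S3, one way to set that header is at upload time. The sketch below only builds the arguments for boto3's S3 put_object call; the bucket name, the extension list, and the lifetimes are assumptions, and the long lifetime presumes versioned file names.

```python
import mimetypes

# Assumed lifetimes: long for versioned static assets, short for everything else.
LONG_LIVED = "public, max-age=31536000, immutable"
SHORT_LIVED = "public, max-age=60"
STATIC_EXTENSIONS = (".css", ".js", ".png", ".jpg", ".svg", ".woff2")


def build_put_object_args(bucket, key, body):
    """Build keyword arguments for an S3 put_object call with caching headers."""
    content_type, _ = mimetypes.guess_type(key)
    is_static = key.lower().endswith(STATIC_EXTENSIONS)
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        "ContentType": content_type or "application/octet-stream",
        "CacheControl": LONG_LIVED if is_static else SHORT_LIVED,
    }


css_args = build_put_object_args("example-bucket", "assets/site.css", b"body{}")
html_args = build_put_object_args("example-bucket", "index.html", b"<html></html>")
# With boto3 available: boto3.client("s3").put_object(**css_args)
```

Keeping HTML on a short lifetime while static assets get a long one means a new deployment shows up quickly, as long as asset file names change with each release.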

Route 53

Route 53 is Amazon's globally available DNS service. With DNS, "caching" takes on a somewhat different meaning. As in client-side caching, you don't really control the caching so much as pass directives, instructing the client how to cache. In this case, setting sensible TTL values for your DNS records will allow clients to rely on local DNS resolvers. Low TTLs mean they have to perform a full DNS lookup more often, potentially adding precious milliseconds to your app’s response time.
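
To make the TTL setting concrete, here is a sketch of the change batch you might pass to boto3's Route 53 change_resource_record_sets call. The record name, the CloudFront target domain, and the 300-second TTL are placeholders we made up. Note also that alias records pointing at AWS resources don't take a TTL of their own, so this applies to ordinary records such as a CNAME.

```python
def build_ttl_change_batch(record_name, target, ttl_seconds=300):
    """Build a ChangeBatch that upserts a CNAME record with an explicit TTL."""
    return {
        "Comment": "Set an explicit TTL on the record",
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "CNAME",
                    "TTL": ttl_seconds,  # seconds resolvers may cache the answer
                    "ResourceRecords": [{"Value": target}],
                },
            }
        ],
    }


# Hypothetical record name and CloudFront domain, for illustration only
batch = build_ttl_change_batch("app.example.com.", "d111111abcdef8.cloudfront.net")
# With boto3 available:
# boto3.client("route53").change_resource_record_sets(
#     HostedZoneId="<your hosted zone id>", ChangeBatch=batch)
```

A longer TTL means fewer full lookups, but it also means a DNS change takes longer to reach every client, so pick a value you can live with during a migration.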


Monitoring Performance

With a caching solution in place, the next step is to monitor your app’s performance. The metric that most accurately reflects end user experience is response time. If you want something dead simple and readily available, this blog post provides a great example of using the cURL utility for a basic response time breakdown. However, if you discover an issue, this method provides no insight into which component of the stack is at fault.
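
If you'd rather collect a few numbers from Python than from cURL, a rough equivalent is to time repeated calls and summarize them. This is only a sketch: measure_latency and summarize are names we made up, and the fake request below stands in for a real HTTP call to your endpoint.

```python
import math
import statistics
import time


def measure_latency(fn, runs=20):
    """Call fn repeatedly and return each call's duration in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000)
    return samples


def summarize(samples):
    """Return the median, 95th percentile (nearest rank), and maximum."""
    ordered = sorted(samples)
    p95_index = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return {
        "median_ms": statistics.median(ordered),
        "p95_ms": ordered[p95_index],
        "max_ms": ordered[-1],
    }


def fake_request():
    # Stand-in for a real call to your API endpoint
    sum(range(1000))


summary = summarize(measure_latency(fake_request, runs=10))
```

Comparing the summary before and after enabling a cache gives a quick, if coarse, read on whether it is helping.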

If you want a more in-depth look at performance, a tracing application like AWS X-Ray or Thundra will provide granular performance data for each stack layer. Tracing breaks down response time by component, so you can tell whether the culprit is an inefficient database query or an excessive number of cache misses. This type of visibility saves precious engineering hours that would otherwise go to chasing performance rabbit holes.

Unlocking the Potential of Serverless Technology

Serverless technology has opened up entirely new ways of thinking about web application architecture, replacing bulky, hard-to-manage infrastructure with nimble, efficient microservices, backed by services like AWS Lambda. 

But like any new technology, it comes with concerns and downsides. You’ll have to carefully manage serverless performance to maintain a seamless customer experience and ultimately unlock its full potential. Implementing caching in your serverless app will reduce response times, save on costs, and ultimately provide a better experience for your customers.