AWS Lambda, the most popular and widely used serverless cloud computing service, lets developers run code in the cloud and frees them from dealing with server provisioning or management. Amazon Relational Database Service (RDS) is a more traditional, server-based service that provides you with SQL databases. It will do some heavy lifting, but you’ll still need to do proper server provisioning and management.
Can AWS Lambda and RDS work together seamlessly? Or is it too much of a challenge to connect server-based and serverless architecture in real-world use cases? Let's find out.
Using RDS with Lambda Functions
Even though AWS provides us with a fully serverless database, DynamoDB, it's not always the best possible choice. DynamoDB is a NoSQL database: it can model complex data relationships, but its ability to do so is far from perfect.
Also, DynamoDB is not great with complex data queries. If your application's use cases require you to either work with relational data or perform complex data queries, DynamoDB might not be your best choice, and you should consider using an SQL database provided by Amazon RDS.
It’s important to know that when you use Lambda with RDS, two major issues can significantly affect your application’s performance and scalability. Fortunately, recent AWS updates give us new tools to mitigate those issues and use RDS seamlessly.
Running Lambda in Your VPC
RDS instances usually run, and should run, in a private subnet of a VPC assigned to your account. Keeping RDS instances isolated from the outside world is a basic security principle, but it also makes them inaccessible to Lambda functions.
Figure 1: RDS instance in a private subnet
Lambda functions, on the other hand, run in an isolated, AWS-managed VPC that can be accessed only by the AWS Lambda service. Functions have access to the internet and can call AWS services or any third-party resources that are publicly available. They are not, however, able to access private resources, like RDS instances running in private subnets.
Figure 2: Lambda execution environment in AWS-managed VPC
For the two to work together, you'll need to establish a proper inter-VPC connection by specifying a VPC configuration (subnets and security groups) when the Lambda function is created.
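As a rough illustration of this configuration step, the sketch below builds the arguments for boto3's `create_function` call, where the `VpcConfig` parameter carries the subnet and security group IDs. The function name, role ARN, IDs, runtime, and handler are hypothetical placeholders, not values from any real account.

```python
def build_function_params(name, role_arn, zip_bytes, subnet_ids, security_group_ids):
    """Build keyword arguments for boto3's Lambda create_function call."""
    if not subnet_ids or not security_group_ids:
        raise ValueError("a VPC-attached function needs subnets and security groups")
    return {
        "FunctionName": name,
        "Runtime": "python3.12",       # example runtime, adjust as needed
        "Role": role_arn,
        "Handler": "app.handler",      # hypothetical module.function name
        "Code": {"ZipFile": zip_bytes},
        # This block is what attaches the function to your VPC.
        "VpcConfig": {
            "SubnetIds": list(subnet_ids),
            "SecurityGroupIds": list(security_group_ids),
        },
    }


def create_vpc_function(params):
    """Create the function; needs boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder above works without it
    return boto3.client("lambda").create_function(**params)


# All identifiers below are made up for illustration.
params = build_function_params(
    name="orders-api",
    role_arn="arn:aws:iam::123456789012:role/orders-api-role",
    zip_bytes=b"",  # placeholder; a real call needs an actual deployment package
    subnet_ids=["subnet-aaa111", "subnet-bbb222"],
    security_group_ids=["sg-ccc333"],
)
```

The subnets would normally be the private subnets that can reach the RDS instance, and the security group must be allowed by the database's own security group on the DB port.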
In the past, this setup enabled Lambda to connect to private RDS instances but caused a lot of performance issues. To enable Lambda to connect to your VPC, an elastic network interface (ENI) had to be created with every new Lambda execution environment. The time needed to create and attach the interface significantly increased the function’s cold start duration.
Figure 3: Lambda connecting with user’s VPC via ENI
As a result, cold starts for Lambdas connected to users' VPCs took as long as 10-15 seconds, which was far too long for many use cases!
Another problem was caused by Lambda's enormous scalability. AWS Lambda scales by creating multiple concurrent and independent execution environments that can handle increasing workloads. Since a dedicated ENI had to be created for every individual function environment, this could quickly exhaust the account's ENI limit and use up the available IP addresses in the VPC.
Figure 4: Multiple Lambda instances overwhelm the user’s VPC with multiple ENI connections
This situation was very dangerous, since it could lead to application stability issues and seriously hinder management of IP addresses in your subnets.
AWS Improvements to Lambda's VPC Networking
On September 3, 2019, the AWS team announced a significant improvement to Lambda's VPC networking.
With this update, Lambda now uses AWS Hyperlane, a networking virtualization platform, to create connections between the Lambda's VPC and the user's VPC.
Figure 5: Multiple Lambda instance connections with user’s VPC via Hyperlane NAT
Instead of creating a new elastic network interface for every Lambda instance, a single Hyperlane VPC NAT instance is now created when the function is created (or when its VPC configuration changes). During the invocation phase, when the function's execution environment is created, there is no need to create another ENI: Lambda connects directly to the Hyperlane NAT, which manages the connection to the user's VPC.
This approach greatly reduces the duration of Lambda's cold starts. Now, the cold start difference between the default Lambda configuration and connecting to the custom VPC is almost negligible, so it’s no longer an issue when you connect to VPC-internal resources like RDS instances.
Figure 6: Lambda cold start timeline, before improvement in VPC networking (Source AWS)
Figure 7: Lambda cold start timeline, using Hyperlane NAT (Source AWS)
It's important to remember that Hyperlane-managed ENIs are still created inside of your VPC and use the subnet's IP addresses. However, typically, Hyperlane creates only one ENI per subnet-security group combination, so the effect on the IP pool is very limited and is not affected by Lambda scaling.
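To make the difference in IP usage concrete, here is a back-of-the-envelope sketch comparing the two models described above: one ENI per execution environment before the update, and typically one ENI per subnet-security group combination after it. The helper names and the example numbers are illustrative assumptions.

```python
def enis_before(concurrent_environments):
    """Old model: every execution environment needed its own ENI."""
    return concurrent_environments


def enis_after(subnet_ids, security_group_sets):
    """Hyperlane model: typically one ENI per unique
    (subnet, security-group set) combination, regardless of scaling."""
    combos = {
        (subnet, frozenset(groups))
        for subnet in subnet_ids
        for groups in security_group_sets
    }
    return len(combos)


# Hypothetical workload: 1,000 concurrent environments, two subnets,
# and a single security-group set shared by all functions.
old_count = enis_before(1000)
new_count = enis_after(["subnet-a", "subnet-b"], [["sg-1"]])
print(old_count, new_count)
```

In this example the old model would consume 1,000 IP addresses at peak, while the new one consumes two, no matter how far the function scales.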
This new Lambda VPC networking process is already being shipped to the majority of AWS regions. It's applied automatically to all existing and newly created Lambdas and does not add any extra costs.
Connecting to RDS from Lambda Functions
Apart from VPC networking issues, there is another potential hurdle when you try to use Lambdas together with RDS.
Classic applications usually connect to the database instance using a limited number of DB connections. Those connections are stored in a pool that is shared between multiple requests. This means that even when the workload is heavy, the number of active DB connections does not exceed some particular limit.
Figure 8: Traditional application with DB connections pool
With Lambdas, however, this is different.
Every Lambda execution environment is isolated, and it doesn't share any data with others. This means that every Lambda function instance has to create its own DB connection to RDS.
Figure 9: Multiple Lambda functions connected to RDS
Under heavy load, Lambda can spawn many thousands of concurrent function instances, and all of them will try to establish a connection to the DB instance. What's more, a Lambda's lifetime is short by design, ideally just a few seconds or minutes, so the RDS instance gets flooded with thousands of concurrent connections that open and close again and again.
Figure 10: Scaling Lambda functions overwhelming RDS instance with many concurrent connections
Classic SQL database engines are not designed to handle such workloads. Opening, closing, and keeping the connection alive is a resource-consuming process, and such situations will quickly lead to server overload and stability issues.
Until recently there was no proper solution to this issue. The only available mitigation was to limit the Lambda concurrency to control the number of concurrent connections and increase available server resources for the RDS instance so it could deal with more open connections.
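As a sketch of this mitigation, the helper below derives a concurrency cap from the database's connection limit and applies it with boto3's `put_function_concurrency` call. The connection numbers and the function name are hypothetical, and the assumption that each execution environment holds a fixed number of connections is a simplification.

```python
def safe_concurrency(db_max_connections, reserved_for_other_clients=0,
                     connections_per_environment=1):
    """Largest Lambda concurrency that stays within the DB connection limit."""
    available = db_max_connections - reserved_for_other_clients
    if available < connections_per_environment:
        raise ValueError("no connection budget left for Lambda")
    return available // connections_per_environment


def apply_concurrency_limit(function_name, limit):
    """Set reserved concurrency; needs boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the arithmetic works without it
    boto3.client("lambda").put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=limit,
    )


# Hypothetical numbers: the DB accepts 150 connections, and 10 are kept
# free for admin tools and other clients.
limit = safe_concurrency(db_max_connections=150, reserved_for_other_clients=10)
# apply_concurrency_limit("orders-api", limit)  # uncomment to apply
```

The trade-off is that invocations beyond the cap are throttled, so this protects the database at the cost of Lambda's scalability.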
The more advanced approach would be to create a custom proxy server that would be responsible for managing DB connections.
Amazon RDS Proxy
Fortunately, in December 2019, Amazon announced a new managed service designed specifically to address this issue: Amazon RDS Proxy.
The idea behind RDS Proxy is very simple. A Lambda function, instead of connecting directly to an RDS instance, connects to RDS Proxy, which is a managed resource sitting in between Lambda and RDS. It serves as a DB connection pool and is responsible for efficient creation and management of database connections.
Figure 11: Multiple Lambda function connections to RDS via RDS Proxy
RDS Proxy will handle spikes in Lambda requests and make sure that the DB instance does not get overwhelmed by them. Requests will either be sequenced in the queue, waiting for the idle connection in the pool, or rejected if they start exceeding the limit. While this may result in increased request times or even cause some requests to be rejected, it protects the DB instance from stability issues and increases overall application resiliency.
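The queue-or-reject behavior can be pictured with a toy model. The sketch below is not how RDS Proxy is implemented; it only mimics the idea of a bounded pool in which callers wait briefly for a free connection slot and are rejected if none frees up in time. The class and parameter names are made up for this illustration.

```python
import threading


class ToyConnectionPool:
    """Toy bounded pool: callers wait for a free slot or get rejected."""

    def __init__(self, max_connections):
        self._slots = threading.BoundedSemaphore(max_connections)

    def acquire(self, wait_seconds):
        """Return True if a slot freed up within wait_seconds, else False."""
        return self._slots.acquire(timeout=wait_seconds)

    def release(self):
        """Hand a slot back so a waiting caller can use it."""
        self._slots.release()


pool = ToyConnectionPool(max_connections=2)
first = pool.acquire(wait_seconds=0.01)   # slot available
second = pool.acquire(wait_seconds=0.01)  # slot available
third = pool.acquire(wait_seconds=0.01)   # pool exhausted: rejected after waiting
pool.release()                            # one caller finishes its work
fourth = pool.acquire(wait_seconds=0.01)  # slot available again
```

However many callers arrive, the database behind such a pool never sees more than `max_connections` connections at once.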
RDS Proxy is completely transparent to the application code. When connecting to the DB instance, the application simply uses the proxy's DNS endpoint instead of the standard RDS address.
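In practice, this usually means changing a single configuration value. The sketch below assumes the connection settings come from environment variables; the variable names, the default PostgreSQL port, and the endpoint shown are hypothetical.

```python
import os


def connection_settings(env=os.environ):
    """Build DB connection settings, with the host taken from the proxy endpoint."""
    return {
        # Point the host at the proxy instead of the RDS instance address.
        "host": env["DB_PROXY_ENDPOINT"],
        "port": int(env.get("DB_PORT", "5432")),
        "dbname": env["DB_NAME"],
        "user": env["DB_USER"],
    }


# Made-up values standing in for the function's environment variables.
settings = connection_settings({
    "DB_PROXY_ENDPOINT": "my-proxy.proxy-abc123.us-east-1.rds.amazonaws.com",
    "DB_NAME": "orders",
    "DB_USER": "app",
})
# A PostgreSQL driver such as psycopg2 could then be called as
# psycopg2.connect(password=..., **settings); the rest of the code is unchanged.
```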
Other RDS Proxy Benefits
RDS Proxy's benefits are not limited to Lambda. It also helps to increase application resilience and security. In the event of database failure, the proxy can quickly switch to a new RDS instance while preserving active connections. This is a much faster and more user-friendly process than the standard DNS update.
Also, with RDS Proxy, DB authorization can be done with IAM roles rather than DB credentials in the application code. This helps to reduce the risk of leaking credentials.
Before jumping on the bandwagon, you should be aware that at the time of writing, RDS Proxy is still in the preview phase and is not available in all regions. Also, keep in mind that this service is not free, but is billed hourly, based on the RDS instance size.
Summing Up: Lambda and RDS Can Work Together
If your use case requires an SQL database for your data layer, you don't have to give up on the idea of writing a serverless application. Lambda can work seamlessly with RDS instances, as long as you remember the specific requirements for this particular setup.
- Since RDS instances are running in your VPC and Lambda by default does not have access to those resources, you’ll need to configure the VPC connection when creating a Lambda function.
- Thanks to recent updates to VPC networking, the Lambda connection to the VPC no longer affects cold start duration. The new networking strategy is applied by default and does not require any manual work.
- Lambda can overwhelm the RDS instance with a large number of connection requests. Consider using RDS Proxy to manage connections and protect RDS from load spikes, even though this service is currently in the preview phase (hopefully not for long).
- If using RDS Proxy is not an option, either invest in a custom proxy server or limit the maximum Lambda concurrency and use more-powerful RDS servers to handle the expected number of connections.
While these steps are not trivial, they should enable you to achieve seamless cooperation between two worlds: serverless Lambda functions and server-based RDS.
Aurora Serverless: One Database to Rule Them All?
Aurora Serverless, a managed, serverless SQL database from Amazon, tries to bring together two worlds, combining all the benefits and elasticity of serverless services with the power and usability of SQL databases.
On many levels, Aurora Serverless is an amazing technology that sticks to its promises. It can dynamically scale up or down to accommodate a changing workload, and the process is completely transparent. It also provides impressive resiliency, with Multi-AZ replication out of the box.
Aurora Serverless also runs inside your VPC, meaning that this part of the Lambda configuration works the same way as with RDS: Lambda must be able to connect to this VPC via an ENI.
Unfortunately, issues with multiple connections present a bigger challenge. RDS Proxy does not currently support Aurora Serverless, so it's not an option here. The number of connections Aurora Serverless can handle at the same time depends on the instance class the service is using in a given moment.
If scaling Lambda functions put a heavy load on Aurora and exceed the connection limit for the current instance class, Aurora will automatically scale up to handle them. After the peak, it will scale down to save resources (and money).
Does this mean that Aurora Serverless solves all the pain points of using SQL DBs in serverless applications and that it should be a go-to technology for those use cases? As always, it depends. For predictable and stable workloads, classic RDS (preferably together with RDS Proxy) might be a better (and cheaper) idea. Certain use cases, with varied, unpredictable workloads, will benefit most from using Aurora Serverless.