Since its inception at re:Invent 2014, serverless has been an attractive choice because of its pay-as-you-go pricing. This is especially valuable if the load on your serverless architecture is inconsistent and subject to sudden spikes. However, costs can snowball in serverless architectures if you don’t keep an eye on them. Given the complexity of a distributed architecture built from serverless functions, controlling these costs can be a daunting task. Luckily, there are reliable ways to keep AWS Lambda costs under control. In this blog, I will cover three main tactics: tune your memory allocation, strive for shorter invocation durations, and protect yourself against an unexpected number of invocations.
Is serverless really cheaper?
Before anything else, let’s discuss whether serverless really is cheaper. When it comes to cost, people say that spot instances can sometimes be cheaper than serverless. This is true only if you make a direct comparison. For example, a Lambda function with 512 MB of memory running non-stop for one day costs $0.72, while an EC2 instance with the same memory costs $0.14. This is the very definition of comparing apples and oranges. If your AWS Lambda function runs non-stop for a day, you should reconsider your architecture or even drop serverless for that workload. Serverless suits unpredictable load, not functions that need to stay alive constantly. Plus, it removes many other cost items from your bill thanks to its operational advantages. This could be a separate blog topic and is explained well here. Forrest Brazeal’s cartoon also explains this with a picture that is worth a thousand words.
When you try to control serverless costs, it turns out that Lambda is not the only item on your bill, and often not the most important one. If you use API Gateway to trigger your functions, it costs $3.50 per 1M requests. If your function has 512 MB of memory and runs for 300 ms on average, those same 1M invocations cost about $2.50 in Lambda compute. CloudWatch costs can also be a headache: $0.50 per GB for data ingestion and $0.03 per GB per month for storage. If you leave debug prints enabled in production, the logging bill can grow large enough to overshadow your AWS Lambda costs. There are also data transfer costs, which are very similar to those in EC2 and very complicated. Check out this chart by Corey Quinn to see how complicated they can be.
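To make the comparison above concrete, here is a minimal sketch of the per-million-request arithmetic. The prices are the ones quoted in this post and should be treated as assumptions to check against the current AWS pricing pages; the function names are hypothetical helpers, and the Lambda figure covers compute only (no request fee, no free tier).

```python
# Prices as quoted in the text (USD). Assumptions: verify before relying on them.
API_GATEWAY_PER_MILLION = 3.50
LAMBDA_PER_GB_SECOND = 0.00001667


def lambda_compute_cost(invocations: int, memory_mb: int, duration_ms: float) -> float:
    """Compute-only Lambda cost in USD, ignoring request fees and the free tier."""
    gb_seconds = invocations * (memory_mb / 1024) * (duration_ms / 1000)
    return gb_seconds * LAMBDA_PER_GB_SECOND


def api_gateway_cost(requests: int) -> float:
    """API Gateway request cost in USD at a flat per-million rate."""
    return requests / 1_000_000 * API_GATEWAY_PER_MILLION


requests = 1_000_000
lambda_cost = lambda_compute_cost(requests, memory_mb=512, duration_ms=300)
gateway_cost = api_gateway_cost(requests)
print(f"Lambda compute: ${lambda_cost:.2f}, API Gateway: ${gateway_cost:.2f}")
```

With these inputs, the gateway in front of the function costs more than the function itself, which is why the whole bill deserves attention and not only the Lambda line.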
The cost structure of AWS Lambda
Before diving into tips, let’s revisit the cost structure of AWS Lambda. AWS bills Lambda compute in a purpose-built unit, the GB-second. If your function is configured with 1024 MB of memory and runs for one second, you are charged for 1 GB-second, which costs $0.00001667. That seems cheap at first glance, but a single function with 512 MB of memory that runs for one second per invocation and is invoked 100K times a day will cost you about $18.34 a month in compute charges, even when the entire free tier is applied to this function. That still seems cheap, but imagine how the cost ramps up with more functions, more invocations, more memory, and longer durations.
As a serverless user, and thanks to the many use cases I have seen with our customers, I can offer some tips for controlling these costs. You simply need to control all the factors that affect cost: duration, memory, and invocation count. Let’s look at these items one by one.
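The monthly figure above can be reproduced with a short calculation. This is a sketch under stated assumptions: a 30-day month, the GB-second price used in this post, the public free tier of 400,000 GB-seconds and 1M requests per month, and a request price of $0.20 per million. `monthly_lambda_cost` is a hypothetical helper, not an AWS API.

```python
PRICE_PER_GB_SECOND = 0.00001667    # USD, figure used in the text
PRICE_PER_MILLION_REQUESTS = 0.20   # USD, assumed from public AWS pricing
FREE_GB_SECONDS = 400_000           # assumed monthly free tier
FREE_REQUESTS = 1_000_000           # assumed monthly free tier


def monthly_lambda_cost(invocations_per_day, memory_mb, duration_s, days=30):
    """Return (compute_usd, request_usd) for one function after the free tier."""
    invocations = invocations_per_day * days
    gb_seconds = invocations * (memory_mb / 1024) * duration_s
    compute = max(gb_seconds - FREE_GB_SECONDS, 0) * PRICE_PER_GB_SECOND
    billable_requests = max(invocations - FREE_REQUESTS, 0)
    request_fee = billable_requests / 1_000_000 * PRICE_PER_MILLION_REQUESTS
    return compute, request_fee


compute, request_fee = monthly_lambda_cost(100_000, memory_mb=512, duration_s=1.0)
print(f"compute: ${compute:.2f}, requests: ${request_fee:.2f}")
```

Under these assumptions the compute charge comes to roughly $18.34, and the per-request fee adds about $0.40 on top.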
Decrease or sometimes increase (yes!) the memory of your function
I always wonder how people select the memory setting that best fits their use case. My talks and discussions suggest that they choose it arbitrarily, guided by a few biases. It usually sounds like “I’m doing a CPU-intensive data processing job, I once hit a timeout with 2 GB, so I’m giving it 3 GB to avoid that” or “My function just transforms the data it receives, writes it to DynamoDB, and publishes a message to SNS. It’s just I/O, so 128 MB is enough”. My interviews suggest that people generally don’t give these decisions a second thought. I came across this Twitter survey by Alex Casalboni showing that the majority of people don’t tune memory at all. If you want to keep control over costs, you should tune the memory setting to match your function’s behavior.
Suppose you’re watching the memory usage that your function reports to CloudWatch, and memory utilization sits consistently at around 10%. Should you decrease the memory and save money? Well, no. You should first look at another metric: CPU utilization. Lambda allocates CPU in proportion to memory, so if your function’s CPU utilization is already high, think twice before decreasing the memory. A smaller setting might make your function run much longer, and it can end up costing you more.
Using Thundra, you can see memory and CPU utilization side by side and judge whether it is safe to decrease the memory. In the following image, you can see that the function (with 512 MB of memory) keeps its memory usage in the double digits and that CPU utilization is at most 30% over a week. It is safe to halve the memory and watch how the function reacts.
Sometimes you need to increase the allocated memory of your function instead. This is the case when your function does something very CPU-bound that slows it down drastically. In the following image, you can see a function that takes too long to prepare the input it passes to PostgreSQL.
When we give this function more memory, which also means more CPU, we expect it to run faster and spend less time preparing the input. Let’s see what happens when you increase the memory from 512 MB to 2 GB.
We provided four times more memory, which made input preparation nearly instant, and the billed duration dropped from 1500 ms to 300 ms, a fivefold improvement. In GB-seconds, that is 0.5 GB × 1.5 s = 0.75 before versus 2 GB × 0.3 s = 0.6 after, so you will now pay 20% less for this function at the end of the month. As you can see, increasing memory can also pay off, depending on your function. So, never get tired of tuning your memory.
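The trade-off between memory and duration is easy to check before and after a tuning experiment. The following sketch compares the two configurations from this section; `gb_seconds` is a hypothetical helper, and the calculation assumes the bill is proportional to memory times billed duration.

```python
def gb_seconds(memory_mb: int, billed_ms: float) -> float:
    """GB-seconds consumed by a single invocation."""
    return (memory_mb / 1024) * (billed_ms / 1000)


before = gb_seconds(512, 1500)    # small memory, slow run
after = gb_seconds(2048, 300)     # four times the memory, one fifth the duration
saving = 1 - after / before

print(f"before: {before:.2f} GB-s, after: {after:.2f} GB-s, saving: {saving:.0%}")
```

The same comparison works in the other direction: if halving the memory more than doubles the billed duration, the cheaper-looking setting is actually the more expensive one.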
Keep an eye on how many times your function is invoked
This sounds straightforward when you first hear it, but unexpected Lambda bills are mostly caused by an unexpected number of invocations when a function goes into a retry loop. The most common cause of a retry loop is a so-called “poisonous message” in a Kinesis stream. When a message cannot be parsed by your business logic, your function fails and keeps retrying the same message until the poisonous message expires. This is a real disaster that also happens with DynamoDB streams and SQS. To avoid it, you need a robust error handling mechanism built for exactly such situations. To test your error handling thoroughly, prepare for these situations before you move to production. You can check the blog written by Yan Cui to understand how you can use Thundra to inject chaos into your serverless architecture.
Another case that produces a surprising number of invocations is the old-school denial-of-service attack that triggers your Lambda function through API Gateway. If you don’t have a proper throttling mechanism on your API Gateway, expect your Lambda function to be invoked until it hits the concurrent execution limit. Depending on your use case, consider putting a concurrency limit on your Lambda function or a throttling rule on API Gateway.
To keep an eye on the number of invocations, you need to track it periodically by checking the metrics in CloudWatch or Thundra. With Thundra, you see not only the number of invocations but also the duration, side by side on a heatmap. When either of them deviates from the norm, you can spot it easily and get to the root cause fast.
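One way to keep a poisonous message from triggering endless retries is to catch parse failures per record and set the bad record aside instead of failing the whole batch. The handler below is a minimal sketch of that idea, not a complete error handling strategy. `process` and `park_bad_record` are hypothetical stand-ins; a real function might send parked records to a queue or bucket for later inspection.

```python
import base64
import json


def park_bad_record(record, error):
    """Hypothetical stand-in: a real function might write to a queue or bucket."""
    print(f"parking record {record['kinesis']['sequenceNumber']}: {error}")


def process(payload):
    """Hypothetical business logic that expects an 'order_id' field."""
    return payload["order_id"]


def handler(event, context=None):
    processed, parked = [], []
    for record in event["Records"]:
        try:
            # Kinesis delivers the record payload base64-encoded
            raw = base64.b64decode(record["kinesis"]["data"])
            processed.append(process(json.loads(raw)))
        except (ValueError, KeyError, TypeError) as error:
            # Unparseable input will never succeed on retry, so do not re-raise
            park_bad_record(record, error)
            parked.append(record["kinesis"]["sequenceNumber"])
    return {"processed": processed, "parked": parked}


def encode(data: bytes) -> str:
    return base64.b64encode(data).decode()


sample_event = {"Records": [
    {"kinesis": {"sequenceNumber": "1", "data": encode(b'{"order_id": 7}')}},
    {"kinesis": {"sequenceNumber": "2", "data": encode(b"not json")}},
]}
result = handler(sample_event)
print(result)
```

Only errors that can never succeed on retry are swallowed here. Transient failures, such as a downstream timeout, should still raise so that the batch is retried.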
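A concurrency limit on the function can be set from code as well as from the console. The sketch below wraps the Lambda `put_function_concurrency` call, which reserves and thereby caps concurrent executions for one function. The helper name, the function name `my-api-handler`, and the limit of 50 are illustrative assumptions; the client is passed in so the helper can be exercised without AWS credentials.

```python
def cap_concurrency(lambda_client, function_name: str, limit: int):
    """Reserve, and thereby cap, concurrent executions for one function."""
    if limit < 0:
        raise ValueError("limit must be zero or greater")
    return lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=limit,
    )


# Real usage (requires boto3 and AWS credentials):
#   import boto3
#   cap_concurrency(boto3.client("lambda"), "my-api-handler", 50)
```

Keep in mind that reserved concurrency is taken out of the account-level pool, so a generous cap on one function leaves less headroom for the others.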
Strive for less duration
In the first tip, we talked about tuning memory to decrease the duration of a function’s invocations. No matter how hard you tune the memory of your AWS Lambda function, you sometimes cannot decrease the duration because your function spends its time waiting idly for downstream services to complete. In this case, you need to find a way to decrease the time spent on these services. For example, if your function makes HTTP calls to external third parties, your AWS bill depends directly on the health of those services. If they start to deteriorate and respond slowly, your duration goes up automatically. Similarly, if your function queries a relational database, you need to watch the queries it makes through an ORM tool. Normally, you wouldn’t care much about this in a non-serverless architecture. But time is literally money in serverless, and you should care about performance degradation caused by the relational databases you use. For example, the common n+1 query problem can occur with relational databases and slow down your function unexpectedly.
But how do you keep an eye on external services? Thundra helps you track the time spent waiting idly for third-party services at both the function level and the invocation level. At the invocation level, you can see the time spent on different services through the latency breakdown in your invocations list. When you discover a jump in time spent, you can focus directly on the service responsible. For example, in the following figure, the duration jumped because the time spent interacting with third-party services jumped. Most probably, you already know what this service is supposed to do. Just fix the issue and start saving money.
Sometimes, it is better to investigate interactions with external resources from a broader view. In the following image, you can see the distribution of time that a Lambda function spends on external resources. The image shows that your function slows down mostly because of a PostgreSQL table called `thundra_demo`. Almost 75% of the time is spent on this table, which amounts to roughly 500 ms. If you can cut that by 300 ms, you will nearly halve the cost of this function.
Another cause of longer invocations is cold starts. You need to deal with both the frequency and the duration of cold starts. I frankly believe AWS will solve the cold start issue in the future, but in the meantime, you can check the blog we wrote almost a year ago. You can keep track of cold starts with the chart above and see whether they follow a pattern. As you can see, the count of cold starts is almost constant over time for the function above. If you had kept that many containers warm with Thundra’s warm-up module, you wouldn’t have had any of those cold starts and could have saved time.
Serverless is cool because of its ultimate scalability and pay-as-you-go pricing model. However, costs can skyrocket if you don’t take the necessary precautions. Invocation count, duration, and memory directly affect your costs, and you need to control all three together. So, keep battling these factors relentlessly to keep your AWS Lambda costs under control.
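The n+1 problem is easiest to see by counting queries. The sketch below uses Python’s built-in `sqlite3` with an in-memory database as a stand-in for the relational database and ORM discussed above; the `authors` and `posts` tables are hypothetical. Both functions return the same data, but the first issues one query per author while the second issues a single JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'ada'), (2, 'grace');
    INSERT INTO posts VALUES (1, 1, 'a1'), (2, 1, 'a2'), (3, 2, 'g1');
""")


def titles_n_plus_one(conn):
    """One query for the authors, then one more query per author."""
    authors = conn.execute("SELECT id, name FROM authors ORDER BY id").fetchall()
    queries = 1
    result = {}
    for author_id, name in authors:
        rows = conn.execute(
            "SELECT title FROM posts WHERE author_id = ? ORDER BY id", (author_id,)
        ).fetchall()
        queries += 1
        result[name] = [title for (title,) in rows]
    return result, queries


def titles_single_join(conn):
    """The same data fetched in a single round trip."""
    rows = conn.execute(
        "SELECT a.name, p.title FROM authors a "
        "JOIN posts p ON p.author_id = a.id ORDER BY a.id, p.id"
    ).fetchall()
    result = {}
    for name, title in rows:
        result.setdefault(name, []).append(title)
    return result, 1


slow_result, slow_queries = titles_n_plus_one(conn)
fast_result, fast_queries = titles_single_join(conn)
print(slow_queries, fast_queries)
```

With a remote database, every extra query adds a network round trip to the billed duration, so the gap widens as the number of parent rows grows.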
In closing, I believe that Thundra can help you control Lambda costs and do the tuning needed to save money. You can sign up for our web console and start experimenting. Don’t forget to explore our demo environment, no sign-up needed. We are very curious about your comments and feedback. Join our Slack channel, send a tweet, or contact us from our website!