This is the last part of a three-part series about using the Well-Architected Framework (WAF) and the Serverless Application Lens (SAL) with Thundra to design better serverless systems.
In the first article, we learned what the WAF and SAL offer us and created a basic serverless architecture based on some simple use cases. In the second article, we answered five of the nine SAL questions for the architecture we created in the first article.
In this article we will answer the last four of the nine SAL questions. These four questions belong to three SAL pillars: operational excellence, performance efficiency, and cost optimization.
The Updated Architecture
In the first article, we created a serverless architecture for a recipe-sharing app. To answer five of the SAL questions, we had to update our original architecture quite a bit.
We added the Thundra serverless monitoring service and AWS Secrets Manager, and we split up a Lambda function. Figure 1 shows what we ended up with.
Figure 1: Updated recipe architecture
Answering the Operational Excellence Pillar Questions
The first pillar asks questions about how you assess that your system works correctly and how you update it. If you lose customers because your system is constantly in a degraded state and you don’t even notice, you have huge problems. The same goes for upgrades to your system: If they fail all the time and you’re overwhelmed with regression bugs, you won’t make your customers happy for long.
OPS 1: How do you understand the health of your serverless application?
The first question is basically about monitoring. If you build a serverless system, it will be distributed and run on hardware that you don’t own, both of which can lead to problems in the long run. To get this under control, you’ll need to find a way to see what's happening in the cloud.
That’s where Thundra comes in handy. We built Thundra from the ground up for this scenario. With its Lambda layer, it gives us insights into every Lambda function running in our system. With Thundra’s distributed tracing capabilities, we can even follow requests from their entry at API Gateway through to DynamoDB and the Elasticsearch Service.
And since we already added Thundra’s Lambda layer to our architecture in the last article, we don’t have to change anything here. The layer will add tracing IDs to our events and generate a flow diagram, called Architecture View, so we can see at a glance what’s happening in our system and where.
Thundra even lets us enrich a trace with additional information, such as the name of its owner.
OPS 2: How do you approach application lifecycle management?
This question is about how you handle changes to your system. The idea is to define processes that guide you through a system update, so you minimize downtime and can keep fixing bugs quickly.
The SAL states that you should use infrastructure as code (IaC). With this approach it’s not just our software that is considered code, but also the definition of the infrastructure the software runs on.
IaC allows us to version our application end to end and deploy it in isolated development and staging environments to prototype new features and perform tests. IaC also minimizes errors by removing manual steps when provisioning new resources.
You should also use a gradual rollout mechanism for deployments. This way, new features become available to the user base step by step, and if something goes wrong, it won’t take the whole system down at once.
An IaC framework that was created specifically for serverless applications is AWS SAM, the Serverless Application Model. It’s an extension of AWS CloudFormation and comes with helpers for serverless resources. It also integrates natively with AWS CodeDeploy, which supports gradual deployments.
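To make the idea of a gradual rollout more concrete, here is a minimal Python sketch. It does not call CodeDeploy; it only computes the traffic-shifting schedule that a linear or canary strategy implies, loosely mirroring SAM deployment preference types such as Linear10PercentEvery1Minute and Canary10Percent5Minutes. The function and class names are hypothetical, and the sketch assumes the first increment is applied at the start of the deployment.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ShiftStep:
    minute: int           # minutes since the deployment started
    new_version_pct: int  # share of traffic routed to the new version


def linear_schedule(step_pct: int, interval_min: int) -> list[ShiftStep]:
    """Shift traffic in equal increments until the new version serves 100%."""
    if not 0 < step_pct <= 100 or interval_min <= 0:
        raise ValueError("step_pct must be in (0, 100] and interval_min > 0")
    steps = []
    pct, minute = 0, 0
    while pct < 100:
        pct = min(100, pct + step_pct)  # never overshoot 100%
        steps.append(ShiftStep(minute, pct))
        minute += interval_min
    return steps


def canary_schedule(canary_pct: int, wait_min: int) -> list[ShiftStep]:
    """Route a small share first, then all traffic after the wait period."""
    if not 0 < canary_pct < 100 or wait_min <= 0:
        raise ValueError("canary_pct must be in (0, 100) and wait_min > 0")
    return [ShiftStep(0, canary_pct), ShiftStep(wait_min, 100)]


if __name__ == "__main__":
    for step in linear_schedule(10, 1):
        print(f"minute {step.minute}: {step.new_version_pct}% on new version")
```

With a 10% step every minute, the sketch yields ten steps, and the new version serves all traffic nine minutes after the first shift. If an alarm fires in between, only the share of users already on the new version is affected.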
Figure 2: Architecture with added CodeDeploy
It’s crucial to keep deprecation policies in mind. The versions of Lambda runtimes and layers you use won’t be supported forever, so you need to know their end-of-support dates and plan your upgrades before those dates arrive.
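One way to keep track of deprecations is a small check that runs as part of the build. The following Python sketch is only an illustration: the runtime identifiers, the end-of-support dates, and the function names are hypothetical placeholders, and real dates would have to come from the AWS Lambda runtime documentation.

```python
from __future__ import annotations

from datetime import date

# Hypothetical end-of-support dates, for illustration only.
RUNTIME_END_OF_SUPPORT = {
    "example-runtime-1": date(2030, 1, 31),
    "example-runtime-2": date(2031, 6, 30),
}


def runtimes_needing_upgrade(
    functions: dict[str, str],
    today: date,
    warn_days: int = 90,
) -> list[tuple[str, str, int]]:
    """Return (function name, runtime, days left) for every function whose
    runtime loses support within warn_days, most urgent first.

    Runtimes missing from the table are skipped in this sketch.
    """
    findings = []
    for name, runtime in functions.items():
        end_of_support = RUNTIME_END_OF_SUPPORT.get(runtime)
        if end_of_support is None:
            continue
        days_left = (end_of_support - today).days
        if days_left <= warn_days:
            findings.append((name, runtime, days_left))
    return sorted(findings, key=lambda finding: finding[2])


if __name__ == "__main__":
    # Hypothetical function names for the recipe app.
    deployed = {
        "createRecipe": "example-runtime-1",
        "searchRecipes": "example-runtime-2",
    }
    for name, runtime, days in runtimes_needing_upgrade(deployed, date(2029, 12, 1)):
        print(f"{name} uses {runtime}: {days} days of support left")
```

Failing the build when the list is not empty turns a policy that is easy to forget into a step that can’t be skipped.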
Answering the Performance Efficiency Pillar Question
This pillar asks whether your system is fast enough and uses resources efficiently. You want the fastest system possible for your money, but you don’t want to pay for resources that aren’t used. Getting your performance metrics right is crucial.
PERF 1: How have you optimized the performance of your serverless application?
Do you still use defaults? Do you think the more resources the better? This question is about measuring the performance characteristics of every part of your system.
It’s obvious that some workloads require so much memory and CPU that you have to change the defaults or your Lambda function won’t even run. But after that, it all comes down to monitoring.
In addition to runtime performance, latency is also a huge part of a system’s perceived speed. If your users are distributed all over the world, it can be a good idea to use an edge-optimized API Gateway endpoint, so requests enter AWS at an edge location close to them.
Thundra can help here too. It doesn’t just measure whether your system works or has errors; it also measures memory usage, CPU timings, and latency out of the box!
Here is an extensive article about performance optimization with Thundra that will help you push latency down to a minimum. And since we’ve already added Thundra to our architecture, we don’t have to change anything to get these metrics.
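To illustrate what measuring instead of guessing can look like, here is a minimal Python sketch that summarizes invocation durations per memory setting and picks the smallest setting that meets a latency target. The function names are hypothetical and the sample durations are invented; in practice, the numbers would come from your monitoring data.

```python
from __future__ import annotations

import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list of samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def smallest_memory_meeting_target(
    durations_by_memory: dict[int, list[float]],
    target_ms: float,
    pct: float = 95,
) -> int | None:
    """Return the smallest memory size (MB) whose pct-th percentile
    duration stays within target_ms, or None if no setting qualifies."""
    for memory_mb in sorted(durations_by_memory):
        if percentile(durations_by_memory[memory_mb], pct) <= target_ms:
            return memory_mb
    return None


if __name__ == "__main__":
    # Invented measurements in milliseconds, keyed by memory size in MB.
    measured = {
        128: [900, 950, 1200],
        512: [300, 320, 410],
        1024: [180, 190, 200],
    }
    print(smallest_memory_meeting_target(measured, target_ms=450))
```

Working with a high percentile rather than the average keeps slow outliers, such as cold starts, visible in the decision.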
Answering the Cost Optimization Pillar Question
The cost optimization pillar asks about what you pay for your infrastructure and whether the cost is justified.
COST 1: How do you optimize your costs?
Optimizing your costs is as important as every other part of the SAL. If you have the fastest service and customers love to use it but you can’t sustain it, you won’t be in business for long.
Does every Lambda function have loads of memory assigned to it, or are there some that could do with less?
Do you use on-demand pricing for DynamoDB, or is your usage predictable enough that provisioned capacity would be cheaper?
Do you pay for Lambdas that just move data from API Gateway to another AWS service and could be replaced with a direct integration?
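The first two questions above come down to simple arithmetic, which the following Python sketch illustrates. All names are hypothetical. The default Lambda prices are assumptions based on commonly published x86 rates, and the DynamoDB prices in the example are made up, so check the current AWS pricing pages before relying on any of these numbers. The DynamoDB comparison is deliberately simplified and treats reads and writes as a single kind of request.

```python
from __future__ import annotations


def lambda_monthly_cost(
    invocations: int,
    memory_mb: int,
    avg_duration_ms: float,
    price_per_gb_second: float = 0.0000166667,  # assumed rate, verify it
    price_per_million_requests: float = 0.20,   # assumed rate, verify it
) -> float:
    """Compute cost plus request cost for one Lambda function."""
    gb_seconds = invocations * (memory_mb / 1024) * (avg_duration_ms / 1000)
    request_cost = invocations / 1_000_000 * price_per_million_requests
    return gb_seconds * price_per_gb_second + request_cost


def cheaper_dynamodb_mode(
    requests_per_month: int,
    on_demand_price_per_million: float,
    provisioned_units: int,
    provisioned_price_per_unit_hour: float,
    hours_per_month: int = 730,
) -> tuple[str, float, float]:
    """Return the cheaper mode plus both monthly costs (on-demand, provisioned)."""
    on_demand = requests_per_month / 1_000_000 * on_demand_price_per_million
    provisioned = provisioned_units * provisioned_price_per_unit_hour * hours_per_month
    mode = "on-demand" if on_demand <= provisioned else "provisioned"
    return mode, on_demand, provisioned


if __name__ == "__main__":
    # Less memory is not automatically cheaper if the function runs longer.
    print(lambda_monthly_cost(1_000_000, memory_mb=1024, avg_duration_ms=100))
    print(lambda_monthly_cost(1_000_000, memory_mb=512, avg_duration_ms=250))
    # Made-up DynamoDB prices, purely to show the comparison.
    print(cheaper_dynamodb_mode(10_000_000, 1.0, 5, 0.001))
```

In the Lambda example, the larger memory setting comes out cheaper because the invented duration drops by more than half. Lambda allocates CPU in proportion to memory, so this effect is worth checking against your own measurements.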
Thundra’s distributed tracing helps you find event paths through your system that are less time-critical. Assigning fewer resources to those paths can lead to savings. If you want to go deeper, here is an article about cost optimization with Thundra.
Answering the Last Four SAL Questions
In this article we answered the last four questions of the Serverless Application Lens. These questions asked how you ensure that:
- Your system does what you want.
- Your system does it fast enough.
- You don’t pay too much for your system.
Thundra provided all the metrics we need, and since we’d already added the Thundra layer to our architecture, we didn’t have to change anything to get them. Only the lifecycle management of our app required us to add CodeDeploy to the architecture.
With Thundra, Building Well-Architected Serverless Apps Is a Breeze
The WAF is a very elaborate method for architecting cloud-based systems, and the SAL doubles down on it by formulating the WAF questions for serverless systems. If we answer all SAL questions before starting to build our system, we can be reasonably sure that things won’t go up in flames when we go to production.
Half of the answers to the SAL questions can be derived by using a Thundra feature. After we instrument our Lambda functions with the Thundra layer, we get detailed insights and a good overview of our app, so potential problems do not go unnoticed. The questions could also be answered with AWS services only, but this would leave our architecture a bit more cluttered in the end.
Using Thundra for serverless monitoring gets us many insights without much configuration and keeps our architecture slim and, in turn, easily maintainable.