FaaS is a big success story: it allows us, as developers, to concentrate on application code alone. However, it's worth noting what we often forget as we move from monolithic designs to microservices: while we remove complexity from the individual services, we add complexity to the system of services.
We should therefore observe the health of our system by monitoring it. However, classical APM solutions aren't applicable to the serverless environment, since the vendor manages the underlying servers and plumbing.
For instance, it may be difficult or impossible to use most monitoring tools with serverless functions, since you can't access the function's container. Debugging and performance analysis may thus be restricted to fairly primitive or indirect methods.
Unless you use Thundra.
Thundra gives you full observability of your serverless stack by providing tracing and log aggregation capabilities as well as metrics. But do you need all of the metrics that are available in traditional systems? In this article, I'd like to discuss which performance metrics are crucial for your system's health.
AWS Lambda has a built-in limit on available memory. If a function exceeds its memory limit, Lambda stops executing the function and raises an error. If you experience latency in any of your functions, you should check their memory usage.
Memory management is vital for your system's health. You need to know how your memory is doing to decide whether or not to increase your memory allocation.
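As a rough sketch of how you might watch memory from inside a function, the snippet below compares the process's peak resident set size against the configured limit exposed on the Lambda context. The helper names here (`memory_usage_percent`, `handler`) are illustrative, not part of any SDK; it assumes a Linux runtime, where `ru_maxrss` is reported in kilobytes (true on Lambda, but not on macOS).

```python
import resource

def memory_usage_percent(memory_limit_mb):
    """Percent of the configured memory limit currently in use.

    Assumes Linux, where ru_maxrss is in kilobytes.
    """
    used_mb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024
    return 100.0 * used_mb / memory_limit_mb

def handler(event, context):
    # context.memory_limit_in_mb is supplied by the Lambda runtime
    pct = memory_usage_percent(int(context.memory_limit_in_mb))
    print(f"memory usage: {pct:.1f}%")
    return pct
```

Logging this percentage near the end of each invocation gives you a time series you can alarm on, without waiting for an out-of-memory kill.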
Total Memory Usage
The most important metric is the total memory usage percentage, which tells you whether your application still has headroom. If memory is full, you have a problem to fix. The first possible solution is to increase the available memory for that particular Lambda.
AWS provides us with the Max Memory Used for a given execution. That is certainly useful, but it is not enough by itself.
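The Max Memory Used figure is emitted in the REPORT line that Lambda writes to CloudWatch Logs after each invocation. A minimal sketch of turning that raw line into the usage percentage discussed above (the request ID and timings below are made-up sample values):

```python
import re

# Sample REPORT line in the shape Lambda writes to CloudWatch Logs
report = ("REPORT RequestId: example-request-id\tDuration: 102.25 ms\t"
          "Billed Duration: 103 ms\tMemory Size: 128 MB\t"
          "Max Memory Used: 57 MB")

def memory_percent_from_report(line):
    """Extract max memory used as a percentage of the configured size."""
    size = int(re.search(r"Memory Size: (\d+) MB", line).group(1))
    used = int(re.search(r"Max Memory Used: (\d+) MB", line).group(1))
    return 100.0 * used / size
```

Running this over your log stream gives a per-invocation percentage rather than an absolute number, which is what you actually need to decide whether to resize the function.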
You can visualize your functions’ memory usage in a time series graph with Thundra.
Once you identify that your memory is under pressure and you don't want to increase the available memory, or have no option to do so, the one metric you need to pay attention to is GC.
Minimizing time spent in GC is crucial for systems that require minimal latency, such as real-time message queue processing applications.
When the heap grows beyond a predefined threshold, the GC kicks in and degrades your system's performance, especially if it does not run concurrently, as with stop-the-world collectors.
Therefore, GC metrics are valuable for calibrating your application's memory usage.
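If your runtime is Python, one way to collect a GC-pause metric yourself is the interpreter's `gc.callbacks` hook, which fires at the start and stop of every collection. This is a sketch, not Thundra's mechanism; the names `gc_pauses` and `_gc_timer` are invented here.

```python
import gc
import time

gc_pauses = []    # seconds spent in each observed collection
_starts = []

def _gc_timer(phase, info):
    # The interpreter calls this at the start and stop of each collection
    if phase == "start":
        _starts.append(time.perf_counter())
    elif phase == "stop" and _starts:
        gc_pauses.append(time.perf_counter() - _starts.pop())

gc.callbacks.append(_gc_timer)

# ... run your workload, then report the accumulated pause time,
# e.g. as a custom metric at the end of the invocation
gc.collect()
total_gc_ms = sum(gc_pauses) * 1000
```

A consistently growing `total_gc_ms` across invocations is a strong hint that your allocation pattern, not the memory limit, is the thing to fix.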
Total memory usage is the sum of the memory used by the stack and the heap. In cases of heavy memory usage, the main suspect is usually the heap.
Once you identify that the problem lies in memory, more efficient heap usage could solve it. You can check whether the heap is fragmented, or whether there is a more optimal allocation strategy you could adopt instead.
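To find out *where* the heap is going, Python's standard `tracemalloc` module can attribute allocations to source lines. A minimal sketch, with a simulated workload standing in for your real handler code:

```python
import tracemalloc

tracemalloc.start()

# Simulated workload: many small heap allocations
buffers = [bytes(1024) for _ in range(1000)]

snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:3]:
    print(stat)    # file:line, total size, allocation count

current, peak = tracemalloc.get_traced_memory()
```

The top entries of the snapshot point straight at the lines holding the most heap, which is usually a faster route to a fix than staring at an aggregate usage graph.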
We know that more memory means more CPU for AWS Lambda. If your task has high memory or dense compute requirements, such as big data analysis, large file processing, or statistical computation, then you should definitely monitor your CPU usage too.
For other tasks, however, I would argue that monitoring CPU usage is not necessary most of the time: if your invocations complete in less than a few seconds, your CPU usage is unlikely to be that high.
By the way, if you want to dive deeper into how CPU allocation works with AWS Lambda, please check the following blog post.
To conclude, traditional APM methods aren't applicable to the serverless environment, and that also changes which metrics matter. Not all of them will help you, since handling many of them is the vendor's job; for the health of your application, however, I would suggest monitoring memory usage, GC, heap, and CPU usage metrics in your Lambda environment.
You can do all of these and more with the help of Thundra.