Serverless offers opportunities that are transforming how we think about building in the cloud. The days of worrying about the complex and brittle cloud infrastructure that underpins your entire business logic are coming to an end. These responsibilities are increasingly delegated to the cloud vendor, allowing you to focus primarily on your business logic.
However, because so much of the underlying architecture is delegated, serverless brings an unintended disadvantage. As we push more responsibilities onto the cloud vendor, we give up not only control but also observability. This leads to a black box situation, where we lose sight of how and why our serverless architecture behaves the way it does.
Operating in a black box has consequences, and this is a well-known issue for anyone adopting serverless. In particular, it exacerbates the problem of anti-patterns, because anti-patterns become difficult to detect when there is little or no observability.
The difficulty of detecting anti-patterns is itself a niche of computing research, a problem that computer scientists in academia are tackling. Eman K. Elsayed and Enas E. El-Sharawy of the Mathematics and Computer Science Department, Faculty of Science, Al-Azhar University explored the issue in their paper “Detecting Design Level Anti-patterns; Structure and Semantics in UML Class Diagrams”. The greatest issue is that anti-patterns don’t necessarily produce any visible errors in the system and can go undetected for long periods, resulting in performance problems and higher operational costs.
The impact becomes clear when you consider that new adopters of serverless are the most susceptible to anti-patterns. Unable to detect them easily, and frustrated by the black box environment they are operating in, an adopter may dismiss the performance and operational issues they experience as inherent limitations of serverless, which may not be true.
Observability mitigates this black box effect, reducing the difficulty of spotting anti-patterns. It must be noted, though, that simply enabling observability through monitoring tools does not resolve much on its own. It is necessary to understand the anti-patterns that plague serverless architectures, so that their telltale signs can be spotted in the monitoring information collected.
Therefore, this article goes through some of the major anti-patterns unique to serverless and describes how the right strategy in observability can cushion the impact of anti-patterns creeping into your serverless architectures.
Using Async like Sync
Serverless is an event-driven, asynchronous programming model. This is reinforced by the granularity of the system, where each function preferably performs a single task. Serverless applications tend to work best when asynchronous, a concept preached by Eric Johnson in his ServerlessDays Istanbul talk “Thinking Async with Serverless”. He later presented a longer version of the talk at ServerlessDays Nashville.
However, as teams and companies adopt serverless, one of the biggest mistakes they can make is designing their serverless architecture with a monolith mentality, lifting and shifting their previous architecture into the intended serverless one. The result is large controller functions and misplaced synchronous waits.
As a result, a function that sits idle waiting for a response is still charged, since it is technically still active: a worker node, with all the underlying infrastructure, continues to service the function while it simply waits. This goes against the pay-as-you-go principle of serverless.
The problem is further exacerbated when chaining functions together: one function makes a synchronous call to another and waits for a response, while the second function in turn calls a third or performs a read/write operation against a storage service. This increases unreliability, as the first function might time out. It is even worse when functions call storage services outside the vendor’s ecosystem, or on-premises storage services.
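To make the distinction concrete, here is a minimal sketch using boto3's Lambda `invoke` API. The function names and the stub client in the usage example are illustrative assumptions, not part of any real system:

```python
import json


def invoke_async(lambda_client, function_name, payload):
    """Fire-and-forget: 'Event' returns immediately (HTTP 202), so the
    caller is not billed while the downstream function does its work."""
    return lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="Event",  # async: do not wait for the result
        Payload=json.dumps(payload),
    )


def invoke_sync_antipattern(lambda_client, function_name, payload):
    """The anti-pattern: 'RequestResponse' blocks until the downstream
    function finishes, so the idle caller keeps accruing charges."""
    return lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",  # sync: caller waits (and pays)
        Payload=json.dumps(payload),
    )
```

In production `lambda_client` would be `boto3.client("lambda")`; any stub exposing the same `invoke` signature can stand in for it when experimenting locally.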
What Should You Observe?
With this anti-pattern understood, its visible effects are clear: potentially higher costs incurred by Lambda functions, longer execution times due to waits, and even a higher rate of timeouts. If you see your functions behaving this way, you may want to investigate them further. So it goes without saying that the first step is to keep an eye on the cost, duration, and timeouts of your functions.
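As a concrete starting point, the sketch below builds the parameters for a CloudWatch alarm on Lambda's built-in `Duration` metric. The naming scheme, threshold, and SNS topic are illustrative assumptions; the resulting dict would be passed to `boto3.client("cloudwatch").put_metric_alarm(**params)`:

```python
def duration_alarm_params(function_name, threshold_ms, sns_topic_arn):
    """Build put_metric_alarm parameters that fire when the function's
    average duration exceeds threshold_ms over a 5-minute window."""
    return {
        "AlarmName": f"{function_name}-high-duration",  # illustrative naming scheme
        "Namespace": "AWS/Lambda",   # Lambda publishes Duration, Errors, Throttles here
        "MetricName": "Duration",    # reported in milliseconds
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Average",
        "Period": 300,               # seconds
        "EvaluationPeriods": 1,
        "Threshold": float(threshold_ms),
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],  # notify an SNS topic when the alarm fires
    }
```

The same shape works for the `Errors` and `Throttles` metrics by swapping `MetricName` and the statistic.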
Depending on your monitoring tool, this process can be made more efficient by setting up alerts on these metrics. Thundra, for example, allows you to set up alerts on all of them, and even gives you the flexibility to define metric rates within desired time intervals.
For an in-depth analysis, investigating the distributed traces of your application can yield more well-founded insights. As the transition to microservices occurs, the system itself becomes distributed, so monitoring has to take a more holistic approach in which each transaction flow is measured as a trace. This makes it possible to monitor business flows interacting with each other synchronously or asynchronously, since an error or delay in a service might be caused by any of the upstream or downstream services, or both.
The Need for Sharing
The goal of building with serverless is to dissect the business logic into independent, highly decoupled functions. However, there are scenarios where libraries, business logic, or even just basic code has to be shared between functions, leading to a form of dependency and coupling that works against the serverless architecture.
The most prominent resulting pitfall is hampered scalability. As your system scales and functions constantly rely on one another, the risk of errors, downtime, and latency increases. The entire premise of microservices was to avoid these issues, and scalability is one of serverless’s main selling points. By coupling functions together through shared logic and a shared codebase, the system undermines that very scalability.
Before diving into the monitoring solutions, it must be conceded that some use cases have no choice but to share logic and code. This springs up in machine learning applications, for instance, where large libraries have to be shared across the functions that process test, validation, and training datasets. AWS provides Lambda layers as a partial remedy, but they are not always the ideal solution.
In most cases, the need to share code, libraries, and logic is not only an anti-pattern but also runs into a technical limit of serverless functions. For example, AWS Lambda functions have a hard limit of 512MB of /tmp storage, so developers building Lambda function code must always be aware of this limit and how they are using it. After all, the /tmp directory is meant for temporary storage: once the serverless worker node is torn down, the data within /tmp is no longer available.
AWS recently addressed this problem with the release of the much-coveted Amazon EFS integration for AWS Lambda. The integration allows functions to access shared libraries or data via a mounted Amazon EFS file system. Nevertheless, this does not justify making functions dependent on one another: just because something is now achievable does not mean it is the most effective solution, given the pitfalls of the anti-pattern described above.
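When shared data is genuinely unavoidable, reading it from the mounted file system looks roughly like the sketch below. The mount path and artifact name are illustrative assumptions, since the actual path is configured on the function itself:

```python
import os


def load_shared_artifact(name, mount_path="/mnt/shared"):
    """Read a shared artifact (e.g. a large ML model file) from the EFS
    mount configured on the function. '/mnt/shared' is an illustrative
    default; the real path is set in the function's configuration.
    Unlike /tmp, data here survives worker teardown and is shared
    across concurrent function instances."""
    path = os.path.join(mount_path, name)
    with open(path, "rb") as f:
        return f.read()
```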
What Should You Observe?
As acknowledged above, sharing information and coupling serverless functions may be intentional, with no preventive measure able to resolve it. In that case, it becomes imperative to measure the effects of such an architectural set-up. Metrics such as cost and cold starts become very important. Cold starts matter especially because the operation of one function may depend on another due to coupling: if one function experiences cold-start latency, it may have a ripple effect on all the functions coupled to it.
Overall, the entire architecture should be mapped. Both AWS and Thundra can provide an overview of your cloud architecture, showing how your cloud resources talk to one another and where improvements can be made. Awareness of how the cloud architecture is being built is the only way to avoid the issue effectively. Breaking business cases up into separate functions may not always be conceptually easy, but it is an activity that must be conducted nevertheless, and with caution.
Too Much Granularity
Building upon the notion of breaking large, compact business cases into smaller independent functions, it is possible to reach a level of granularity that eventually proves detrimental. Breaking down monolithic systems definitely has its benefits, but there is also overhead that has to be conceded. There eventually comes a point where the overhead exceeds the benefit, and it is imperative to find that point.
The need to communicate events between individual functions leads to webhooks and APIs, and therefore an increase in engineering effort, security risks, and latency. As the number of functions scales, these concerns multiply.
Serverless’s main goal is to abstract away the complex underlying architecture, allowing the prime focus to stay on business logic. But past a certain point, breaking the business logic down into individual functions carries overhead that negates the benefits, hence acting as an anti-pattern.
What Should You Observe?
Architectures in general can get extremely complicated as your system grows. Therefore, the first thing to go for is a map of your distributed system architecture as you begin to adopt serverless.
Another sign of an overly granular architecture is serverless functions becoming overly chatty. The major overhead of a granular architecture is communication, and that is what should be minimized. Communication overhead and unnecessary calls to AWS Lambda functions mean more engineering complexity and potentially higher costs. Therefore, it is beneficial to check costs and total invocation counts.
It is also recommended to dive deeper into invocations and keep track of triggering components. If one Lambda function is constantly being triggered by the same Lambda function a substantially high number of times, merging the two functions may be worth considering.
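One way to spot such merge candidates is to aggregate caller/callee pairs extracted from your traces. This is a minimal sketch; the record format and threshold are assumptions for illustration:

```python
from collections import Counter


def chatty_pairs(invocations, threshold):
    """Given (caller, callee) invocation records, e.g. extracted from
    distributed traces, flag pairs seen more than `threshold` times:
    these are candidates for merging into a single function."""
    counts = Counter(invocations)
    return [pair for pair, n in counts.items() if n > threshold]
```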
Additionally, as mentioned, the move from monolith to microservices, and eventually to a purely serverless distributed architecture, creates the need for communication infrastructure. It therefore becomes necessary to monitor the payload of data being sent between these components.
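A simple guard is to measure serialized payload sizes before sending them between functions. The alert threshold below is an illustrative assumption, set well under Lambda's 6MB synchronous payload limit:

```python
import json

# Illustrative alert threshold; Lambda's synchronous invocation
# payload limit is 6MB, so alerting far below it leaves headroom.
MAX_PAYLOAD_BYTES = 200_000


def payload_size(payload):
    """Return the serialized size in bytes of an inter-function payload."""
    return len(json.dumps(payload).encode("utf-8"))


def payload_size_ok(payload):
    """Flag oversized payloads, a symptom of chatty, overly granular designs."""
    return payload_size(payload) <= MAX_PAYLOAD_BYTES
```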
In conclusion, serverless is booming, but it doesn’t come without its own pitfalls. There are various best practices for avoiding anti-patterns; the ultimate solution, however, is to couple these best practices with a strong tendency toward observability. Those adopting the technology need to be aware not only of the possible anti-patterns, but also of what to monitor if the anti-patterns do find their way into the system architecture.
Observability is still a maturing field, and cloud vendors are now heeding the call. However, built-in monitoring solutions still do not fulfill all needs and offer only basic observability. To achieve true observability across the three pillars of metrics, traces, and logs, it is advisable to look toward a third-party monitoring tool specialized for the job.
It may be argued that the need for monitoring is trivial and that efforts should instead be spent on building the architecture according to best practices. However, I would argue that both are equally important, and that the folly of many premature serverless adopters is failing to light up the black box they are operating in. After all, Rome was built on a model that is still replicated in today’s modern world. But it still collapsed!