On the 16th of June, AWS added the much-coveted Amazon EFS integration to the serverless arsenal, furthering an already expansive AWS Lambda feature set. AWS has been continuously improving AWS Lambda since it was first released in 2014, marking the beginning of the serverless wave in cloud computing. It must be noted that Amazon EFS is not a new service either; it has been generally available since 2016, mostly being used in conjunction with AWS’ container services.
However, by successfully integrating Amazon EFS with AWS Lambda, cloud developers have been empowered to tackle greater use cases and more convenient data operations. Amazon EFS is AWS’s file-sharing service that allows users to manage file shares resembling traditional network file systems. It can be mounted on various compute machines, both on-premises and in the cloud, with AWS Lambda being the newest addition to the list. As a result, this has opened up various use cases that were not possible with AWS Lambda’s previous feature set, mainly due to the Lambda function’s limitation of 512MB of /tmp directory storage.
However, with the new Amazon EFS and AWS Lambda integration, developers can overcome this barrier. With the ability to work with larger file stores, new use cases and functionality are unlocked. This also increases AWS Lambda’s desirability, pushing overall serverless adoption forward.
The Problem and the Maneuver
As mentioned, there is a limit of 512MB of internal storage for AWS Lambda functions. This proves problematic when developers first start using AWS Lambda, as they often find themselves unable to write to the filesystem. It has become a common concern because the same function that worked locally, on a virtual machine, or elsewhere no longer works once the code is uploaded to AWS Lambda.
The core issue is the fact that the /tmp directory has a hard limit of 512MB. That means when developers are building their AWS Lambda function code, they must always be aware of this limit and how they are using the directory. After all, the /tmp directory is meant for temporary storage; once the serverless worker node is torn down, the data within /tmp is also no longer available.
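To make the constraint concrete, here is a minimal sketch of a handler that stages data in /tmp while guarding against the 512MB ceiling. The handler name and event shape are hypothetical, not part of any AWS API.

```python
import os
import shutil
import tempfile

TMP_LIMIT_BYTES = 512 * 1024 * 1024  # hard /tmp limit at the time of writing

def handler(event, context=None):
    """Stage a payload in the ephemeral /tmp directory, failing fast
    if it would not fit within Lambda's 512MB limit."""
    tmp_dir = tempfile.gettempdir()  # resolves to /tmp inside Lambda
    payload = event["payload"].encode("utf-8")
    free = shutil.disk_usage(tmp_dir).free
    if len(payload) > min(free, TMP_LIMIT_BYTES):
        raise RuntimeError("payload would exceed the 512MB /tmp limit")
    path = os.path.join(tmp_dir, "staged.bin")
    with open(path, "wb") as f:
        f.write(payload)
    return {"path": path, "bytes": len(payload)}
```

Anything written this way disappears once the worker node is recycled, which is exactly why durable state belongs elsewhere.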
This is where the concept of “stateless” computing arises, as the state of the function is kept off the worker node, which can be considered the server instance. The data is usually stored in fully managed AWS storage services such as Amazon S3 and Amazon DynamoDB. This data is then accessed by the AWS Lambda function on its next invocation as it pulls the needed “state” or data from these external stores.
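As a sketch of this stateless pattern, the handler below keeps nothing on the worker: every invocation pulls its state from an external table and writes the update back. A plain dict stands in for DynamoDB here, and the key names are made up for illustration; a real function would use boto3 `get_item`/`put_item` calls instead.

```python
# An in-memory dict stands in for an external store such as DynamoDB;
# in a real function this would be a boto3 get_item / put_item pair.
FAKE_TABLE = {}

def get_state(table, key):
    return table.get(key, {"count": 0})

def put_state(table, key, state):
    table[key] = state

def handler(event, context=None):
    """Stateless handler: each invocation pulls its 'state' from the
    external store, mutates it, and writes it back before returning."""
    key = event["user_id"]
    state = get_state(FAKE_TABLE, key)
    state["count"] += 1
    put_state(FAKE_TABLE, key, state)
    return state["count"]
```

Because the state lives outside the function, it survives even when the worker node that ran the previous invocation is gone.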
Therefore, developers usually work around the limit of internal storage by leveraging filestream concepts in the programming languages of their choice to read, process, and write files on external storage resources, without storing the entire file in the AWS Lambda function itself. This methodology is well established, especially when using Amazon S3 as the external store. As a result, two questions come to mind. The first is: what was the need for adding the Amazon EFS integration if Amazon S3 was already available? The second is: what use cases are now achievable that were not before the Amazon EFS integration was available?
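The workaround looks roughly like the sketch below: process the object as a stream of chunks rather than materialising it in the function. With boto3, `s3.get_object(...)["Body"]` returns exactly this kind of file-like stream; an `io.BytesIO` stands in for it here so the example is self-contained.

```python
import hashlib
import io

def checksum_stream(body, chunk_size=8 * 1024 * 1024):
    """Consume a file-like stream in fixed-size chunks, never holding
    the whole object in memory. `body` can be an S3 GetObject Body or
    any other file-like object."""
    digest = hashlib.sha256()
    total = 0
    while True:
        chunk = body.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
        total += len(chunk)
    return total, digest.hexdigest()

# Stand-in for s3.get_object(Bucket=..., Key=...)["Body"]:
fake_body = io.BytesIO(b"x" * 1000)
size, sha = checksum_stream(fake_body, chunk_size=256)
```

The function's memory footprint stays at one chunk regardless of object size, which is what lets a 512MB-constrained Lambda process multi-gigabyte objects.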
The second question is worth articulating in light of best practices and anti-patterns for building serverless applications. In most cases, developers familiar with AWS Lambda will not hit this limit, because relying on local storage is often considered an anti-pattern. Nevertheless, there is a niche set of use cases requiring substantial temporary storage that is now served by Amazon EFS.
A Step Beyond S3
As mentioned, Amazon S3 is usually the popular object store of choice when working with AWS Lambda functions. So what is the point of having the Amazon EFS integration? To understand the need, both Amazon S3 and Amazon EFS should be understood relative to each other.
Amazon S3 provides simple object storage, whereas Amazon EFS is an elastic file system, scaling automatically to your storage requirements. Amazon EFS is mainly used to serve the needs of SaaS and content management systems, whereas Amazon S3 is mostly used to hold objects and serve static websites.
Moreover, Amazon EFS is faster than Amazon S3, achieving lower latency and higher IOPS. As a result, Amazon EFS is well suited for large quantities of data, such as datasets fed to machine learning algorithms. Amazon EFS allows concurrent access from various instances connected to it via access points, making it possible to process and analyze large amounts of data seamlessly. This is something that would not be as conveniently achievable when using Amazon S3.
Moreover, considering the pricing structures of the two services, Amazon EFS can prove to be more cost-effective. On the face of it, Amazon S3 is definitely the cheaper option. However, once the pricing structure is broken down and the use cases are taken into account, Amazon EFS can be the better option to go with. This is most apparent in the Bursting Throughput mode, where the developer does not incur costs related to bandwidth or requests. The only charge incurred is a fixed 0.30 USD per GB per month in the Standard storage class of Amazon EFS.
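Using that Standard-storage figure, the arithmetic is straightforward; the 100GB data size below is just an example.

```python
# Bursting Throughput mode: the storage fee is the whole bill,
# with no separate bandwidth or request charges.
EFS_STANDARD_USD_PER_GB_MONTH = 0.30

def efs_monthly_cost(gb_stored):
    """Monthly EFS Standard cost under Bursting Throughput mode."""
    return gb_stored * EFS_STANDARD_USD_PER_GB_MONTH

cost = efs_monthly_cost(100)  # e.g. a 100GB dataset -> 30 USD/month
```

For workloads with heavy request traffic, the absence of per-request charges is what can tilt the comparison away from Amazon S3 despite its lower per-GB price.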
Additionally, staying on the topic of cost, Amazon EFS charges only for the storage actually used. This is a familiar cost structure for those already building applications with AWS Lambda functions, given the pay-as-you-go model that is a pillar of the serverless concept. This pricing structure can be further leveraged using the Lifecycle Management functionality to shift less frequently accessed files to cheaper storage classes, possibly reducing costs by up to 85%.
Considering the benefits of cost-effectiveness, the ease with which Amazon EFS can handle high-load applications, and the service’s ability to handle large data sizes, the use cases for EFS become apparent.
Unlocked Use Cases
Knowing the benefits of Amazon EFS and understanding how the limitations of AWS Lambda can now be overcome, new use cases have become achievable. As discussed, serverless functions had little internal storage, and serverless itself pushed towards stateless architectures. These conditions should have sufficed for any developer following best practices and proper building patterns. However, they previously left out a set of niche use cases that, albeit not common, acted as a deterrent to wider serverless adoption. Some of these use cases are discussed below.
Machine learning was often a use case left untouched by serverless enthusiasts and developers. With large data stores and libraries, it would be almost impossible to load everything onto a single Lambda function. Now, with Amazon EFS, large dependencies such as TensorFlow can be loaded and incorporated into machine learning applications. Previously, AWS Lambda Layers was the way to go, but such machine learning dependencies often proved too large.
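A common shape for this is to lazy-load the model from the EFS mount once per worker and cache it in a module-level variable, so warm invocations skip the disk read entirely. The mount path and pickle format below are assumptions for the sketch, and a temporary directory with a tiny stand-in model takes the place of the real mount and a real serialized model.

```python
import os
import pickle
import tempfile

class TinyModel:
    """Stand-in for a large serialized model that lives on EFS."""
    def predict(self, x):
        return x * 2

# Simulate the EFS mount (on Lambda this would be e.g. /mnt/ml).
MOUNT = tempfile.mkdtemp()
MODEL_PATH = os.path.join(MOUNT, "model.pkl")
with open(MODEL_PATH, "wb") as f:
    pickle.dump(TinyModel(), f)

_model = None  # module-level cache survives across warm invocations

def handler(event, context=None):
    global _model
    if _model is None:  # cold start: load once from the mounted file system
        with open(MODEL_PATH, "rb") as f:
            _model = pickle.load(f)
    return _model.predict(event["x"])
```

The first invocation pays the read latency; subsequent invocations on the same worker serve predictions from memory.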
There are, however, performance issues to take into consideration. One is the latency of reads and writes; the other is the cold start problem. Remember, if a Lambda invocation requires the setup of a worker node, there is some latency incurred, known as a cold start. The way to get around this is by enabling Provisioned Concurrency.
One of the great benefits of Amazon EFS is that one file system can be accessed by multiple AWS Lambda functions. The functions can access different paths in the file system and leverage different file system permissions. This goes one step further, with the same Amazon EFS file system also being accessible from Amazon EC2 instances and AWS Fargate.
Sharing data between various concurrently running instances may not be the best pattern to adopt, but it proves useful in various use cases. For example, different AWS Lambda functions can simultaneously access different blocks of a data set to perform black-box testing and write results. The same EFS file system can then be accessed to continuously improve a prediction model in real time, with processing conducted on another AWS Lambda function or even an Amazon EC2 instance.
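A rough sketch of that fan-out: each worker (standing in for a separate Lambda invocation) processes its own slice of a shared data set and writes results under its own path in the shared file system. Here a temporary directory and a thread pool stand in for the EFS mount and the concurrently running functions.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

SHARED = tempfile.mkdtemp()  # stands in for the EFS mount, e.g. /mnt/shared
DATASET = list(range(100))

def worker(block_id, block):
    """Process one block of the data set and write results to a
    worker-specific path, as separate Lambda functions would."""
    out_dir = os.path.join(SHARED, f"worker-{block_id}")
    os.makedirs(out_dir, exist_ok=True)
    with open(os.path.join(out_dir, "result.txt"), "w") as f:
        f.write(str(sum(block)))
    return block_id

# Fan out: four "functions" each take a quarter of the data set.
blocks = [DATASET[i::4] for i in range(4)]
with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(worker, range(4), blocks))

# A downstream consumer can then aggregate from the shared mount.
total = 0
for i in range(4):
    with open(os.path.join(SHARED, f"worker-{i}", "result.txt")) as f:
        total += int(f.read())
```

Giving each worker its own output path sidesteps write contention; on real EFS, access points and per-path permissions enforce the same separation.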
Of course, such an architecture is easier said than done. Different file permissions and communication infrastructure would need to be taken into consideration. Moreover, as discussed, read/write latencies also need to be taken into account, but nonetheless these use cases are now achievable.
Now that this integration is possible, what changes to the cloud infrastructure can be expected? The first thing to note is that Amazon EFS file systems reside within a VPC. Hence, the AWS Lambda functions connected to these file systems will also reside in the same subnet groups. Previously this would be discomforting, as the cold start durations incurred by AWS Lambda functions inside a VPC were substantially high. However, AWS has been continually improving the cold start issue and recently reworked the underlying infrastructure to achieve almost zero latency impact for AWS Lambda functions residing within a VPC.
As a result, those integrating their AWS Lambda functions with Amazon EFS can expect infrastructure as shown in the diagram below:
As already discussed, the AWS Lambda functions sit within the same private subnets as the Amazon EFS file system they have mounted. It should also be remembered that Amazon EFS is a regional service, and accessing a file system therefore requires adding mount targets to the VPC. These can be added to the private subnets in which the AWS Lambda function will run. Each mount target has security groups, which can be leveraged to allow access from other compute resources residing in the same VPC.
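Wiring this up comes down to giving the function the VPC subnets and security groups of the mount targets, plus a file system config pointing at an EFS access point. The sketch below only builds the kind of request payload you would hand to boto3's `update_function_configuration`; the ARNs and IDs are placeholders, and note that Lambda requires the local mount path to begin with `/mnt/`.

```python
def efs_attach_config(access_point_arn, local_mount_path,
                      subnet_ids, security_group_ids):
    """Build illustrative (boto3-style) parameters for attaching an
    EFS access point to a Lambda function. All values are placeholders."""
    if not local_mount_path.startswith("/mnt/"):
        raise ValueError("Lambda requires the local mount path to start with /mnt/")
    return {
        "VpcConfig": {
            "SubnetIds": subnet_ids,                 # subnets holding the mount targets
            "SecurityGroupIds": security_group_ids,  # must allow NFS (port 2049) to EFS
        },
        "FileSystemConfigs": [
            {"Arn": access_point_arn, "LocalMountPath": local_mount_path}
        ],
    }

cfg = efs_attach_config(
    "arn:aws:elasticfilesystem:us-east-1:123456789012:access-point/fsap-EXAMPLE",
    "/mnt/shared",
    ["subnet-EXAMPLE"],
    ["sg-EXAMPLE"],
)
```

The security groups on both ends do the real access control: the mount target must accept NFS traffic from the function's security group, and vice versa.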
AWS Lambda has come a long way since it was first unveiled to the world back in 2014. Serverless adoption, however, did not match the fanfare and excitement. The launch got the ball rolling, but the limitations that plagued the Lambda function acted as a deterrent to widespread adoption. Over the years, these limitations have been addressed one by one, tearing down the barriers to adoption. From tackling cold starts to introducing Lambda Destinations, this new EFS integration is simply another step in a much larger journey. After all, as the famous Chinese proverb goes, "A journey of a thousand miles begins with a single step."