Which database should I choose for my serverless applications?

May 26, 2020

 

Choosing the right database has one of the most direct effects on the cost and performance of your application. A serverless environment is no exception to this principle. By considering parameters like access patterns, schema (data model), and expected performance, it's possible to logically arrive at an optimal decision for your database type.

Databases broadly fall into two classes: SQL (relational, normalized) databases and NoSQL (non-relational, typically denormalized) databases. A third class has also emerged for specific use cases such as graph modelling and in-memory caching.

In this article we'll explore certain aspects of SQL and NoSQL databases and take a look at the services AWS offers for each type.

SQL Databases

SQL, or relational, databases use Structured Query Language (SQL) for working with data. It’s a powerful language and one of the most versatile options out there for handling data. You can write complicated retrieval queries, which comes in handy if you are looking to perform data analysis, also known as OLAP (online analytical processing).
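
As a rough illustration of the kind of analytical query described above, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for a managed SQL engine. The `customers` and `orders` tables, their columns, and the sample rows are made up for this example.

```python
import sqlite3


def revenue_per_customer(conn):
    """Join orders to customers and total the order amounts per customer."""
    return conn.execute(
        """
        SELECT c.name, COUNT(o.id) AS orders, SUM(o.amount) AS revenue
        FROM customers AS c
        JOIN orders AS o ON o.customer_id = c.id
        GROUP BY c.id
        ORDER BY revenue DESC
        """
    ).fetchall()


# Hypothetical schema and data, held in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount INTEGER NOT NULL  -- amount in cents
    );
    INSERT INTO customers (id, name) VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders (customer_id, amount) VALUES (1, 1500), (1, 2500), (2, 1000);
    """
)

report = revenue_per_customer(conn)
```

The join and the aggregation are expressed in one declarative statement, which is what makes ad hoc questions cheap to ask of a relational schema.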

SQL excels in handling multi-row transactions at scale. This type of database is the right choice if you don’t fully understand the access patterns your application may have over time and don’t expect exponential growth. 
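
To make "multi-row transaction" concrete, the sketch below updates two rows as one atomic unit, again using sqlite3 as a stand-in. The `accounts` table and the `transfer` helper are hypothetical names chosen for illustration.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; either both rows change or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            # Violates the CHECK constraint when src has insufficient funds.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts (id, balance) VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

accepted = transfer(conn, 1, 2, 30)    # enough funds, both updates are committed
rejected = transfer(conn, 1, 2, 500)   # debit fails, the earlier credit is rolled back
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
```

The second call credits one account before the debit fails, and the rollback undoes that credit, so no partial update is ever visible.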

However, SQL databases come with their share of drawbacks. The schema needs to be defined before you can start working with the database, which requires significant preparation and planning. Once the data model has been set up, you’ll need to maintain that structure throughout the life of your application.

Scalability can also be problematic because SQL systems typically scale vertically: adding more CPU and RAM to handle the load can become very expensive, and there is a limit to how much you can “scale up”. You can always add more servers, or “scale out,” but that requires extensive effort and comes at a steep cost.

NoSQL Databases

NoSQL databases, unlike SQL databases, don't share a single standard query language, and they champion the concept of working with denormalized data sets.

Their types include:

  • Document Based
  • Key/Value Pairs
  • Graph Data
  • Column Oriented Data Structure

The highlight of NoSQL databases is that they eliminate complex joins across tables by putting everything in one collection with a systematic access pattern. Their high performance, scalability, and extensibility make NoSQL a strong match for serverless development.

NoSQL databases are known for their ability to scale out to handle tremendous OLTP (online transactional processing) workloads and manage disparate data sets. Also, changing the schema won’t disrupt your currently working application.
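
The "one collection with a systematic access pattern" idea can be sketched in plain Python. This is not any particular database's API; the `pk`/`sk` key names, the `CUSTOMER#`/`ORDER#` prefixes, and the helper functions are assumptions made for illustration.

```python
# One collection keyed by (partition key, sort key). Each item carries the
# data its reader needs, so nothing has to be joined at read time.


def put_item(table, pk, sk, attrs):
    table[(pk, sk)] = {"pk": pk, "sk": sk, **attrs}


def query(table, pk, sk_prefix=""):
    """Return the items of one partition whose sort key starts with sk_prefix."""
    return [
        item
        for (p, s), item in sorted(table.items())
        if p == pk and s.startswith(sk_prefix)
    ]


table = {}
put_item(table, "CUSTOMER#1", "PROFILE", {"name": "Ada"})
# The customer name is copied onto each order (denormalized) on purpose.
put_item(table, "CUSTOMER#1", "ORDER#2020-05-01", {"total": 1500, "customer_name": "Ada"})
put_item(table, "CUSTOMER#1", "ORDER#2020-05-20", {"total": 2500, "customer_name": "Ada"})
put_item(table, "CUSTOMER#2", "PROFILE", {"name": "Grace"})

orders = query(table, "CUSTOMER#1", "ORDER#")  # only this customer's orders
everything = query(table, "CUSTOMER#1")        # profile and orders in one read
```

The trade-off is visible in the data: reads are a single keyed lookup, but the key layout has to be chosen with the queries already in mind.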

Dynamic schemas + high performance at peak = best database option, right? Not quite. If you are unsure what data access patterns your application entails, it's probably best to avoid NoSQL, or to invest time in learning NoSQL data modelling first, before shooting yourself in the foot.

Think of it this way: a startup will probably benefit from getting an MVP out as soon as possible by choosing a SQL database, because of its flexibility and general developer familiarity, whereas a NoSQL database can lead to future gains because of its very high scalability. The only hurdle is modelling the data appropriately so that it serves future data access needs effectively.

Those who come to NoSQL from a SQL background without a proper understanding of access patterns may walk themselves into a corner or use NoSQL as if it were still SQL. Either mistake will hurt your application as it grows.

With that said, let's explore Amazon’s managed SQL and NoSQL services, RDS and DynamoDB, and how they perform with serverless applications.

AWS RDS vs DynamoDB

AWS RDS

Amazon Relational Database Service (RDS) comes with six engines to pick from: MySQL, PostgreSQL, MariaDB, Oracle, SQL Server, and Amazon's own implementation, Aurora.

As explained already, relational databases follow a set pattern of structured data modelling and are governed by the SQL language for data handling. So if your decision is to go with Amazon RDS, here are a few key areas to be aware of.

Cost

Your Lambda functions are billed per use, but RDS instances are billed by the hour whether or not they serve traffic, unless your choice of engine is Aurora Serverless.
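
The hourly versus per-request trade-off comes down to simple arithmetic. The sketch below is a back-of-the-envelope comparison only: the two prices are placeholder values, not AWS list prices, and it ignores storage, I/O, and data transfer charges.

```python
HOURS_PER_MONTH = 730  # average number of hours in a month


def monthly_instance_cost(hourly_rate, hours=HOURS_PER_MONTH):
    """Cost of an always-on instance, paid for whether or not it is used."""
    return hourly_rate * hours


def monthly_request_cost(requests, price_per_million):
    """Cost of a pay-per-request service for a given monthly request count."""
    return requests / 1_000_000 * price_per_million


def break_even_requests(hourly_rate, price_per_million, hours=HOURS_PER_MONTH):
    """Monthly request count at which both pricing models cost the same."""
    return monthly_instance_cost(hourly_rate, hours) / price_per_million * 1_000_000


# Placeholder prices for illustration only; check current pricing pages.
HOURLY_RATE = 0.10         # assumed cost per instance hour
PRICE_PER_MILLION = 2.00   # assumed cost per million requests

threshold = break_even_requests(HOURLY_RATE, PRICE_PER_MILLION)
```

Below the threshold the pay-per-request model is cheaper; above it, the always-on instance wins on price, assuming a single instance can actually carry that load.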

Managing Connections

Managing connection limits is important when Lambda talks to RDS. Connections are closed when a Lambda container expires, but every concurrent container opens its own connection, so the database's maximum connection limit can still be exhausted when the application comes under stress.

Ways to deal with this problem include using RDS Proxy, lowering connection timeouts for RDS, implementing a caching strategy, and limiting concurrent connections.
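
One common complement to the measures above is to open the connection once per container rather than once per invocation. The sketch below shows the pattern with sqlite3 standing in for a real RDS driver; `get_connection` and the returned `connection_id` field are illustrative names, and only the `handler(event, context)` signature follows Lambda's Python convention.

```python
import sqlite3

# Module-level state survives between invocations that land on the same
# warm container, so the connection is opened once and then reused.
_connection = None


def get_connection():
    """Return the container's cached connection, creating it on first use."""
    global _connection
    if _connection is None:
        # Stand-in for connecting to the real database endpoint.
        _connection = sqlite3.connect(":memory:")
    return _connection


def handler(event, context):
    conn = get_connection()
    value = conn.execute("SELECT 1").fetchone()[0]
    # connection_id is returned only so that the reuse is observable.
    return {"ok": value == 1, "connection_id": id(conn)}


first = handler({}, None)
second = handler({}, None)
```

This caps connections at one per warm container. It does not cap the number of containers, which is why connection pooling at the proxy level or a concurrency limit is still needed under heavy load.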

Performance and High Availability

Amazon’s Aurora database service (compatible with either MySQL or PostgreSQL) can prove to be a great option, especially in terms of read performance (across regions too), owing to its ability to replicate more quickly than its RDS counterparts.

Aurora clearly differs from standard RDS in performance and high availability, but it comes at an increased cost, so it should be chosen only if your use case demands it. Aurora Serverless can be the next best thing for a fully serverless system, but its performance isn’t quite there yet.

Its main problem is dealing with cold starts. Aurora Serverless should ideally be chosen for testing or for small and inconsistent workloads.

DynamoDB

Now, if DynamoDB is the choice for your application's storage needs, the following is what you get.

Cost

Unlike most NoSQL databases, DynamoDB is not server based. It offers pay-per-request pricing, which fits in perfectly with the serverless way of doing things and brings the idea of fully distributed serverless systems close to reality.

Managing Connections

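DynamoDB is accessed over HTTP, so there is no connection pool to manage. It can serve a virtually unlimited number of concurrent requests as long as they stay within the throughput limits defined for the table.

There is also an on-demand billing mode, PAY_PER_REQUEST, which needs no throughput provisioning and lets the request charges for your DynamoDB table scale down to zero when it is idle.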
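
For reference, the billing mode is chosen when the table is created. The sketch below only builds the request parameters as a plain dictionary; the `pk`/`sk` attribute names and the table name are assumptions, and the commented-out call shows where boto3 would come in if it is installed and credentials are configured.

```python
def on_demand_table_params(table_name):
    """Build CreateTable parameters for a table billed per request."""
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": "pk", "AttributeType": "S"},  # S = string
            {"AttributeName": "sk", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "pk", "KeyType": "HASH"},   # partition key
            {"AttributeName": "sk", "KeyType": "RANGE"},  # sort key
        ],
        # On-demand mode: no ProvisionedThroughput block is supplied.
        "BillingMode": "PAY_PER_REQUEST",
    }


params = on_demand_table_params("orders")  # "orders" is a hypothetical table name

# With boto3 available, the parameters would be passed along like this:
# import boto3
# boto3.client("dynamodb").create_table(**params)
```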

Performance

With millisecond performance, DynamoDB can handle almost any load you throw at it.

If you need even better read times, DAX (DynamoDB Accelerator) is a fully managed, in-memory cache solution that Amazon offers. So performance and scalability are the least of your worries with DynamoDB.
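
To show the general idea behind putting an in-memory cache in front of a data store, here is a minimal read-through cache in plain Python. It is not DAX or its API; the class, its parameters, and the `slow_lookup` function are hypothetical and only illustrate the caching pattern.

```python
import time


class ReadThroughCache:
    """Serve repeated reads from memory and fall back to the store on a miss."""

    def __init__(self, load, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load        # function that reads one key from the store
        self._ttl = ttl_seconds  # how long a cached value stays valid
        self._clock = clock      # injectable so expiry can be tested
        self._entries = {}       # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]      # cache hit: the store is not touched
        value = self._load(key)  # cache miss or expired entry
        self._entries[key] = (now + self._ttl, value)
        return value


store_reads = []


def slow_lookup(key):
    """Stand-in for a read against the underlying table."""
    store_reads.append(key)
    return {"id": key}


cache = ReadThroughCache(slow_lookup, ttl_seconds=60.0)
first = cache.get("user-1")   # goes to the store
second = cache.get("user-1")  # served from memory
```

A cache like this trades freshness for latency: a value can be up to `ttl_seconds` stale, which is usually acceptable for read-heavy data.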

DynamoDB with Lambda seems like a great choice from a cost and performance perspective. The only hurdle is designing the table for effectively servicing the varied access patterns your application may have.
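
Serving a second access pattern usually means indexing the items by another attribute. The sketch below imitates that with a plain Python dictionary; it is a conceptual model rather than DynamoDB's index API, and the item shapes and attribute names are invented for the example.

```python
from collections import defaultdict


def build_index(items, attribute):
    """Group items by one attribute so lookups on it avoid a full scan."""
    index = defaultdict(list)
    for item in items:
        # Items that lack the attribute are simply left out of the index.
        if attribute in item:
            index[item[attribute]].append(item)
    return index


items = [
    {"pk": "ORDER#1", "customer": "Ada", "status": "SHIPPED"},
    {"pk": "ORDER#2", "customer": "Grace", "status": "PENDING"},
    {"pk": "ORDER#3", "customer": "Ada", "status": "PENDING"},
    {"pk": "CUSTOMER#1", "name": "Ada"},
]

by_status = build_index(items, "status")
by_customer = build_index(items, "customer")

pending = [item["pk"] for item in by_status["PENDING"]]
ada_orders = [item["pk"] for item in by_customer["Ada"]]
```

Each index answers exactly one kind of question, which is why the access patterns need to be listed before the table is designed.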

MONITORING ON THUNDRA

Thundra’s platform for serverless observability and monitoring can help you reduce costs and improve performance by giving actionable insights about your application. Traces with detailed information between the various services of the application pinpoint areas of inefficiency, which you can then fix. With regard to databases, especially RDS instances, insight into long-running queries tells you exactly which part of the application needs to be optimized.

Let’s take a look at what Thundra looks like. Once you have signed up, Thundra guides you through the onboarding process of instrumenting your Lambda functions. Your architecture and dashboard screens should look something like this.

(Note: Both screens show details based on functions being grouped to a project. Unless specified, all instrumented functions will belong to “default project”)

[Image: Dashboard view]

At a glance, the dashboard surfaces the most important insights that need your attention.

[Image: Architecture view]

Clicking on the traces between services in the architecture view lists every trace. Selecting one opens a page with detailed information about that invocation, as shown below.

As you can see in the image, we are able to gain additional insight into DynamoDB and our PostgreSQL database.

[Image: Trace detail view]

You can also reach the invocation detail page above from the list of Unique traces in the left panel (a Unique trace groups the traces that perform a specific operation in an application). By filtering for long-running unique traces, you can see which specific queries are increasing your cost and slowing down the application.

[Image: Unique traces view]

A quick look at Thundra’s navigation panel on the left shows the different ways you can approach application monitoring. Thundra collects every trace between your function and other services to ensure that you have all the data needed for optimizing the application to its fullest potential.

CONCLUSION

We have seen how SQL and NoSQL databases work, along with their strengths and weaknesses. We also took a quick look at Amazon’s database services and how monitoring them with a platform like Thundra can enhance your metric tracking capabilities for serverless optimizations.

SQL databases will get the job done for most use cases, not to mention their robust data access and manipulation capabilities, provided you are not expecting millisecond performance at scale.

On the other hand, a NoSQL database like DynamoDB can do all the heavy lifting even at peak use without you having to worry about server management or failovers.

It’s quite possible to achieve results similar to SQL querying, but DynamoDB comes with a learning curve in terms of data modeling and knowing how to index table attributes for efficient access patterns.