Posted Jan 2022 in Debugging

Building, Tracing, and Monitoring Event-Driven Architectures on Google PubSub and BigQuery

Written by Batuhan Caglayan


Senior Software Engineer @Thundra

The use of distributed systems is increasing day by day, which makes communication between systems more important than ever. In this article, we examine event-based communication between these systems and present a basic event-driven architecture (EDA) for simple use cases.

Key Concepts About Event-Driven Architectures (EDA)

Event-driven architecture is a design model that connects distributed software systems and lets them communicate efficiently. It makes it possible to exchange information in real time or very close to real time.

Event-driven architecture concepts are mainly based on the pub/sub design pattern, so let's take a closer look at it.

Publish/subscribe (pub/sub) is a messaging pattern for exchanging event data among applications and services.

The main concepts in pub/sub are:

  • Topics are named resources. A topic acts as a contract between publishers and subscribers.

  • Publishers are the sources of data (messages). They publish messages to topics and are not interested in the destination of the data.

  • Subscribers are the listeners of data (messages). They are not interested in the source of the data. They subscribe to topics and only receive messages for those topics.

  • An event is a state change or an update within the system that triggers the actions of other systems.

  • A message broker is intermediary software that relays messages from one or more systems to other systems.
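To make these concepts concrete, here is a minimal in-memory sketch of a broker with topics, a publisher, and a subscriber. It is not how Google Pub/Sub works internally: the class name `InMemoryBroker`, the topic names, and the message fields are all hypothetical, and delivery here is synchronous, whereas a real broker delivers asynchronously.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[dict], None]


class InMemoryBroker:
    """Toy message broker: relays each message to every subscriber of its topic."""

    def __init__(self) -> None:
        # topic name -> list of subscriber callbacks
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        """Register a subscriber; it only ever sees messages for this topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> int:
        """Relay a message and return how many subscribers received it.

        The publisher never learns who the subscribers are.
        """
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


broker = InMemoryBroker()
received: List[dict] = []

# The subscriber listens on one topic only (topic name is an assumption).
broker.subscribe("order-cancelled", received.append)

delivered = broker.publish("order-cancelled", {"orderId": "619e3c02829fdee25f0268d0"})
ignored = broker.publish("stock-movement", {"productId": "product-1"})
```

The second publish reaches nobody because no one subscribed to that topic, which is the loose coupling the list above describes: publishers and subscribers only share the topic name.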

Pros & Cons of EDA

Pros

  • Activity is asynchronous (“fire and forget”), which provides loosely coupled communication between systems.

  • Allows for great horizontal scalability.

  • Because the systems are loosely coupled, the failure of one is less likely to have side effects on the others.

Cons

  • Testing can be a challenge.

  • Requires a well-defined policy for message formatting and message exchange.
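One way to approach the message-format policy mentioned in the last point is a small, versioned envelope that every publisher and subscriber agrees on. The sketch below is only an illustration: the `Envelope` fields, the version set, and the helper names are assumptions, not the format used in our demo project.

```python
import json
from dataclasses import asdict, dataclass

# Versions this consumer knows how to read (hypothetical policy).
SUPPORTED_VERSIONS = {1}
REQUIRED_FIELDS = {"event_type", "version", "payload"}


@dataclass(frozen=True)
class Envelope:
    event_type: str
    version: int
    payload: dict


def encode(envelope: Envelope) -> bytes:
    """Serialize the envelope to UTF-8 JSON bytes for publishing."""
    return json.dumps(asdict(envelope), sort_keys=True).encode("utf-8")


def decode(raw: bytes) -> Envelope:
    """Parse and validate a received message; reject anything off-contract."""
    data = json.loads(raw.decode("utf-8"))
    missing = REQUIRED_FIELDS - set(data)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if data["version"] not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported version: {data['version']}")
    return Envelope(data["event_type"], data["version"], data["payload"])


original = Envelope("order-cancelled", 1, {"orderId": "619e3c02829fdee25f0268d0"})
round_tripped = decode(encode(original))
```

Rejecting unknown versions at the subscriber keeps a format change in one service from silently corrupting data in another.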

Basic Order Cancellation Scenario with EDA

In this section, we build a basic event-based architecture. Our scenario is an order cancellation process where clients should be able to cancel orders without any disruption.

In our scenario:

  • we update the order status.
  • we update stock data of related products.
  • we store stock movements.
  • we store order cancellation data for ad-hoc queries.

Note: All implementations are kept simple for educational purposes and are not production-ready.

We start by picking the following tools for our architecture.

  • Google Pub/Sub is our message broker. (Google Pub/Sub is a fully managed messaging service from Google.) We implement basic publish, asynchronous pull, and synchronous pull with Google Pub/Sub.

  • MongoDB is our main storage.

  • Google BigQuery stores order cancellation data for ad-hoc queries.

  • Redis is used for distributed locks.
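The article does not go into how the locks are used, so the following is only a sketch of the general pattern: a lock is taken with "set if absent, with expiry" semantics and released only by the holder that owns the token. With Redis this is commonly built on the `SET key value NX PX milliseconds` command. The `FakeLockStore` class below is a hypothetical in-memory stand-in so that the example runs without a Redis server, and the key name is an assumption.

```python
import time
import uuid
from typing import Dict, Optional, Tuple


class FakeLockStore:
    """In-memory stand-in for a store with set-if-absent-with-expiry semantics."""

    def __init__(self) -> None:
        # key -> (owner token, expiry time on the monotonic clock)
        self._entries: Dict[str, Tuple[str, float]] = {}

    def acquire(self, key: str, ttl_seconds: float) -> Optional[str]:
        """Return an owner token if the lock was free or expired, else None."""
        now = time.monotonic()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return None  # another holder still owns a live lock
        token = uuid.uuid4().hex
        self._entries[key] = (token, now + ttl_seconds)
        return token

    def release(self, key: str, token: str) -> bool:
        """Release only if the caller still owns the lock."""
        entry = self._entries.get(key)
        if entry is None or entry[0] != token:
            return False
        del self._entries[key]
        return True


store = FakeLockStore()
lock_key = "lock:stock:product-1"  # hypothetical key naming

first = store.acquire(lock_key, ttl_seconds=30)   # succeeds
second = store.acquire(lock_key, ttl_seconds=30)  # fails while the lock is held
```

The expiry protects against a holder that crashes before releasing, and the token check prevents one worker from releasing a lock that another worker acquired after the first one expired.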

We drew a basic diagram of this scenario as shown below.

An example of an order cancellation flow is as follows:

Assume that there is an order with the id 619e3c02829fdee25f0268d0 and that this order includes two distinct products.

Let's call them product-1 and product-2. To make things a little more realistic, the basket has 2 items of product-1 and 3 items of product-2. The initial stock count is 100 for product-1 and 50 for product-2.

  • An order cancellation request comes to the order-service (order with id 619e3c02829fdee25f0268d0).
  • The order-service updates the status of the related order to canceled in its own database (MongoDB).
  • After updating the status successfully, the order-service publishes order cancellation messages.
  • The stock-service is a subscriber of order cancellation messages. It consumes cancellation messages and updates the stock information of the related products in the basket (MongoDB), increasing the stock counts of the canceled products back to their previous amounts.
  • After updating the stock information, the stock-service publishes stock movement messages (two messages for the two products).
  • The stock-movement-service is a subscriber of stock movement messages. It consumes stock movement messages and stores them in its own database (MongoDB).
  • The order-analytic-service is a synchronous-pull subscriber of order cancellation messages: it pulls from the topic every minute and processes the messages in a batch. It then stores the data in its own database (Google BigQuery, for further ad-hoc queries).
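The first steps of this flow can be walked through with a small simulation. This is a sketch, not the demo project's code: plain dictionaries stand in for MongoDB, direct function calls stand in for publishing and consuming through the broker, and the message field names are hypothetical. It also assumes that the stock counts of 100 and 50 are the counts at the moment of cancellation.

```python
from typing import Dict, List

ORDER_ID = "619e3c02829fdee25f0268d0"

# Stand-ins for each service's own database.
orders: Dict[str, dict] = {
    ORDER_ID: {"status": "created", "items": {"product-1": 2, "product-2": 3}}
}
stock: Dict[str, int] = {"product-1": 100, "product-2": 50}
stock_movements: List[dict] = []


def cancel_order(order_id: str) -> dict:
    """order-service: mark the order canceled and build the cancellation message."""
    order = orders[order_id]
    order["status"] = "canceled"
    return {"orderId": order_id, "items": dict(order["items"])}


def handle_cancellation(message: dict) -> List[dict]:
    """stock-service: put canceled quantities back, one movement message per product."""
    movements = []
    for product_id, quantity in message["items"].items():
        stock[product_id] += quantity
        movements.append(
            {"orderId": message["orderId"], "productId": product_id, "quantity": quantity}
        )
    return movements


def store_movements(messages: List[dict]) -> None:
    """stock-movement-service: persist each stock movement message."""
    stock_movements.extend(messages)


cancellation = cancel_order(ORDER_ID)
movements = handle_cancellation(cancellation)
store_movements(movements)
```

Under these assumptions the stock ends at 102 for product-1 and 53 for product-2, and two stock movement records are stored, one per product.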

The message content looks like this;


You can access details on our demo project here.

Tracing and Monitoring

Even if you think you have designed the best architecture, built a solid platform, and set up a perfect CI/CD process, it is still crucial to monitor your system continuously. That’s where Thundra comes in and helps you observe the system status and track metrics.

Thundra supports tracing for many popular services out of the box, such as AWS Lambda/DynamoDB/Kinesis/SQS/S3/…, Google PubSub/BigQuery/…, MongoDB, Elasticsearch, Redis, MySQL, PostgreSQL, and many more. Check Thundra APM's documentation for more information.

To get this kind of observability in your architecture, you can enable Thundra for your project by following the instructions in the Quick Start Guide.

Let’s use Thundra to trace the whole order cancellation flow we’ve just created, end to end. You can find the configuration documentation for Thundra APM's Google Pub/Sub support at this link.

This is the trace map of our order cancellation flow as shown in Thundra once it is enabled. You can easily see the system components on the trace map shown below.

In this trace map, as you can see, the order-service-analytic service pulls from the related topic every minute and stores the data in Google BigQuery.

From the performance metrics of the order-service-analytic service, you can see that the service pulls messages from Google Pub/Sub 10 times and then writes the result to Google BigQuery in a single operation. (There was only one canceled order :) )
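The "pull several times, write once" behavior can be sketched as follows. The article does not show the service's code, so this assumes a fixed number of pull attempts per run. `FakeSubscription`, `run_once`, and their parameters are hypothetical names, and message acknowledgement is left out to keep the example short.

```python
from collections import deque
from typing import Callable, Deque, List


class FakeSubscription:
    """In-memory stand-in for a subscription that supports synchronous pull."""

    def __init__(self, messages: List[dict]) -> None:
        self._queue: Deque[dict] = deque(messages)
        self.pull_calls = 0

    def pull(self, max_messages: int) -> List[dict]:
        """Return up to max_messages pending messages (possibly none)."""
        self.pull_calls += 1
        batch = []
        while self._queue and len(batch) < max_messages:
            batch.append(self._queue.popleft())
        return batch


def run_once(
    subscription: FakeSubscription,
    write_rows: Callable[[List[dict]], None],
    pulls: int = 10,
    max_messages: int = 100,
) -> int:
    """Pull a fixed number of times, then write everything collected in one call."""
    rows: List[dict] = []
    for _ in range(pulls):
        rows.extend(subscription.pull(max_messages))
    if rows:
        write_rows(rows)  # single batched write, e.g. one insert into the warehouse
    return len(rows)


subscription = FakeSubscription([{"orderId": "619e3c02829fdee25f0268d0"}])
writes: List[List[dict]] = []

# One scheduled run; in the scenario above this would happen every minute.
count = run_once(subscription, writes.append)
```

With a single canceled order in the queue, the run makes ten pull calls but performs only one write, which matches the shape of the metrics described above.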

Additionally, you can monitor the whole system architecture and see the flow between all the services and resources in a single view with Thundra, as shown below.

So when there is a problematic component (service or resource) in your platform, you can pinpoint it at a glance.

Summary

In this article, we created a simple event-driven architecture for canceling orders in commercial applications. Tracing and monitoring such architectures is inherently hard because event-driven applications rely on many external resources.

Thundra has extended its support for event-driven architectures on Google Cloud Platform by supporting Pub/Sub and BigQuery. You can get started on your own with your free Thundra account and experience a smooth way of monitoring.

You can send us a message from the support chat panel at the bottom right of our pages, which forwards messages to us, real humans.

If you prefer interactive communication, you are more than welcome to join our Slack community here: https://www.thundra.io/thundra-slack-invitation

We hope you enjoyed this article. If you'd like to read more about observability, debugging, and monitoring microservices workloads in CI/CD, development, or production environments, you should subscribe to our blog.

And lastly, if you still haven't stepped into Thundra's world, start your journey here.