Tracing for Everyone: Business Flows Simplified

Oct 15, 2019

When we started designing our OpenTracing-compatible distributed tracing engine Otto, we didn’t want to limit it to AWS services only. Therefore, we built a general infrastructure for linking related invocations to each other via traces.

What does this mean? Well, let’s start by defining what a “trace” is: a trace is a set of invocations that are connected to each other in a flow. An invocation might trigger another invocation in one of two ways:

  • synchronously, by calling another service through an API call (e.g., Lambda 1 => API GW => Lambda 2), or
  • asynchronously, by sending an event to a service which triggers another service (e.g., Lambda 1 => SQS => Lambda 2)

Here, these Lambda invocations are connected to each other because they are in the same flow. 
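
As a minimal illustration of the asynchronous case, the sketch below shows a Node.js Lambda handler sending a message to an SQS queue that triggers a second Lambda. This is only a sketch; the environment variable, function names, and message shape are hypothetical.

// lambda-1: sends a message to SQS, which triggers lambda-2 asynchronously
const AWS = require('aws-sdk');
const sqs = new AWS.SQS();

exports.handler = async (event) => {
  await sqs.sendMessage({
    QueueUrl: process.env.QUEUE_URL, // hypothetical queue URL, provided via configuration
    MessageBody: JSON.stringify({ payload: event.body })
  }).promise();
  return { statusCode: 202 };
};

// lambda-2: triggered by the SQS queue; part of the same trace as lambda-1
exports.consumer = async (event) => {
  for (const record of event.Records) {
    const message = JSON.parse(record.body);
    // process the message...
    console.log('Received message:', message);
  }
};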

Architecting flows like these, in which multiple invocations are connected to each other, is very common in modern applications built on single-purpose microservices, where many small applications/services interact with each other synchronously or asynchronously. We gave many examples of these kinds of interactions through AWS services in our blog post announcing our distributed tracing support.

However, as we mentioned in that post, sometimes invocations are related to each other not physically (synchronously or asynchronously) but logically. For instance, take a blog system in which a flow ends with a service writing a blog post to DynamoDB. A day later, the moderator approves the blog post for publication, which triggers another flow of invocations. These two flows are logically linked because they deal with the same blog post, so it would be better if the developer could see the entire flow in the same picture.

As we mentioned before, we have our own distributed tracing engine, Otto, which provides a distributed tracing infrastructure independent of any third-party service, AWS or otherwise. So, on top of Otto, you can even link logically related traces yourself in a customizable way.

Serverless Blog Site Application

In this post, we will take you through an example of a blog site application that is 100% serverless.  

Here is the state diagram of the system:

[Figure: state diagram of the serverless blog site application]

As shown in the diagram, there are three main flows in the system, each of which we will discuss below. 

Approve Blog Post

The steps for this flow are the following: 

  • The blog post is received through API Gateway.
  • It is validated asynchronously.
  • If it passes validation:
    • It is saved to a DynamoDB table (see the sketch below).
    • It is replicated to Elasticsearch asynchronously to make the post available for free-text searches.
  • A notification is then sent to the author about whether or not the blog post has been approved.

[Figure: “Approve Blog Post” flow]
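
To make the save step more concrete, here is a minimal, hypothetical sketch of a Node.js handler that persists the approved blog post to DynamoDB. The table name, item shape, and state value are assumptions for illustration, not details from the example application.

// Hypothetical handler for the "save approved post" step
const AWS = require('aws-sdk');
const dynamodb = new AWS.DynamoDB.DocumentClient();

exports.saveBlogPost = async (event) => {
  const blogPost = JSON.parse(event.body);

  // Persist the post with its current state; an asynchronous step (e.g., a
  // DynamoDB stream) can then replicate it to Elasticsearch for free-text search.
  await dynamodb.put({
    TableName: process.env.BLOG_POST_TABLE, // assumed table name
    Item: {
      id: blogPost.id,
      title: blogPost.title,
      content: blogPost.content,
      state: 'APPROVED'
    }
  }).promise();

  return { statusCode: 200, body: JSON.stringify({ id: blogPost.id, state: 'APPROVED' }) };
};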

Review Blog Post

For this flow, the steps are as follows:

  • The blog post is reviewed manually by the editor, and a new version is sent through API Gateway.
  • It is updated in DynamoDB with the reviewed/updated version.
  • It is replicated to Elasticsearch asynchronously to make the new post content available for free-text searches (a replication sketch follows below).
  • A notification is then sent to the author indicating that the blog post has been reviewed.

[Figure: “Review Blog Post” flow]
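
The “replicate to Elasticsearch” step appears in all three flows. One common way to implement it (our assumption here, not necessarily how the example application does it) is a Lambda function triggered by the table’s DynamoDB stream that indexes each changed item. The Elasticsearch endpoint, client setup, and index name below are hypothetical.

// Hypothetical DynamoDB-stream-triggered replicator
const AWS = require('aws-sdk');
const { Client } = require('@elastic/elasticsearch');

// The endpoint is a placeholder provided via configuration
const es = new Client({ node: process.env.ES_ENDPOINT });

exports.replicateToElasticsearch = async (event) => {
  for (const record of event.Records) {
    if (!record.dynamodb || !record.dynamodb.NewImage) {
      continue; // skip deletes or records without a new image
    }
    // Convert the DynamoDB attribute-value map to a plain JavaScript object
    const blogPost = AWS.DynamoDB.Converter.unmarshall(record.dynamodb.NewImage);

    // Index (or re-index) the post so it is available for free-text search
    await es.index({
      index: 'blog-posts', // assumed index name
      id: blogPost.id,
      body: blogPost
    });
  }
};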

Publish Blog Post

Finally, the steps for the publish flow are:

  • The admin decides to publish the blog post, and the publish request is sent through API Gateway.
  • It is updated in DynamoDB to reflect the published state.
  • It is replicated to Elasticsearch asynchronously to keep the blog post state in sync with DynamoDB.
  • A notification is then sent to the author indicating that the blog post has been published (a notification sketch follows below).


[Figure: “Publish Blog Post” flow]
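
The notification step at the end of each flow could be implemented in several ways; the sketch below assumes an SNS topic the author is subscribed to, which is our assumption rather than a detail of the example application.

// Hypothetical notification step using SNS
const AWS = require('aws-sdk');
const sns = new AWS.SNS();

async function notifyAuthor(blogPost, newState) {
  // Publish a message to the (assumed) author-notification topic
  await sns.publish({
    TopicArn: process.env.AUTHOR_NOTIFICATION_TOPIC, // assumed topic ARN
    Subject: `Your blog post is now ${newState}`,
    Message: JSON.stringify({ blogPostId: blogPost.id, state: newState })
  }).promise();
}

module.exports = { notifyAuthor };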

Time to Glue Together the Traces

Even though the flows above are different, they are logically connected with each other, as they work on the same blog post; they simply handle different states of it. So, it would be better if all of these flows could be shown in the same trace map, as this would make it easier to track the whole path, from sending the blog post for approval up until its publication after review. This would, in turn, enable you to pinpoint any problems that arise in what is a very long and complex flow.

As we stated before, Thundra allows for the automated linking of invocations connected both synchronously and asynchronously into the same trace (e.g., API GW -> Lambda-1 -> AWS SQS -> Lambda-2 -> ...). But Thundra’s distributed tracing engine also allows you to link logically connected invocations, as in the example above.

Automatically detecting domain-specific logical relations between applications and invocations is very hard, and often infeasible, as every domain has its own characteristics. At that point, you are expected to link them manually. For this purpose, Thundra offers its programmatic distributed tracing API, which gives developers the flexibility to connect logically related applications and invocations.

In our blog site application example, the combination of the blog post id and state (<blog_post_id, state>) can be used as a logical link, because the final state of one trace is the source state of the next trace.

Let’s first link the “Approve Blog Post” and “Review Blog Post” traces via the following steps: 

  • Specify the outgoing trace link of the “Approve Blog Post” trace as the combination of the blog post id and final state APPROVED:
thundra.InvocationTraceSupport.addOutgoingTraceLink(blogPost.id + '::' + 'APPROVED');
  • Specify the incoming trace link of the “Review Blog Post” trace as the combination of the blog post id and final state APPROVED:
thundra.InvocationTraceSupport.addIncomingTraceLink(blogPostId + '::' + 'APPROVED');


OK. So we connected the “Approve Blog Post” and “Review Blog Post” traces programmatically via a custom trace link, using the id and state of the blog post as the identifier that connects the traces (see the sketch below).
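
To show where these calls might live, here is a minimal, hypothetical sketch of the two handlers with the trace-link calls in place. The package name in the require statement, the handler shapes, and the table name are assumptions; the addOutgoingTraceLink/addIncomingTraceLink calls are the ones shown above.

// Assumed import; adjust to the Thundra Node.js agent package you use
const thundra = require('@thundra/core');
const AWS = require('aws-sdk');
const dynamodb = new AWS.DynamoDB.DocumentClient();

// Last handler of the "Approve Blog Post" flow (hypothetical shape)
exports.approveBlogPost = async (event) => {
  const blogPost = JSON.parse(event.body);

  await dynamodb.put({
    TableName: process.env.BLOG_POST_TABLE, // assumed table name
    Item: { ...blogPost, state: 'APPROVED' }
  }).promise();

  // Outgoing link: <blog_post_id>::APPROVED
  thundra.InvocationTraceSupport.addOutgoingTraceLink(blogPost.id + '::' + 'APPROVED');
};

// First handler of the "Review Blog Post" flow (hypothetical shape)
exports.reviewBlogPost = async (event) => {
  const reviewedPost = JSON.parse(event.body);
  const blogPostId = reviewedPost.id;

  // Incoming link: the same <blog_post_id>::APPROVED key links this trace
  // to the "Approve Blog Post" trace above
  thundra.InvocationTraceSupport.addIncomingTraceLink(blogPostId + '::' + 'APPROVED');

  await dynamodb.put({
    TableName: process.env.BLOG_POST_TABLE,
    Item: { ...reviewedPost, state: 'REVIEWED' }
  }).promise();
};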


Let’s now connect the “Review Blog Post” and “Publish Blog Post” traces:

  • Specify the outgoing trace link of the “Review Blog Post” trace as the combination of the blog post id and final state REVIEWED:
thundra.InvocationTraceSupport.addOutgoingTraceLink(blogPostId + '::' + 'REVIEWED');
  • Specify the incoming trace link of the “Publish Blog Post” trace as the combination of the blog post id and final state REVIEWED:
thundra.InvocationTraceSupport.addIncomingTraceLink(blogPostId + '::' + 'REVIEWED');

So, we have now linked the “Review Blog Post” and “Publish Blog Post” traces programmatically via a custom trace link generated from the blog post id and state.

When we put all of these together, we can see the entire flow in a single view in Thundra as shown below:  

[Screenshot: the three flows combined into a single trace map in Thundra]

As you can see, three different flows, which represent the handling of the three different states of a blog post, are shown in the same view: 

  • The “Approve Blog Post” trace is highlighted with red dashes.
  • The “Review Blog Post” trace is highlighted with blue dashes.
  • The “Publish Blog Post” trace is highlighted with black dashes.

So, What’s Next?

When we released our distributed tracing support in April, we were very confident about the capabilities and flexibility of our OpenTracing-compatible distributed tracing engine, Otto. Why? Because we believed, and still do, that distributed tracing is at the heart of microservice-based architectures. So far, we have received very positive feedback from our customers, who tell us that Otto helps them track errors back to their origin and reduces troubleshooting time. We have also added detailed query support on traces.

But this is still not enough for us, and we are currently improving this feature by enabling automated analyses on traces. This will help you understand the behavior of your architecture and provide insights such as which paths your traces follow and how they are connected to each other.

Stay tuned for more upcoming features, and be on the lookout for how Thundra is going to scale horizontally and vertically in the run-up to AWS re:Invent 2019. Stop by the Thundra booth (#627) to talk serverless, and we can even give you a quick demo of our new features.

Feel free to ping us on Twitter or join our Slack channel to chat anytime. You can always see a live demo of Thundra in action on our site, or sign up for Thundra today for a dedicated demo featuring your company’s data.