If you work in modern software development, you probably deploy something to the cloud. These cloud deployments often lead to adopting a microservice architecture, because it enables developers to integrate cloud services more efficiently and, in turn, save time and money.
That said, even if you don’t use such an architecture, your system probably still calls many third-party services. And the distributed services that make up your application aren’t always easy to keep track of. You can end up with errors spread across multiple services and performance bottlenecks buried deep in core services.
In the best-case scenario, you’ll find a bug in the first file you look at. In the worst case, you’ll find out that the problem was a third-party service after spending hours or even days needlessly debugging your codebases.
The majority of situations fall somewhere in the middle. You’ll have to go through a good part of your code to find out what’s going on, and depending on the size of the codebase, even the “average” case might take too long.
This is why monitoring in the form of logs and traces exists. These tools give you insights so you can isolate the location of an error and search the code more efficiently.
This article will illustrate the differences between logs and traces. We’ll show how they’re related and what benefits coupling distributed tracing with logging can provide. So, if you want to fix bugs fast, read on!
1. Distributed Tracing Builds on Logging
The first thing to note is that tracing is usually built on logging. Logging is the base technology: it writes information about a system to persistent storage in the form of loglines. Depending on how much thought went into the design before logging was implemented, these loglines can serve as the foundation for some form of distributed tracing.
A trace is built from one or more log entries correlated by an ID. If this ID is missing, tracing can’t be implemented on top of the logs. For example, if you have an API that receives requests and does some work for each one, you could take the request ID and attach it to every logline triggered by that request.
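As a minimal sketch of the request ID idea, the following uses Python's standard `logging` and `contextvars` modules. The names `request_id_var`, `RequestIdFilter`, and `handle_request` are assumptions made for illustration and do not come from any particular framework or tracing product.

```python
import contextvars
import io
import logging
import uuid

# Holds the ID of the request currently being handled (hypothetical name).
request_id_var = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Copy the current request ID onto every record that passes through."""

    def filter(self, record):
        record.request_id = request_id_var.get()
        return True


def build_logger(stream):
    """Create a logger whose output starts with the request ID."""
    logger = logging.getLogger("api")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(request_id)s %(levelname)s %(message)s")
    )
    handler.addFilter(RequestIdFilter())
    logger.addHandler(handler)
    return logger


def handle_request(logger, request_id=None):
    """Hypothetical handler: every logline it triggers carries the same ID."""
    request_id = request_id or uuid.uuid4().hex
    token = request_id_var.set(request_id)
    try:
        logger.info("request received")
        logger.info("work finished")
    finally:
        # Restore the previous value so IDs never leak between requests.
        request_id_var.reset(token)
    return request_id


stream = io.StringIO()
logger = build_logger(stream)
handle_request(logger, "req-1")
handle_request(logger, "req-2")
log_output = stream.getvalue()
print(log_output, end="")
```

Because the ID lives in a context variable, code deeper in the call stack can log without having the ID passed to it explicitly.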
2. Logs Are Often Already Available
Most services that we use come with built-in logging functionality, so people tend to use it without any thought. A distributed tracing system might take some time to set up on top of these logs, so many developers simply stick to logs because deadlines are always looming.
But in the long run, the features provided by distributed tracing could save you days or weeks of searching for problems. What’s more, the up-front investment may be just a few hours or even minutes of work. Modern monitoring platforms like Thundra APM offer a streamlined process for implementing distributed tracing.
3. Distributed Tracing Links Your Architecture Together
Initially, having a central place to check on your system as a whole may not seem like a big deal. But as your system grows, having such a resource can save you time and money. Three systems aren’t hard to keep track of, but dozens might be.
You can consolidate all of your logging efforts so you can search one central location, but distributed tracing goes one step further. It links all the loglines in your system together in a logical way, so you know precisely what route an event took through your whole system.
When things go wrong, a developer has to reconstruct this route by hand anyway, so why not automate it right from the start? This way it only has to be done once, and the whole team will benefit from it for the lifetime of the application.
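To make the linking step concrete, here is a small sketch that groups centrally collected loglines by a shared ID and orders them by time, which yields the route each event took. The field names (`ts`, `trace_id`, `service`) and the sample data are assumptions chosen for the example; a real tracing system would also have to deal with clock skew between services.

```python
from collections import defaultdict

# Hypothetical loglines collected from three services in one central place.
loglines = [
    {"ts": 3, "trace_id": "a1", "service": "billing", "msg": "charge created"},
    {"ts": 1, "trace_id": "a1", "service": "gateway", "msg": "request received"},
    {"ts": 2, "trace_id": "b2", "service": "gateway", "msg": "request received"},
    {"ts": 2, "trace_id": "a1", "service": "orders", "msg": "order stored"},
    {"ts": 4, "trace_id": "b2", "service": "orders", "msg": "order rejected"},
]


def build_traces(lines):
    """Group loglines by trace ID and sort each group by timestamp."""
    traces = defaultdict(list)
    for line in lines:
        traces[line["trace_id"]].append(line)
    for entries in traces.values():
        entries.sort(key=lambda entry: entry["ts"])
    return dict(traces)


def route(trace):
    """Return the services an event passed through, in order."""
    return [entry["service"] for entry in trace]


traces = build_traces(loglines)
for trace_id, trace in traces.items():
    print(trace_id, " -> ".join(route(trace)))
```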
4. Distributed Tracing Requires More Structure
Distributed tracing requires more structure and more standardization around your loglines. This way, a tracing service like Thundra APM knows where to look for the service name, the task a logline corresponds to, or the user ID. So, keeping things cleanly structured right from the start is a must.
When all your logs are structured similarly, they’re also easier for developers to understand. They don’t have to learn a new log format for every service they debug, so they can quickly switch between services and reuse the skills they gained while debugging the first one.
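One way to get this kind of uniform structure is to emit every logline as a JSON object with a fixed set of keys. The sketch below does this with Python's standard `logging` module. The `JsonFormatter` class and the key names (`service`, `task`, `user_id`) are assumptions for illustration, not the format required by any specific tracing service.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render every record as one JSON object with a fixed set of keys."""

    def __init__(self, service):
        super().__init__()
        self.service = service

    def format(self, record):
        entry = {
            "ts": record.created,
            "level": record.levelname,
            "service": self.service,
            # Optional fields are always present, so every line has one shape.
            "task": getattr(record, "task", None),
            "user_id": getattr(record, "user_id", None),
            "message": record.getMessage(),
        }
        return json.dumps(entry)


def build_logger(service, stream):
    """Create a logger for one service that writes JSON lines to stream."""
    logger = logging.getLogger(service)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter(service))
    logger.addHandler(handler)
    return logger


stream = io.StringIO()
orders_log = build_logger("orders", stream)
orders_log.info("order stored", extra={"task": "create_order", "user_id": "u-42"})
orders_log.info("cache warmed")

entries = [json.loads(line) for line in stream.getvalue().splitlines()]
print(entries[0]["service"], entries[0]["task"], entries[0]["user_id"])
```

If every service uses the same formatter, a developer or a tool can read the service name, task, and user ID from the same keys everywhere.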
The later in the development process you implement this, the harder it gets, so one of the first things to do for a new application is logging and tracing. If you don’t have the time or don’t know how to define the structure on your own, services like Thundra APM can make the process painless.
5. Distributed Tracing Is Easier to Read
Loglines are hard to visualize, especially if they’re unstructured. But with structured logging and distributed tracing on top, we can see how the services in a system work together.
Was one call part of another? How much time was spent waiting for the database? Have all steps of a task been completed? Did these steps complete in an acceptable time?
All of these questions can be answered with one glance at a trace.
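The same questions can also be answered programmatically once a trace is available as data. The sketch below assumes a simplified span model (an ID, a parent ID, a name, and start and end times in milliseconds). The `Span` class, the sample trace, and the helper names are all hypothetical and only loosely follow what tracing tools record.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]
    name: str
    start_ms: int
    end_ms: Optional[int]  # None means the step never finished


# Hypothetical trace of a single request.
trace = [
    Span("s1", None, "POST /orders", 0, 120),
    Span("s2", "s1", "store_order", 10, 100),
    Span("s3", "s2", "db.insert", 20, 95),
    Span("s4", "s1", "send_email", 100, 115),
]


def is_part_of(spans, child_id, ancestor_id):
    """Was one call part of another? Walk up the parent links to find out."""
    by_id = {span.span_id: span for span in spans}
    current = by_id[child_id].parent_id
    while current is not None:
        if current == ancestor_id:
            return True
        current = by_id[current].parent_id
    return False


def time_in(spans, prefix):
    """Total duration of finished spans whose name starts with prefix."""
    return sum(
        span.end_ms - span.start_ms
        for span in spans
        if span.name.startswith(prefix) and span.end_ms is not None
    )


def all_completed(spans):
    """Have all steps of the task been completed?"""
    return all(span.end_ms is not None for span in spans)


def within_budget(spans, budget_ms):
    """Did the root span, and so the whole task, finish in an acceptable time?"""
    root = next(span for span in spans if span.parent_id is None)
    return root.end_ms is not None and root.end_ms - root.start_ms <= budget_ms


print(is_part_of(trace, "s3", "s1"))
print(time_in(trace, "db."))
print(all_completed(trace))
print(within_budget(trace, 200))
```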
If a system is monitored using only unstructured logging, correlation work has to be performed repeatedly for every debugged issue. This leads to endless duplication of manual tasks.
6. Distributed Tracing Is Easier to Automate
The more structured system data is, the more accessible it is to other systems. Accessibility, in this context, means that the data can be processed automatically with less effort.
Structured data can be filtered more precisely at the source, which reduces traffic to the other services that work with that data. Getting a notification when a task exceeds a specific runtime, for example, is much harder to accomplish with plain logs alone.
Alerts built on this data can notify team members about problems as they occur. But they can also be consumed by other systems that take automated action. Say a service slows down because it gets too much traffic at once. This event could trigger a call to the cloud API to start new application server instances and match demand.
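The runtime alert and the automated reaction described above could look roughly like the following sketch. It assumes each structured record carries `service`, `task`, and `duration_ms` fields. The `notify` and `scale_out` callbacks are hypothetical placeholders for a real messaging integration and a real cloud API call.

```python
def find_slow_tasks(records, limit_ms):
    """Return the structured records whose runtime exceeds the limit."""
    return [record for record in records if record["duration_ms"] > limit_ms]


def react(slow, notify, scale_out):
    """Notify a person about each slow task, then request capacity per service."""
    for record in slow:
        notify(
            f"{record['service']}/{record['task']} "
            f"took {record['duration_ms']} ms"
        )
    # Ask for more capacity once per affected service, not once per record.
    for service in sorted({record["service"] for record in slow}):
        scale_out(service)


# Hypothetical structured records, for example parsed from JSON loglines.
records = [
    {"service": "orders", "task": "create_order", "duration_ms": 180},
    {"service": "orders", "task": "list_orders", "duration_ms": 2400},
    {"service": "billing", "task": "charge", "duration_ms": 3100},
    {"service": "orders", "task": "create_order", "duration_ms": 2900},
]

messages = []  # stands in for a chat or paging integration
scaled = []    # stands in for calls to a cloud autoscaling API

slow = find_slow_tasks(records, limit_ms=2000)
react(slow, notify=messages.append, scale_out=scaled.append)

print(messages)
print(scaled)
```

With unstructured loglines, the same check would first need a parser for every message format, which is where most of the extra effort comes from.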
Security remediation may be another use case. If a service is the target of a distributed denial-of-service attack, a system could be notified and route malicious traffic to dead ends.
To sum up, logging is a more rudimentary form of monitoring than distributed tracing. Logging requires less work up front, and the cloud services you use typically come with built-in logging functionality.
Distributed tracing is usually built on top of a structured logging system, and it can save you a lot of time in locating and fixing issues because it makes monitoring data easier to understand. The structured logs that form the base of distributed tracing help to further automate a system by filtering the monitoring data for more detailed alerting and notifications.
In the past, distributed tracing was a challenge to set up, but with Thundra APM, you can get going in minutes.