There’s always something we can do to repair an old car, right? We all learn what can go wrong, and the signals the car gives are pretty straightforward about the problem (smoke coming from the engine tells you to stop). Newer, more digital cars, on the other hand, are so complicated that you can’t tell what’s wrong when they signal that they need a repair. When you take one to a mechanic, they plug a special device into the black-box car to diagnose the problem. This analogy applies neatly to modern software architectures and observability.
With monolithic applications, it was pretty straightforward to find a problem by checking a few metric charts. In modern distributed architectures, however, you need specialized observability tools to diagnose issues. Metrics and logs were enough for monoliths, but now you need all three pillars of observability: traces, metrics, and logs. Many businesses on the market, including ours, help developers with tracing, but here we’ll look at the two prominent open-source tracing solutions: Zipkin and Jaeger.
In the software world, distributed tracing makes a system observable using instrumentation standards such as OpenTracing, OpenCensus, and OpenTelemetry. When you adopt one of those open-source standards, you’ll need a tool like Zipkin or Jaeger to process the observability data.
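To make "tracing data" a little more concrete, here is a minimal sketch of how a trace can be modeled as a set of spans that share a trace ID and point back at their parents. The `Span` class, its fields, and `start_trace` are hypothetical names chosen for illustration; they are not the data model of OpenTelemetry, Zipkin, or Jaeger.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """Hypothetical span record: one timed operation inside a trace."""
    name: str
    trace_id: str
    parent_id: Optional[str] = None
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])
    start: float = field(default_factory=time.time)
    end: Optional[float] = None

    def child(self, name: str) -> "Span":
        # A child span keeps the trace ID and points back at its parent.
        return Span(name=name, trace_id=self.trace_id, parent_id=self.span_id)

    def finish(self) -> None:
        # Record when the operation ended.
        self.end = time.time()


def start_trace(name: str) -> Span:
    # The root span creates a new trace ID and has no parent.
    return Span(name=name, trace_id=uuid.uuid4().hex)


# One request that makes one downstream call.
root = start_trace("checkout")
db_call = root.child("query-inventory")
db_call.finish()
root.finish()
```

A backend such as Zipkin or Jaeger receives records of roughly this shape from many services and stitches them back into one request timeline using the shared trace ID.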
Background of Zipkin and Jaeger
Jaeger, which some claim is the cooler of the two, is comparatively new. It was created by Uber engineers and is written in Go. Jaeger adds features such as dynamic sampling, a REST API, and a slick user interface built with React. Its approach differs a bit from Zipkin’s: the Jaeger client emits traces to an agent, and the agent interacts directly with the collector, which validates, transforms, and persists the spans.
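The client, agent, and collector flow described above can be sketched as a toy pipeline. This is only an illustration of the division of responsibilities; the `Agent` and `Collector` classes, the dictionary span format, and the batching rule are all assumptions, not Jaeger’s actual code or API.

```python
from typing import Dict, List


class Collector:
    """Hypothetical collector: validates, transforms, and persists spans."""

    def __init__(self) -> None:
        self.storage: List[Dict] = []  # stand-in for a real storage backend

    def receive(self, batch: List[Dict]) -> int:
        accepted = 0
        for span in batch:
            if not self._validate(span):
                continue  # drop malformed spans
            self.storage.append(self._transform(span))
            accepted += 1
        return accepted

    @staticmethod
    def _validate(span: Dict) -> bool:
        # A span needs at least a trace ID and an operation name.
        return bool(span.get("trace_id")) and bool(span.get("name"))

    @staticmethod
    def _transform(span: Dict) -> Dict:
        # Example transformation: normalize the operation name.
        return {**span, "name": span["name"].strip().lower()}


class Agent:
    """Hypothetical agent: buffers spans from the client, forwards batches."""

    def __init__(self, collector: Collector, batch_size: int = 2) -> None:
        self.collector = collector
        self.batch_size = batch_size
        self.buffer: List[Dict] = []

    def emit(self, span: Dict) -> None:
        # Called by the client; forwards once a full batch has accumulated.
        self.buffer.append(span)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.collector.receive(self.buffer)
            self.buffer = []


collector = Collector()
agent = Agent(collector)
agent.emit({"trace_id": "t1", "name": " GET /cart "})
agent.emit({"trace_id": "", "name": "missing-trace-id"})  # fails validation
agent.emit({"trace_id": "t1", "name": "Query-DB"})
agent.flush()
stored_names = [s["name"] for s in collector.storage]
```

The point of the split is that the application only hands spans to a nearby agent, while validation and storage concerns live in the collector.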
Comparing Zipkin and Jaeger
It’s hard to compare two very cool tools and pick a winner, so we decided not to; instead, we’ll just explain both sides. Let’s start with the bright sides. Zipkin is a more mature platform and has wider community support. Written in Java, it has been selected by many enterprises that are looking for stability over rich functionality. Integrated with OpenTracing, OpenCensus, and OpenTelemetry, Zipkin offers broad extensibility and a wide range of tool integrations.
Jaeger, the younger brother, provides a more modern design and architecture, which makes it flexible and performant in *very* distributed architectures. Its React-based UI is easy to deploy and extend for custom use cases. Last but not least, it has Cloud Native Computing Foundation (CNCF) support; while this is more of a recommendation than a standard, it should be taken into account.
Let’s look at the downsides of each tool. As the older one, Zipkin uses a less modular and more centralized architecture — which makes it slower and less flexible than its newer rival. While this difference may not matter for smaller systems, as your system starts to grow or needs to quickly scale, it may become an issue.
Zipkin’s core components are written in Java, which is great for any organization that values stability. You may think Jaeger is better because it is newer, but that isn’t necessarily what we mean. In fact, many people, especially in enterprise IT, will see Jaeger’s relative immaturity as a disadvantage. Jaeger’s choice of Go as its main language illustrates this point. Although the Gopher community is growing fast, Go is still far from being as common as Java. If you’re not familiar with Go, this can make your learning process longer.
Another area that is both a blessing and a curse for Jaeger is its more modern architecture. This architecture offers benefits in terms of performance, reliability and scalability, but it’s also far more complex and harder to maintain. Jaeger also shares the same ephemeral, in-memory storage issues as Zipkin, plus its API lacks Zipkin support.
Zipkin or Jaeger: A very hard decision
Before we give our recommendations, let's summarize what Zipkin and Jaeger offer and their strengths and weaknesses. In choosing one or the other, you should also take into account your organization’s structure, its monitoring needs, and its in-house technical expertise. In addition, you should decide whether your organization and team prefer newer or more mature technologies.
Both tools are really good options for collecting and analyzing distributed tracing data. Both support OpenTelemetry, and both have a wide range of tool integrations. Both are also exposed to the problems that come with in-memory storage, including eventual data loss.
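The in-memory storage risk is easy to see in a small sketch. The bounded store below is a hypothetical illustration, not Zipkin’s or Jaeger’s storage implementation: once the assumed `max_traces` limit is reached, the oldest trace is silently evicted, and everything disappears when the process restarts.

```python
from collections import OrderedDict
from typing import Dict, List, Optional


class InMemoryTraceStore:
    """Hypothetical bounded store: keeps only the newest `max_traces` traces."""

    def __init__(self, max_traces: int) -> None:
        if max_traces < 1:
            raise ValueError("max_traces must be at least 1")
        self.max_traces = max_traces
        self._traces: "OrderedDict[str, List[Dict]]" = OrderedDict()

    def add_span(self, span: Dict) -> None:
        trace_id = span["trace_id"]
        if trace_id not in self._traces and len(self._traces) >= self.max_traces:
            self._traces.popitem(last=False)  # evict the oldest trace
        self._traces.setdefault(trace_id, []).append(span)

    def get_trace(self, trace_id: str) -> Optional[List[Dict]]:
        # Returns None when the trace was never stored or was evicted.
        return self._traces.get(trace_id)


store = InMemoryTraceStore(max_traces=2)
for tid in ("t1", "t2", "t3"):
    store.add_span({"trace_id": tid, "name": "request"})

evicted = store.get_trace("t1")  # the oldest trace no longer exists
latest = store.get_trace("t3")
```

If you need to look up a trace from last week’s incident, this kind of eviction is exactly what gets in the way, so it is worth checking how storage is configured before relying on either tool in production.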
If you’re looking for a more mature solution with a wider community and enterprise roots with Java, Zipkin is your way to go. If you are looking for a more performant one with official CNCF support, Jaeger will be the right choice for you.
In any case, a third option is to use an automated solution to generate and visualize the tracing data. With automated integrations for the Java, Node.js, Python, Go, and .NET runtimes, Thundra provides an easier way to understand distributed architectures. Before you dive into building your own tracing setup with open-source alternatives, you can give Thundra a try with its forever-free package, which hosts 250K traces monthly.