Shift-Left: Moving Observability to Your Development Process
For the last two decades, companies around the world have been automating their software development processes. Today, it's considered an industry best practice to have a fully automated CI/CD pipeline that turns every commit into a new, functional production deployment.
Even production environments have seen heavy automation. When code is merged into the main branch, the CI process builds the application and runs tests. If that succeeds, the CD process deploys the application to production. But not all at once: a new release is often rolled out gradually, while a monitoring system performs multi-dimensional analysis to make sure everything is stable.
Only when no errors occur can the CD process roll out the new version on the entire production environment. If something goes wrong, a rollback is performed to get back to a previously working state as quickly as possible.
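The rollout-then-verify loop described above can be sketched in a few lines. This is an illustrative sketch only: the stage percentages, the error threshold, and the `check_error_rate` function are assumptions standing in for a real monitoring query and deployment API.

```python
# Hypothetical sketch of a gradual rollout with automatic rollback.
# STAGES, ERROR_THRESHOLD, and check_error_rate() are illustrative
# assumptions, not a real deployment or monitoring API.

ERROR_THRESHOLD = 0.01     # roll back if more than 1% of requests fail
STAGES = [1, 10, 50, 100]  # percentage of traffic on the new release

def check_error_rate(stage: int) -> float:
    """Stand-in for a monitoring query; returns the observed error rate."""
    return 0.001  # pretend the release is healthy at every stage

def rollout() -> str:
    for stage in STAGES:
        error_rate = check_error_rate(stage)
        if error_rate > ERROR_THRESHOLD:
            # Something went wrong: return to the last working state.
            return f"rolled back at {stage}% (error rate {error_rate:.1%})"
    return "released to 100% of users"

print(rollout())
```

In a real pipeline, `check_error_rate` would query the monitoring system and the loop would pause between stages to let enough traffic accumulate for a meaningful signal.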
These production environment improvements have been a primary focus in recent years, but we can still do better. Even if a new release is only rolled out to 10% of users, that still means some users are potentially affected by bugs. This is the inspiration behind the shift-left movement.
What Is the Shift-Left Movement?
Traditionally, and even with modern automation, the focus of quality assurance has been on the right side of the software delivery pipeline: after the developers do their work, a QA team or automated tool checks the application for bugs. These methods don't catch every problem, which is why we follow the procedure described above of partial rollouts, monitoring, and possible rollbacks. As a result, users still experience bugs.
The goal of shifting left is to move as much of the automation as we can as close to the beginning of the production process as possible. The earlier a change is made, the cheaper it is, and the more insulated customers are from any resulting problems.
If we want to shift left and apply our production experience to development, that means starting at the beginning of the pipeline with committing to the main branch. The process of committing to the main branch can be seen as the development equivalent of deploying to production.
Why Keep the Main Branch Clean
The main branch, also known as the master branch, is the bottleneck of our development pipeline. Developers need their commits to land in main if they want to get deployed to production at all. If a broken commit prevents the main branch from successfully going through the CI process, every developer is blocked.
This can delay feature rollouts and, in turn, slow down business value creation and result in loss of revenue. Breaking your own development environment is frustrating, but being subject to the bugs of dozens or hundreds of other developers is untenable.
A broken main branch also hurts developer productivity because every developer who clones it ends up with a broken version that they have to fix themselves, and fixing bugs introduced by other people often takes much longer than fixing your own. Even if the responsible developer pushes a fix, everyone else has to roll back their local version to apply it.
Depending on the size of the engineering team, all of this duplicated work and the subsequent rollbacks are a huge waste of resources.
Why It's So Hard to Keep the Main Branch Clean
A broken main branch should be treated as a production outage—it should be the highest priority of the person who broke it.
If you’re working on a big, monolithic application, you can try to add a submit queue, as Uber did. This queue will check the new code with a speculative engine to determine if it would break the main branch. While this can help with big systems, such a queue is a huge investment that comes with a lot of upfront work.
There are more accessible steps you can take to address a broken main branch, though. For instance, if the build fails, the developers who wrote the offending commits need to be notified so they know they have to act. It doesn't help to notify a dedicated operations person who doesn't know what code caused the problem.
It’s also a good practice to include logs from the build process so the developer knows where to start. No one likes to be told that something is wrong without a clear idea of how to solve it.
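Put together, those two practices amount to a notification that names the failing commit's author and carries the tail of the build log. A minimal sketch, assuming commit metadata and the build log are already available (the inputs and message format here are hypothetical, not any particular CI system's API):

```python
# Sketch of a build-failure notification that targets the commit author
# and includes the end of the build log as a starting point for debugging.
# The author address, commit hash, and log lines below are made-up inputs.

def build_failure_message(author: str, commit: str,
                          log_lines: list[str], tail: int = 5) -> str:
    """Format a notification for the developer who wrote the failing commit,
    quoting the last few lines of the build log."""
    excerpt = "\n".join(log_lines[-tail:])
    return (f"Build broken by commit {commit[:8]} (author: {author}).\n"
            f"Last log lines:\n{excerpt}")

msg = build_failure_message(
    "dev@example.com",
    "4f2a9c1e77b3",
    ["compiling...", "linking...",
     "test_checkout FAILED: expected 200, got 500"],
)
print(msg)
```

A real setup would feed this message into whatever channel the team already watches, such as chat or email, rather than printing it.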
This usually leads a developer to try to reproduce the bug in their local development environment. That causes its own issues: a developer's machine is a single computer, not a distributed cloud system with servers all over the planet like the CI/CD pipeline. We've all heard the phrase, "It works on my machine," but that doesn't cut it in the cloud.
To get better insight into what is going on in the pipeline, many development teams are inclined to point their production monitoring tools at their CI/CD environment. This comes with its own problems: those tools are tailored to long-running systems handling transactions, not short-lived, fast-changing CI/CD processes.
How to Prevent the Main Branch from Breaking
There are practical measures you can take in confronting a broken main branch. First, instrument your tests and backend services so you get logs, metrics, and traces. These should tell you what services interacted with each other, how long a given part of the CI/CD pipeline took, and what the system’s state was before it failed.
It's also important to take the time to find flaky tests. Set up notifications for when your average test duration gets out of hand, so you can catch performance regressions and looming bugs before they bite.
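Both checks can be expressed as simple queries over per-test run history. The sketch below flags tests whose latest run is much slower than their own average and tests that flip between pass and fail; the history format and the 2x threshold are assumptions for the example.

```python
from statistics import mean

# Illustrative per-test run history: name -> list of (duration_s, passed).
# The data and thresholds are made up for this example.
history = {
    "test_login":    [(0.8, True), (0.9, True), (0.85, True)],
    "test_checkout": [(1.0, True), (1.1, True), (3.9, True)],   # slowing down
    "test_search":   [(0.5, True), (0.5, False), (0.5, True)],  # flaky
}

def slow_tests(history, factor=2.0):
    """Tests whose latest run exceeds factor x their earlier average."""
    return [name for name, runs in history.items()
            if runs[-1][0] > factor * mean(d for d, _ in runs[:-1])]

def flaky_tests(history):
    """Tests that both passed and failed across recent runs."""
    return [name for name, runs in history.items()
            if len({passed for _, passed in runs}) > 1]

print(slow_tests(history))   # → ['test_checkout']
print(flaky_tests(history))  # → ['test_search']
```

Wiring either list into a notification gives you the early warning described above before a regression ever reaches the main branch's users.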
These helpful practices are made easy with an observability tool that lets you see what’s going on in your pipeline.
Foresight, Thundra’s new tool for software delivery observability, was built with the spirit of the shift-left movement in mind. It allows you to debug unit tests, integration tests, and E2E tests in a central location, so you can identify bugs and flaky tests without having to comb through multiple service logs. There’s no need to waste time and money in futile attempts to replicate a remote environment locally.
These steps let you debug the actual system responsible for running your tests, not just a local approximation of it.
In this article, we looked at the shift-left movement and how it brings the gains of production monitoring to the development process. This approach makes your CI/CD pipeline more observable, so you always know what’s going on. We also saw how important it is to keep the main branch working.
Informing your developers about how their commits affect the main branch is as crucial as the observability of your production environment. Only then can you be sure that the new features flow to production with acceptable quality and delivery speed. If you only monitor your production environment, you’re more susceptible to rollbacks because of errors you haven’t noticed in your pipeline.
Thundra Foresight is a powerful new tool that helps you shift left. Make your delivery process more observable with Foresight’s CI/CD pipeline dashboard, and put metrics, logs, and traces at your fingertips.