Testing is a crucial part of software development. Whether you’re programming a spaceship or a game, bugs in your code can be (or at least feel) disastrous.
But it’s not enough to minimize bugs in the product; you also need to move fast. If your company can’t keep up with new releases, you risk losing customers to the competition. Everyone knows that the CI/CD pipeline is critical to getting releases out quickly, but all too often automated tests fail in the pipeline.
In this article, we’ll help you understand why your tests are failing, show you how to find failures faster, and offer suggestions for keeping your tests running smoothly.
Reasons for Failing Tests
There are a number of reasons why tests fail, ranging from the code itself to issues that may be outside your purview. Here’s a list of things to keep an eye out for.
One reason tests fail is that they depend on uncontrolled contexts. For instance, integration tests could fail due to a lost connection with the database. Or perhaps you’re using the same environment for both testing and development; in that case, the developers may push new code without tests, causing the test environment to collapse.
This is most likely to happen at a startup, where everyone is motivated to move fast and creating a dedicated testing environment can seem too time-consuming. But if your enterprise doesn’t take tests seriously, or cuts corners by running tests on slow machines, your tests will be flaky and prone to timeout errors or browser crashes.
Another uncontrolled-context issue is changes to test credentials. If you share credentials with developers who update them without notice, that can cause test failures. The best way to deal with this is to use dedicated credentials for testing only. To deter people from deleting or updating those credentials, consider instituting penalties.
You may also experience issues with your tests if your operating system automatically updates libraries or if you’re testing functionality that you’re uncertain about. For example, you may not know when a certain result will be visible to the user.
Your Tests Are Flaky
Sometimes the tests themselves are flaky. For example, an automation tester might use a hard timeout with Selenium when they’re waiting for a web element to appear, like this:
```java
WebDriverWait w = new WebDriverWait(driver, 3); // waits at most 3 seconds
w.until(ExpectedConditions.presenceOfElementLocated(By.cssSelector("h1")));
```
But if an element takes longer than three seconds to appear, this can easily cause your test to fail.
Alternatively, your tests might assert something that doesn’t always return the predicted value. As a tester, you always want to cover more edge cases, but there are cases where you shouldn’t assert at all. For example, you might assert on an updated timestamp while your service returns the result 2–3 seconds later. If you need to wait for an element to appear, it’s better to use an explicit wait. Some automation frameworks, like Playwright, Microsoft’s open-source end-to-end (E2E) testing framework, now build explicit waiting into their APIs.
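The idea behind an explicit wait can be sketched in plain Java as a polling loop: check a condition repeatedly until it holds or a deadline passes, rather than failing after a single fixed pause. This is a framework-agnostic illustration, not any framework’s real API; tools like Selenium’s WebDriverWait implement the same pattern against live web elements.

```java
import java.util.function.BooleanSupplier;

public class ExplicitWait {
    // Polls `condition` every `pollMillis` until it returns true or
    // `timeoutMillis` elapses. Returns true if the condition was met in time.
    public static boolean waitUntil(BooleanSupplier condition,
                                    long timeoutMillis, long pollMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) {
                return true;
            }
            Thread.sleep(pollMillis);
        }
        return condition.getAsBoolean(); // one last check at the deadline
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        // Simulate an "element" that appears after roughly 200 ms.
        boolean found = waitUntil(
                () -> System.currentTimeMillis() - start > 200, 2000, 50);
        System.out.println(found ? "element appeared" : "timed out");
    }
}
```

Because the wait stops as soon as the condition holds, a generous timeout costs nothing in the common case; only a genuinely missing element pays the full deadline.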
Issues can also arise if QA sets up tests to depend on each other. For example, the first test logs in, the next changes the password, and the last signs in again after the password change. This might save some time, but if you change the order of your tests, they’ll fail. Furthermore, if you have a large number of tests, you might not even know where the problem is coming from.
Ideally, tests are set up to be isolated from each other; that way, you can run them concurrently and speed up test execution time. Modern machines are powerful enough to run parallel tests across at least eight threads.
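As a rough illustration in plain Java (not tied to any test framework), isolated checks can be fanned out across a thread pool precisely because no check reads state another one wrote, so execution order doesn’t matter:

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelChecks {
    public static void main(String[] args) throws Exception {
        // Each "test" is self-contained: it builds its own input and
        // checks its own result, so the checks can run in any order.
        List<Callable<Boolean>> checks = List.of(
                () -> "login".length() == 5,
                () -> Integer.parseInt("42") == 42,
                () -> "password".startsWith("pass")
        );
        ExecutorService pool = Executors.newFixedThreadPool(8);
        try {
            int passed = 0;
            for (Future<Boolean> f : pool.invokeAll(checks)) {
                if (f.get()) passed++;
            }
            System.out.println(passed + "/" + checks.size() + " checks passed");
        } finally {
            pool.shutdown();
        }
    }
}
```

Test runners such as JUnit and TestNG offer the same fan-out via configuration, but only if the tests really are independent; shared state turns parallel execution into a new source of flakiness.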
Today, most programming languages have package management tools. If you habitually use the latest version of a library, you might set up your configuration files with an “*” as the version specifier. But pulling in a new version can cause a crash or a conflict between the different library versions in use. This can be prevented by pinning each library to a static version and only updating it manually.
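For example, in an npm `package.json` (the package name below is illustrative), an exact version keeps an unreviewed release from slipping into your build, whereas a wildcard like `"some-ui-library": "*"` accepts any version the registry serves:

```json
{
  "dependencies": {
    "some-ui-library": "1.4.2"
  }
}
```

Lockfiles and tools like Dependabot or Renovate let you keep the pin while still upgrading on your own schedule, with the change reviewed like any other.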
To help you identify flaky tests, Thundra Foresight offers detailed reports that show which tests are slow and why they failed. This takes a lot of the guesswork out of troubleshooting your tests so you can get right to fixing them.
Let’s say you update code for the AWS Java SDK and Foresight automatically triggers the tests. Figure 1 below shows the detailed report:
Figure 1: AWS Java SDK test report
The slowest tests, the slowest test suites, and the most error-prone test suites are grouped so you can easily focus on and fix issues. If you click on a test, you can see the details for each step and immediately find out which step needs to be fixed or improved.
Figure 2: Detailed information for each step in the test
The framework for your testing is also important. In the past, testers have used tools like Sikuli, which recognizes elements of a software user interface via images, to automate their testing. These tools may work for the first few tests of your software, but as your tests grow or run on different machines, things fall apart because image rendering can differ across environments and change over time.
Similarly, when we use Selenium or Cypress to work with web elements, we often use a CSS selector or XPath that is very likely to change in the future. Instead, try to use id, name, or data-id attributes for more stable element matching. Discuss this with your frontend developers so they can add these attributes to the elements in the HTML page.
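For instance, once the frontend adds a stable attribute to the element (the attribute values below are illustrative), the test can match on it instead of on a brittle positional locator:

```html
<!-- brittle: an XPath like //div[2]/form/button[1] breaks when layout changes -->
<!-- stable: the attribute survives restyling and reordering -->
<button data-id="submit-order">Submit</button>
```

A Selenium test would then locate it with a selector such as `By.cssSelector("[data-id='submit-order']")`, which keeps working as long as the attribute does.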
If there are multiple suitable frameworks for the task, choose the one that your team is most familiar with. That way your team can move faster and troubleshoot problems when they inevitably arise.
Another common issue is trying to retrofit manual tests into automated ones. That sounds reasonable, but the cost and effort required to write and maintain such tests are extremely high because manual tests are not designed to be automated. That effort and energy would be better spent on tests that are easier to create and more helpful in finding bugs.
A lot of teams strive to write many automated tests at the E2E testing level, leading to tests that are hard to maintain and that take a lot of time to execute. The practical test pyramid suggests that low-level tests, like unit and component tests, are some of the most important. Even though you might go slower in the first few sprints of development, using these tests as a foundation will help you in the long run.
Traditionally, production teams have seen automated testing as a way to lighten their manual workload. But things are changing fast, and developers increasingly find their testing code hard to maintain or refactor. In order to make the testing code maintainable, we need to treat it like production code and follow design patterns that make it easy to understand and work with.
The design patterns we typically apply are Singleton and Page Object. But be careful not to make your design too abstract or complex. Start simple, and when there’s a need for change, don’t hesitate to refactor your code. Write unit tests for the utilities you use often; that way, you can make sure they behave exactly as you intend.
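A minimal Page Object sketch in plain Java shows the shape of the pattern: the `Driver` interface below is a stand-in for whatever automation framework you use, and the selectors are illustrative. The point is that selectors and page actions live in one class, so a changed selector is fixed in exactly one place.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for a real automation framework's driver.
interface Driver {
    void type(String selector, String text);
    void click(String selector);
}

// The Page Object owns the selectors; tests only call its methods.
class LoginPage {
    private static final String USER_FIELD = "[data-id='username']";
    private static final String PASS_FIELD = "[data-id='password']";
    private static final String SUBMIT_BTN = "[data-id='submit']";

    private final Driver driver;

    LoginPage(Driver driver) { this.driver = driver; }

    void logIn(String user, String password) {
        driver.type(USER_FIELD, user);
        driver.type(PASS_FIELD, password);
        driver.click(SUBMIT_BTN);
    }
}

public class PageObjectSketch {
    public static void main(String[] args) {
        // A fake driver that just records actions, to show the flow.
        Map<String, String> typed = new HashMap<>();
        StringBuilder clicks = new StringBuilder();
        Driver fake = new Driver() {
            public void type(String sel, String text) { typed.put(sel, text); }
            public void click(String sel) { clicks.append(sel); }
        };
        new LoginPage(fake).logIn("alice", "s3cret");
        System.out.println("typed into " + typed.size()
                + " fields, clicked " + clicks);
    }
}
```

Tests written against `LoginPage` read as user intent (“log in as alice”) rather than as a list of selectors, which is what keeps them maintainable as the UI evolves.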
As Martin Fowler writes in Refactoring: Improving the Design of Existing Code, “You should not put off refactoring because you haven’t got time... Not having enough time usually is a sign that you need to do some refactoring.”
Developers Update the Code
Apart from failures originating in the tests themselves, developers sometimes cause tests to fail. This may be because they lack the time or know-how to write unit tests. But there are big costs associated with neglecting unit tests for the product code: a simple change can cause failures in the integration or E2E tests.
In addition to unit tests, when a development team needs to collaborate with other teams on a working feature, it’s important to have contract tests. Contract tests give you early feedback on what might happen when you integrate, and they help prevent test failures in the integration environment.
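As a toy illustration of the idea (real contract-testing tools such as Pact do far more, including generating and verifying contracts on both sides), the consumer records the response fields it relies on, and a check fails fast if a provider response no longer satisfies them. The field names and sample responses below are hypothetical.

```java
import java.util.Map;
import java.util.Set;

public class ContractCheck {
    // The fields this consumer depends on -- in effect, the contract.
    static final Set<String> REQUIRED_FIELDS = Set.of("id", "email", "createdAt");

    static boolean satisfiesContract(Map<String, Object> response) {
        return response.keySet().containsAll(REQUIRED_FIELDS);
    }

    public static void main(String[] args) {
        // Simulated provider responses (illustrative data).
        Map<String, Object> v1 = Map.of(
                "id", 1, "email", "a@example.com", "createdAt", "2023-01-01");
        Map<String, Object> v2 = Map.of(
                "id", 1, "mail", "a@example.com"); // field renamed upstream
        System.out.println("v1 satisfies contract: " + satisfiesContract(v1));
        System.out.println("v2 satisfies contract: " + satisfiesContract(v2));
    }
}
```

Run against the provider’s build, a check like this surfaces the breaking rename before the change ever reaches a shared integration environment.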
In short, there are a lot of reasons that tests fail. But there are many things you can do to minimize those failures in the future.
Failing tests are unavoidable. That said, we should strive to shift our resources from investigating why a test failed to fixing the problem. Implementing meaningful automated tests and making them stable requires hard work and collaboration among the entire development team, but releasing a high-quality product on time makes the effort worthwhile.
Today, we have the ability to debug tests directly in the CI pipeline and to keep those issues out of our local environments. With the help of observability tools, you can find problems quickly and easily. With built-in traces, metrics, and an easy-to-use dashboard, Thundra Foresight makes identifying problematic tests a breeze, so you don’t have to be a sysadmin to integrate your tests.
Gain insight into your CI pipeline with Thundra Foresight so you can find issues as they arise—and tackle them before they impact user experience.