When developers are focused on adding software features, the incentives are clear: New and better features bring in new customers, and that drives revenue.
Where does that leave quality assurance (QA)? Too often, it's treated as an annoying step, a quick pass to fix a few bugs before a new feature can be released.
Flaky tests can only be ignored for so long. As systems grow, the likelihood of flaky tests increases, until the entire test suite becomes unreliable. Unresolved flaky tests add up and become a serious concern for software companies that rely on distributed cloud computing and serverless technology.
Your team will lose trust in the test suite once it becomes unclear whether a failure stems from a configured test timeout, a slow network, or a basic coding error. Technical and core business problems follow. In this post, we discuss the implications of flaky tests for your team’s development process and some of the steps needed to address their root causes.
Understanding Flaky Tests
Researchers from Mozilla paired up with the University of Zürich in 2019 to assess the underlying causes of flaky tests and to better understand how they impact development and test teams. Using surveys and interviews, the researchers asked developers to categorize flaky tests according to their root causes.
They came up with 11 total types:
- Async wait: A system performs asynchronous tasks but fails to wait for them to finish.
- Concurrency: Threads in multi-threaded software implicitly rely on a particular ordering of data access, resulting in race conditions.
- Float precision: Floating-point rounding, underflow, or overflow is not accounted for, so results can vary subtly between runs.
- Platform dependency: Tests rely on behavior that’s specific to a given platform. This can include indicating a non-deterministic result on one operating system and a deterministic result on another.
- Randomness: Random numbers are required for testing, but edge cases may be overlooked by developers.
- Range too restrictive: Valid outputs occasionally fall outside the range the test accepts, so legitimate results are flagged as failures.
- Resource leak: Resources such as memory are not released properly, eventually triggering overflows or exhaustion.
- Test case timeout: A test’s runtime has grown over time, but its timeout was never increased to match.
- Test suite timeout: Although no single test accounts for the flakiness of the results, an aggregate of tests causes a timeout for the full test suite.
- Test order dependency: The outcome of one test depends on which tests ran before it.
- Time: A test that relies on the local system time is flaky because, for example, it may compare timestamps in two different time zones.
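To make the “async wait” category above concrete, here is a minimal Python sketch (the function names and timings are illustrative, not taken from the study). The flaky version guesses how long background work takes; the reliable version synchronizes on its completion.

```python
import threading
import time

results = []

def fetch_data():
    """Stand-in for asynchronous work with variable latency."""
    time.sleep(0.05)  # simulated network delay
    results.append("payload")

def test_flaky_async_wait():
    """Flaky: assumes the worker always finishes within a fixed 10 ms."""
    results.clear()
    worker = threading.Thread(target=fetch_data)
    worker.start()
    time.sleep(0.01)  # arbitrary wait, sometimes too short
    return results == ["payload"]  # passes or fails depending on scheduling

def test_reliable_async_wait():
    """Deterministic: explicitly waits for the worker to finish."""
    results.clear()
    worker = threading.Thread(target=fetch_data)
    worker.start()
    worker.join(timeout=5)  # synchronize on completion, with a generous bound
    return results == ["payload"]
```

The fix is not a longer sleep but an explicit synchronization point (`join`, an event, or a condition), which removes timing from the test’s outcome entirely.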
How Trust Is Undermined by Flaky Tests
Because flaky tests create inconsistent test results, developers often lose faith in the tests. In the study discussed above, “Understanding Flaky Tests: The Developer’s Perspective,” researchers found that developers simply don’t trust flaky test results. And as they begin to see test output as less reliable, developers may disregard it altogether, which can lead them to ignore genuine failures.
Writing test code can double development effort, and if that effort is perceived as having minimal value, motivation drops off. Flaky tests, with their seemingly random failures, deepen those existing doubts.
If tests are disregarded, however, the situation worsens because a critical QA step is left out. Developers then feel they can’t trust their own code and spend more time manually testing. If that causes the software quality to drop or production to slow, other teams can lose trust in the developer’s products. This erodes company culture.
But things don’t have to go this way. Developers usually see the importance of fixing tests, even though they themselves lack the time to fix them. Often, they ask for more QA resources. However, if leadership priorities are fixed on new features instead, developers begin to lose faith in management as well.
Flaky Tests Slow Development and Drive Up Costs
Some 77% of developers interviewed in the University of Zürich/Mozilla study said that fixing flaky tests is time-consuming, largely because the failures are difficult, and sometimes impossible, to reproduce.
The study further pointed out that flaky tests are often entangled with other tests, so fixing them requires substantial knowledge of the codebase. Intermittent tests can therefore end up straining the team’s available resources.
One simple but time-consuming workaround is rerunning the test suite several times. Alternatively, you can investigate and address the root cause, but not every developer has the time or experience for this. It requires a skilled team member to devote resources that could otherwise go toward new features.
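A minimal sketch of that rerun workaround, using a hypothetical `flaky_test` stand-in that passes about 70% of the time. Note that this masks flakiness rather than fixing its root cause.

```python
import random

def run_with_retries(test_fn, max_attempts=3):
    """Rerun test_fn until it passes or attempts are exhausted.

    This hides flakiness from the build; it does not fix it.
    """
    for attempt in range(1, max_attempts + 1):
        if test_fn():
            return True, attempt   # passed on this attempt
    return False, max_attempts     # never passed

# Hypothetical stand-in for a flaky test: passes ~70% of the time.
rng = random.Random(7)
def flaky_test():
    return rng.random() < 0.7

passed, attempts = run_with_retries(flaky_test)
```

Plugins such as pytest-rerunfailures automate exactly this pattern, which is why retry counts in CI configs tend to creep upward as flakiness accumulates.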
Ignoring Flaky Tests in Order to Ship Faster
Software development cycles are faster than ever, and developers who lack incentives to fully incorporate QA steps are likely to take shortcuts. Those shortcuts lead to problems that pile up.
One of the most common shortcuts is to ignore flaky tests and simply rerun the test suite until it produces the desired result. This might help ship new features faster, but if features keep being added while testing problems are ignored, what can we expect?
More flaky tests lead to more test suite reruns, slowing the release of each new feature. Worse, flaky tests can accumulate until the suite no longer produces a passing run, no matter how many times you retry it. That brings continuous integration to a grinding halt.
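A rough way to see why: if each flaky test passes independently with some probability, the chance of a fully green run shrinks exponentially with the number of flaky tests. Independence is a simplifying assumption here, and the numbers below are illustrative.

```python
def suite_pass_probability(per_test_pass_rate, flaky_test_count):
    """Chance that every flaky test passes in a single suite run,
    assuming independent failures (a simplification)."""
    return per_test_pass_rate ** flaky_test_count

# With 20 flaky tests that each pass 95% of the time,
# a full run is green only about a third of the time:
p = suite_pass_probability(0.95, 20)  # ≈ 0.36
expected_reruns = 1 / p               # ≈ 2.8 runs per green suite
```

Even modest per-test flakiness compounds quickly, which is why the rerun workaround stops scaling long before any single test looks broken.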
This is a catastrophic outcome: at that point you have the largest accumulation of flaky tests and the most urgent need to eliminate them.
Your developers can either build new features or take the time to fix flaky tests, but rarely both. That puts your company in a lose-lose situation: either devote resources to shipping new features and risk losing customers to unstable software, or get bogged down fixing notoriously difficult flaky tests.
In the end, flaky tests erode trust within your company if they are ignored too long, and the vicious cycle is likely to make your developers think about switching companies to escape it.
But there are ways to avoid flaky tests from the get-go. And if you’re already dealing with flaky tests, you can course-correct before the situation gets out of hand.
Thundra Sidekick lets you troubleshoot your tests with unobtrusive debugging and lets developers set breakpoints to determine whether their tests are flaky. Get started with Thundra today.