Detecting Flakiness
flakiness.io uses the following approach to identify flaky tests:
- Each test report is tagged with the source code revision (commit) that was tested
- The system compares test results across multiple runs of the same commit
- If a test both passes and fails for the same commit, it’s marked as “flaky”
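The detection rule above can be sketched in a few lines. This is an illustrative implementation, not flakiness.io's actual code; the record format (`commit`, `test`, `passed` keys) is an assumption made for the example.

```python
from collections import defaultdict

def find_flaky_tests(results):
    """Flag tests that both pass and fail for the same commit.

    `results` is a list of dicts with hypothetical keys
    "commit", "test", and "passed" (bool).
    """
    outcomes = defaultdict(set)
    for run in results:
        # Collect every observed outcome per (commit, test) pair.
        outcomes[(run["commit"], run["test"])].add(run["passed"])
    # Flaky: the same commit produced both a pass and a fail.
    return sorted({test for (commit, test), seen in outcomes.items()
                   if seen == {True, False}})

runs = [
    {"commit": "abc123", "test": "test_login", "passed": True},
    {"commit": "abc123", "test": "test_login", "passed": False},
    {"commit": "abc123", "test": "test_search", "passed": True},
    {"commit": "abc123", "test": "test_search", "passed": True},
]
print(find_flaky_tests(runs))  # → ['test_login']
```

Note that a test failing on every run of a commit is not flaky by this rule; it is simply broken at that revision.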
Understanding Environment-Specific Failures
While this approach reliably identifies flaky tests, it’s important to understand the distinction between true flakiness and environment-specific failures.
Example: Cross-Platform Testing
Consider a scenario where a test is run on both Windows and Linux:
Test A:
- ✅ Passes on Windows
- ❌ Fails on Linux
This situation can be interpreted in two ways:
- True Flakiness: If you’re not specifically testing cross-platform compatibility, this might be considered a flaky test.
- Environment-Specific Issue: If you’re intentionally testing cross-platform behavior, the Linux failure should be treated as a legitimate failure, not flakiness.
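The distinction can be made mechanical once each run is tagged with its environment: mixed outcomes *within* one environment indicate flakiness, while consistent outcomes that differ *across* environments indicate an environment-specific issue. A hedged sketch (the `env` tag and the `classify` helper are assumptions for illustration, not a flakiness.io API):

```python
from collections import defaultdict

def classify(results):
    """Classify each test given runs tagged with an environment.

    - "flaky": pass and fail within a single environment
    - "environment-specific": consistent per environment, but the
      verdict differs between environments
    - "consistent": same verdict everywhere
    """
    per_env = defaultdict(lambda: defaultdict(set))
    for r in results:
        per_env[r["test"]][r["env"]].add(r["passed"])
    verdicts = {}
    for test, envs in per_env.items():
        if any(seen == {True, False} for seen in envs.values()):
            verdicts[test] = "flaky"
        elif len({next(iter(seen)) for seen in envs.values()}) > 1:
            verdicts[test] = "environment-specific"
        else:
            verdicts[test] = "consistent"
    return verdicts

runs = [
    {"test": "Test A", "env": "windows", "passed": True},
    {"test": "Test A", "env": "linux", "passed": False},
]
print(classify(runs))  # → {'Test A': 'environment-specific'}
```

With the environment tag removed, the same two runs would look like one pass and one fail for the same test, i.e. flaky, which is exactly the ambiguity described above.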
Using Timelines for Better Analysis
To properly handle environment-specific test results, flakiness.io provides the Timeline feature. Timelines allow you to:
- Split test histories by environment
- Analyze results separately for different configurations
- Identify platform-specific issues
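Conceptually, splitting a history by environment means partitioning one flat sequence of runs into one timeline per configuration, so each can be judged on its own. A minimal sketch of that idea, assuming a hypothetical record format with `run`, `env`, `test`, and `passed` fields (not the flakiness.io data model):

```python
from collections import defaultdict

def split_by_environment(results):
    """Partition a flat test history into per-environment timelines,
    so each environment's pass/fail sequence is analyzed separately."""
    timelines = defaultdict(list)
    for r in sorted(results, key=lambda r: r["run"]):
        timelines[r["env"]].append((r["run"], r["test"], r["passed"]))
    return dict(timelines)

history = [
    {"run": 1, "env": "linux", "test": "Test A", "passed": False},
    {"run": 1, "env": "windows", "test": "Test A", "passed": True},
    {"run": 2, "env": "linux", "test": "Test A", "passed": False},
]
for env, timeline in sorted(split_by_environment(history).items()):
    print(env, timeline)
```

Viewed this way, the Linux timeline shows a consistent failure (a legitimate platform bug) rather than a test that flips between pass and fail.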
Learn more about how to use this feature in the Timelines documentation.