Test analytics for engineers and agents

The missing test layer for GitHub: regressions, flakes, artifacts, and history in one place.

Flakiness.io dashboard
Test Runners

Any test, any runner

Frontend in Vitest. E2E in Playwright. Backend in Pytest or JUnit.

Flakiness.io brings them into one test layer, with native reporters for the major runners, a JUnit XML bridge for everything else, and a Node.js SDK for custom integrations. Mixed stack, one history.
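For runners without a native reporter, the JUnit XML bridge works because JUnit XML is a de facto standard. The sketch below shows what a bridge does conceptually: flatten JUnit test cases into uniform result records. The record field names here are illustrative assumptions, not Flakiness.io's actual schema.

```python
# Sketch of a JUnit XML bridge: flatten test cases into uniform records.
# Field names in the normalized record are illustrative, not Flakiness.io's schema.
import xml.etree.ElementTree as ET

JUNIT_XML = """<testsuite name="billing" tests="2" failures="1">
  <testcase classname="billing" name="upgrades plan" time="0.42"/>
  <testcase classname="billing" name="downgrades plan" time="1.10">
    <failure message="TimeoutError">stack trace</failure>
  </testcase>
</testsuite>"""

def normalize(junit_xml: str) -> list[dict]:
    """Flatten JUnit XML test cases into uniform result records."""
    suite = ET.fromstring(junit_xml)
    results = []
    for case in suite.iter("testcase"):
        failure = case.find("failure")
        results.append({
            "suite": suite.get("name"),
            "name": case.get("name"),
            "duration_s": float(case.get("time", 0)),
            "status": "failed" if failure is not None else "passed",
            "error": failure.get("message") if failure is not None else None,
        })
    return results

records = normalize(JUNIT_XML)
print(records[1]["status"], records[1]["error"])  # failed TimeoutError
```

Any runner that can emit JUnit XML (JUnit itself, Pytest, Go test tooling, and many others) can feed a bridge like this, which is why one history can span a mixed stack.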

Reports

From failed run to root cause

A failed run carries a lot of signal. Flakiness.io puts the test waterfall, system telemetry, root-cause bins, and Flakiness Query Language in one place.

Engineers and agents can see the failure, understand it, and decide where to look next.

[Dashboard mockup: 1,246 tests; 87% passing, 8% failed, 5% flaky; test waterfall across 4 workers over 2m 48s with CPU and memory telemetry; error bins: TimeoutError × 4, AssertionError × 2, Flaky × 1]
Artifacts

Keep evidence attached

Logs, screenshots, videos, and traces stay attached to the step that produced them. Flakiness.io keeps that evidence in context, with built-in image diffing and Playwright trace viewing in the browser.

[Artifact card: step billing › upgrades plan; video (00:24), image diff, Playwright trace; attached to step, 30-day retention]
CI & Sharding

Defragment your CI

Shards finish at different times. Environments drift. CI shows the run in pieces. Flakiness.io ingests results as they land, merges shards into one report, and keeps staging and production histories separate. GitHub OIDC keeps authentication tokenless.
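The merge step can be pictured as follows. This is an illustrative sketch, not Flakiness.io's actual merge logic: per-shard result lists are combined keyed by test id, then rolled up into one report.

```python
# Illustrative sketch of shard merging: combine per-shard result lists,
# keyed by test id, into one report. Not Flakiness.io's actual merge logic.
from collections import Counter

shards = [
    [{"id": "auth::login", "status": "passed"}],
    [{"id": "billing::upgrade", "status": "failed"}],
    [{"id": "search::query", "status": "passed"}],
]

def merge_shards(shards: list[list[dict]]) -> dict:
    merged = {}
    for shard in shards:
        for result in shard:
            merged[result["id"]] = result  # last write wins per test id
    counts = Counter(r["status"] for r in merged.values())
    return {"tests": len(merged), "by_status": dict(counts)}

report = merge_shards(shards)
print(report)  # {'tests': 3, 'by_status': {'passed': 2, 'failed': 1}}
```

Because shards are ingested as they land, the report can be built incrementally rather than waiting for the slowest shard.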

[CI mockup: GitHub Actions run, OIDC authenticated; shards 1–4 (312, 308, 315, 311 tests), env: production, merged into one unified report of 1,246 tests]
Regressions vs Flakes

Separate regressions from flakes

Every result is tied to the commit and environment it ran on. Flakiness.io can tell whether a failure is new in the PR, already broken on main, or flipping on the same commit. Triage gets faster because the signal is cleaner.

The approach is grounded in commit-aware flakiness analysis described in research by Apple engineers.
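The core decision can be sketched as a small classifier. The inputs here are assumptions about what a commit-aware system would know: the outcome on the PR, the outcome for the same test on main, and whether results flip across retries of the same commit.

```python
# Sketch of commit-aware failure triage. Inputs are assumed signals:
# the PR outcome, the outcome on main, and flips across retries of one commit.
def classify(pr_failed: bool, main_failed: bool, flips_on_same_commit: bool) -> str:
    if not pr_failed:
        return "passing"
    if flips_on_same_commit:
        return "flaky"            # same code, different outcomes
    if main_failed:
        return "already broken"   # not introduced by this PR
    return "regression"           # new failure, stable on this commit

print(classify(pr_failed=True, main_failed=False, flips_on_same_commit=False))
# regression
```

Only the last branch should block a merge, which is what keeps the triage signal clean.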

[PR dashboard mockup: #1409 "Refactor auth middleware" has regressions; #1407 "Update ClickHouse client", #1406 "Fix snapshot collision", #1402 "Vendor isolation policy" mergeable; test health calendar for main, Dec–Mar, cells keyed Regression / Failure / Flaky / Passing / No data]
AI Agents

Compact context for coding agents

Raw CI output is expensive context for coding agents. Flakiness.io turns a failed run into a compact record: what failed, whether it regressed, how it behaved on main, and which logs and artifacts matter.

Agents spend fewer tokens gathering context and more time fixing the problem.
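A compact record might look like the sketch below. Every field name here is a hypothetical for illustration, not Flakiness.io's API: the point is that a failed run collapses into a few hundred bytes of verdicts and pointers instead of megabytes of raw CI logs.

```python
# Hypothetical shape of a compact failure record for a coding agent.
# All field names are assumptions for illustration, not Flakiness.io's API.
import json

raw_run = {
    "failures": [
        {"test": "billing/upgrade.spec.ts", "error": "TimeoutError",
         "failed_on_main": False, "retries_flipped": False,
         "artifacts": ["trace.zip", "video.webm", "stdout.log"]},
    ],
}

def compact(run: dict) -> str:
    """Summarize a failed run into a small JSON blob an agent can ingest."""
    record = [
        {
            "test": f["test"],
            "error": f["error"],
            "verdict": ("regression"
                        if not f["failed_on_main"] and not f["retries_flipped"]
                        else "flaky-or-known"),
            "artifacts": f["artifacts"][:2],  # cap evidence to keep context small
        }
        for f in run["failures"]
    ]
    return json.dumps(record, separators=(",", ":"))

print(compact(raw_run))
```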

[Agent demo: Claude Code, Codex, Cursor; a claude session asked to fix failures in PR #219 pulls test history from flakiness.io, finds 3 regressions in billing/*.spec.ts, analyzes flip rate on main, and drafts a fix for a race condition in billing.ts:142]
Pricing Model

We charge for storage, not tests

Most platforms charge per test run. Flakiness.io charges for stored data. Teams can run more tests, shard aggressively, retry when needed, and keep the bill predictable as coverage grows.
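The difference shows up in back-of-the-envelope arithmetic. Both prices below are made-up assumptions purely for illustration: per-run billing scales with execution count, storage billing scales with retained data.

```python
# Back-of-the-envelope comparison with assumed prices: $0.002 per test run
# versus $0.25 per GB-month of stored results. Both numbers are illustrative.
runs_per_month = 1_246 * 30 * 20          # tests x days x runs per day
per_run_bill = runs_per_month * 0.002     # grows with every test and retry
storage_gb = 50                           # result data actually retained
storage_bill = storage_gb * 0.25          # grows only with stored data

print(f"per-run: ${per_run_bill:,.0f}/mo  storage: ${storage_bill:,.2f}/mo")
```

Under per-run pricing, doubling test count or retry rate doubles the bill; under storage pricing, the bill moves only when retained data grows.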

Typical providers: $ / test run. A penalty for writing more tests, and surprise bills at scale.

Flakiness.io: $ / storage. Run as many tests as you want, with predictable bills at scale.
Platform pricing

Choose your plan

Starts free — no credit card required.

Team

$75
/ month
billed monthly
  • All Features
  • Unlimited Test Runs
  • 5 seats
  • 50GB Included Storage
  • 90 Days Data Retention
  • Standard Support
Start for free
Most Popular

Business

$300
/ month
billed monthly
  • All Features
  • Unlimited Test Runs
  • 30 seats
  • 200GB Included Storage
  • 365 Days Data Retention
  • Standard Support
Start for free

Enterprise

Custom
billed annually
  • All Features
  • Unlimited Test Runs
  • Unlimited seats
  • Custom Included Storage
  • Custom Data Retention
  • Priority Support
  • On-Premise or Cloud Deployment
  • Automatic access for GitHub collaborators
[email protected]

Frequently asked questions

Why is Flakiness.io cheaper than other platforms?

Flakiness.io charges for stored data, not per test run. The analytics engine is built on interval unions, which keeps large test histories efficient to pack and process at scale. Pricing follows that architecture: storage drives cost, not execution count.
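The FAQ names interval unions as the engine's foundation; as a standalone illustration (not Flakiness.io's implementation), this is the classic merge-overlapping-intervals algorithm that makes such representations compact.

```python
# Classic merge-overlapping-intervals algorithm, shown as a standalone
# illustration of interval unions. Not Flakiness.io's implementation.
def union(intervals: list[tuple[float, float]]) -> list[tuple[float, float]]:
    """Merge overlapping (start, end) intervals into a disjoint union."""
    merged: list[tuple[float, float]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend it instead of appending.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

print(union([(0, 5), (3, 9), (12, 15)]))  # [(0, 9), (12, 15)]
```

Many overlapping runs of the same test collapse into a handful of disjoint intervals, which is what keeps large histories cheap to store.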

Can Flakiness.io handle our massive monorepo?

Yes. Flakiness.io is built for large test histories, mixed stacks, and high test volume. We have tested the system on established projects with 500,000+ tests, and it handled the volume without issue.

Can I host Flakiness.io on my own servers?

Yes. A self-hosted deployment is one application container backed by PostgreSQL and an S3-compatible object store. The reporters and the Flakiness CLI work with custom deployments. See the on-premise docs or contact [email protected] for licensing.

Who develops the test runner integrations?

The Flakiness JSON Report format is open source, so anyone can build a reporter.

The Flakiness.io team maintains the official reporters for Playwright Test, Pytest, Vitest, and CucumberJS, along with the JUnit bridge and the platform itself.

More questions? Reach out at [email protected]

Open source program

Free for open source

We believe great testing tools should be accessible to everyone. If you maintain a popular open-source project, Flakiness.io is free.

What you get

  • Full Platform access
  • Unlimited Test Runs
  • Your own Organization
  • 10GB Included Storage
  • Public projects
  • Free forever

How to qualify

5,000+ GitHub Stars

Your repository needs at least 5,000 stars on GitHub.

Drop us an email with a link to your repository and we'll set you up.

Apply via Email

Who's behind Flakiness.io

A decade of building dev tools

Andrey Lushnikov

Flakiness.io was created by Andrey Lushnikov, who spent over a decade at Google and Microsoft building the tools testing engineers rely on every day: he created Puppeteer at Google and co-founded Playwright at Microsoft.

After years of dealing with flaky tests and fragmented test reporting, he built the tool he wished existed.
