A high test failure rate doesn't just slow down your CI pipeline—it erodes trust in your test suite, encourages developers to ignore failures, and ultimately lets bugs slip into production. This guide shows you how to measure, diagnose, and improve your test failure rate using data.
What Test Failure Rate Measures
Definition
Test Failure Rate is the percentage of pull requests that have failing status checks (CI failures) at any point during their lifecycle.
Test Failure Rate Formula
Test Failure Rate (%) = (PRs with at least one failing check / total PRs) × 100
Examples:
- 12 of 80 PRs had a failing check at some point: 12 / 80 = 15%
- 3 of 60 PRs had a failing check: 3 / 60 = 5%
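If you want to sanity-check the number, the calculation is simple to reproduce from your own PR data. A minimal sketch in Python (the PullRequest shape is hypothetical; substitute however you export check results):

```python
from dataclasses import dataclass

@dataclass
class PullRequest:
    number: int
    had_failing_check: bool  # True if any CI check failed at any point in the PR's lifecycle

def test_failure_rate(prs: list[PullRequest]) -> float:
    """Percentage of PRs with at least one failing CI check."""
    if not prs:
        return 0.0
    failed = sum(1 for pr in prs if pr.had_failing_check)
    return 100.0 * failed / len(prs)

# 12 of 80 PRs saw a red check at some point -> 15.0%
prs = [PullRequest(number=n, had_failing_check=(n < 12)) for n in range(80)]
print(f"{test_failure_rate(prs):.1f}%")  # 15.0%
```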
Why It Matters
Test failures have cascading effects on your delivery velocity and code quality:
- Velocity impact: Each failure adds a cycle of "fix, push, wait for CI" that can take 20-60 minutes
- Context switching: Developers move on to other work while waiting, then must reload context when CI fails
- Trust erosion: If tests fail randomly (flaky tests), developers start ignoring failures or re-running until they pass
- Review delays: Reviewers may wait for green CI before starting review, compounding delays
DORA Connection
Test failure rate directly impacts your DORA metrics:
- Lead Time: Failures add time to every PR
- Change Failure Rate: Poor test coverage lets bugs reach production
- Deployment Frequency: Unreliable CI blocks frequent deployments
🔥 Our Take
A flaky test suite is worse than no tests at all.
When tests fail randomly, developers learn to ignore them. They click "re-run" until things pass, or they merge despite red builds. You've trained your team that tests don't matter. A smaller, reliable test suite is more valuable than a large, unreliable one. Delete or quarantine flaky tests until you can fix them properly.
"Every 're-run CI' click is a confession that your test suite has lost the team's trust."
Reading the Metric in CodePulse
Dashboard Card
On your Dashboard, find the Test Failure Rate card in the Quality Metrics section:
- Percentage displayed: Current failure rate for the selected time period
- Trend indicator: Arrow showing if rate is improving or worsening
- Color coding: Green (<10%), Yellow (10-20%), Red (>20%)
📊 How to Read This in CodePulse
The Test Failure Rate card shows:
- Current percentage and trend vs previous period
- Filter by repository to find your worst-performing repos
- Compare time periods to see if recent changes helped
Check Awards → "Quality Guardian" to see which developers have the highest test pass rates.
Per-Repository Breakdown
Different repositories often have very different failure rates. Filter by repository to identify:
- Which repos have the worst test failure rates
- Whether specific repos have flaky test problems
- If newer repos have better testing practices than legacy ones
Developer-Level Insights
While we focus on team-level metrics, individual pass rates can be useful for coaching. Developers with notably lower pass rates might benefit from:
- Pairing on writing better tests
- Access to better local testing tools
- Understanding of which tests to run locally before pushing
Common Causes of High Failure Rate
1. Flaky Tests
Tests that pass sometimes and fail sometimes without any code changes. The most frustrating type of failure.
Signs:
- Same test fails on retry without code changes
- "Re-run CI" is a common team behavior
- Failures happen more at certain times (race conditions)
Common causes:
- Race conditions in async tests
- Tests depending on external services
- Time-dependent tests
- Tests with shared state that isn't properly reset
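Here's what the shared-state cause looks like in practice, as a minimal pytest sketch: the first pair of tests is order-dependent (passes in the default order, breaks under random ordering, parallelization, or subset runs), while the fixture version isolates state per test:

```python
import pytest

# Flaky: both tests touch a module-level dict, so the outcome
# depends on which test happens to run first.
CACHE = {}

def test_assumes_empty_cache():
    # Passes in default file order; fails if test_writes_value ran first.
    assert "user" not in CACHE

def test_writes_value():
    CACHE["user"] = "alice"
    assert CACHE["user"] == "alice"

# Deterministic: each test gets its own state from a fixture.
@pytest.fixture
def cache():
    return {}  # fresh dict per test, no order dependence

def test_assumes_empty_cache_fixed(cache):
    assert "user" not in cache
```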
2. Environment Issues
Tests that pass locally but fail in CI due to environment differences.
Signs:
- "Works on my machine" is a frequent phrase
- Failures only happen in CI, not locally
- Different failure patterns between CI runners
Common causes:
- Different dependency versions in CI vs local
- Missing environment variables or configs
- Different OS or architecture between local and CI
- Resource constraints in CI (memory, disk, network)
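One cheap guard against the missing-variable cause: fail the whole suite immediately with an actionable message instead of letting tests die with obscure connection errors mid-run. A sketch, with hypothetical variable names:

```python
import os
import pytest

REQUIRED_ENV = ["DATABASE_URL", "API_TOKEN"]  # hypothetical: whatever your suite needs

@pytest.fixture(scope="session", autouse=True)
def require_env():
    """Abort early with a clear message when CI or a laptop is misconfigured."""
    missing = [name for name in REQUIRED_ENV if not os.environ.get(name)]
    if missing:
        pytest.exit(f"Missing environment variables: {missing}. "
                    "Check CI secrets or your local setup.", returncode=1)
```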
3. Insufficient Local Testing
Developers pushing code without running tests locally first.
Signs:
- Obvious failures that would have been caught locally
- Multiple fix-up commits after initial push
- Developers saying "CI will catch it"
4. Test Coverage Gaps
Areas of code with poor or no test coverage where bugs accumulate.
Signs:
- Bugs reach production that tests should have caught
- Regression failures when touching "untested" areas
- Low code coverage metrics
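If you use coverage.py, the low-coverage sign can become a hard CI gate so gaps stop growing. A sketch; the 80% floor is illustrative and worth ratcheting up over time:

```python
import subprocess
import sys

# Run the suite under coverage, then fail the CI job if total
# coverage falls below the floor.
subprocess.run([sys.executable, "-m", "coverage", "run", "-m", "pytest"], check=True)
report = subprocess.run([sys.executable, "-m", "coverage", "report", "--fail-under=80"])
sys.exit(report.returncode)  # non-zero exit marks the job red
```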
Using CodePulse to Identify Patterns
Which Repos Have the Worst Failure Rates?
Filter your dashboard by repository and compare failure rates. Focus improvement efforts on the worst performers first—they'll have the biggest impact.
Trend Analysis
Compare failure rates across time periods:
- Improving trend: Recent infrastructure investments or test cleanup are paying off
- Worsening trend: Technical debt is accumulating; prioritize test reliability
- Spiky pattern: External factors (deploy days, specific features) may be causing intermittent issues
Correlate with Risky Changes
The Risky Changes feature flags PRs with failing checks. Review these to understand:
- Are failures concentrated in certain file types?
- Do large PRs have higher failure rates than small ones?
- Are specific types of changes (e.g., database migrations) failure-prone?
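Questions like the PR-size one are easy to answer once you export per-PR data. A toy sketch with made-up numbers, just to show the bucketing:

```python
from collections import defaultdict

# Hypothetical export: (lines changed, did any check fail?)
prs = [(40, False), (520, True), (80, False), (1200, True), (15, False), (300, True)]

def bucket(lines: int) -> str:
    if lines < 100:
        return "small (<100 lines)"
    return "medium (100-499)" if lines < 500 else "large (500+)"

stats = defaultdict(lambda: [0, 0])  # bucket -> [failures, total]
for lines, failed in prs:
    b = bucket(lines)
    stats[b][0] += int(failed)
    stats[b][1] += 1

for name, (failures, total) in stats.items():
    print(f"{name}: {100 * failures / total:.0f}% failure rate ({failures}/{total})")
```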
Improvement Strategies
Quarantine Flaky Tests
Don't let flaky tests block the entire team. Implement a quarantine system:
- Identify tests that fail intermittently (track failure patterns)
- Move them to a non-blocking test suite
- Create tickets to fix each quarantined test
- Run quarantined tests separately and track their stability
- Graduate tests back to the main suite once fixed
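With pytest, markers are one way to wire this up; the blocking job excludes quarantined tests while a separate, non-blocking job keeps running them. A sketch (register the marker so typos fail loudly under --strict-markers):

```python
# Register the marker, e.g. in pytest.ini:
#   [pytest]
#   markers =
#       quarantine: known-flaky test, excluded from the blocking suite
import pytest

@pytest.mark.quarantine
def test_search_reindexing():  # hypothetical flaky test awaiting a fix
    ...

# Blocking CI job:      pytest -m "not quarantine"
# Non-blocking CI job:  pytest -m quarantine   (track pass rate; graduate when stable)
```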
Pre-Commit Hooks
Catch obvious failures before code is pushed:
- Run linters and formatters automatically
- Run unit tests for changed files
- Type-check in typed languages
- Keep pre-commit fast (<30 seconds) so developers don't skip it
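A git pre-commit hook can be any executable script; here's a minimal Python sketch that lints staged files and runs a fast unit suite. It assumes ruff and a tests/unit directory; substitute whatever your project actually uses:

```python
#!/usr/bin/env python3
"""Minimal .git/hooks/pre-commit sketch (make it executable with chmod +x)."""
import subprocess
import sys

def staged_python_files() -> list[str]:
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True, text=True, check=True,
    ).stdout.split()
    return [f for f in out if f.endswith(".py")]

def main() -> int:
    files = staged_python_files()
    if not files:
        return 0  # nothing Python-related staged
    if subprocess.run(["ruff", "check", *files]).returncode != 0:
        return 1  # lint failures block the commit
    # Keep the hook under ~30s: fast unit tests only, stop at first failure.
    return subprocess.run(["pytest", "tests/unit", "-x", "-q"]).returncode

if __name__ == "__main__":
    sys.exit(main())
```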
Improve Environment Parity
Make local development match CI as closely as possible:
- Use Docker for consistent environments
- Lock dependency versions in CI and development
- Document required environment setup
- Consider development containers (VS Code devcontainers)
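Even with Docker, drift sneaks in between the lock file and what's actually installed. A lightweight parity check, assuming a pip-style lock file pinned with == (the requirements.lock path is hypothetical):

```python
from importlib import metadata

def lock_mismatches(lock_path: str = "requirements.lock") -> list[str]:
    """Compare installed package versions against a ==-pinned lock file."""
    problems = []
    with open(lock_path) as lock:
        for line in lock:
            line = line.strip()
            if not line or line.startswith("#") or "==" not in line:
                continue
            name, pinned = line.split("==", 1)
            try:
                installed = metadata.version(name)
            except metadata.PackageNotFoundError:
                problems.append(f"{name}: not installed (lock pins {pinned})")
                continue
            if installed != pinned:
                problems.append(f"{name}: installed {installed}, lock pins {pinned}")
    return problems

if mismatches := lock_mismatches():
    raise SystemExit("Environment drift detected:\n" + "\n".join(mismatches))
```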
Test Infrastructure Investment
Sometimes you need to invest in better test infrastructure:
- Faster CI runners with more resources
- Better test parallelization
- Test result caching to skip unchanged tests
- Better test data management (fixtures, factories)
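On parallelization: pytest-xdist (pytest -n auto) spreads tests across cores on one machine; to split a suite across several CI machines, a conftest.py hook can shard by a stable hash of each test id. A sketch, with hypothetical environment variable names:

```python
# conftest.py
import os
import zlib

def pytest_collection_modifyitems(config, items):
    """Keep only this runner's slice of the suite (CI_SHARD_* are hypothetical)."""
    shards = int(os.environ.get("CI_SHARD_COUNT", "1"))
    index = int(os.environ.get("CI_SHARD_INDEX", "0"))
    if shards <= 1:
        return
    # crc32 of the node id is stable across runs and machines,
    # so each test always lands on the same shard.
    items[:] = [it for it in items if zlib.crc32(it.nodeid.encode()) % shards == index]
```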
Cultural Changes
Technical fixes only go so far. Build a culture that values test reliability:
- Make "green builds" a team norm, not a suggestion
- Celebrate when flaky tests are fixed
- Allocate time for test maintenance (not just feature work)
- Track and celebrate improvement in failure rate
Setting Up Alerts
Don't wait for quarterly reviews to notice test health degrading. Set up proactive alerts:
Alert: Test Failure Rate Warning
- Metric: test_failure_rate_percent
- Operator: >
- Threshold: 15
- Time Period: weekly
- Severity: warning
- Description: "Weekly test failure rate exceeds 15%"

Alert: Test Failure Rate Critical
- Metric: test_failure_rate_percent
- Operator: >
- Threshold: 25
- Time Period: weekly
- Severity: critical
- Description: "Weekly test failure rate exceeds 25% - immediate attention needed"
What Good Looks Like
Benchmark your failure rate against these targets:
Test Failure Rate Benchmarks
Target thresholds:
- Below 10%: healthy (green)
- 10-20%: elevated (yellow); start investigating flaky tests and environment issues
- Above 20%: critical (red)
If you're above 20%, make test reliability a top priority—it's likely slowing down everything else your team does.
Related Guides
- Reducing PR Cycle Time — test failures are a major cycle time contributor
- Regression Prevention Guide — prevent bugs from reaching production
- Alert Rules Guide — set up proactive quality alerts