
The DORA Metric Everyone Ignores (Until Production Breaks)

Learn how to measure Change Failure Rate and Mean Time to Restore using GitHub data, even without full incident tracking integration.

14 min read · Updated March 26, 2026 · By CodePulse Team

Deployment frequency tells you how fast you ship. Change failure rate and mean time to restore tell you whether that speed is sustainable. This guide covers how to measure both stability metrics using GitHub data, what the benchmarks actually mean, and how to improve them without adding gates that slow your team down.

Quick Answer

What are change failure rate and mean time to restore?

Change failure rate (CFR) is the percentage of deployments that cause production incidents, rollbacks, or degraded service. Mean time to restore (MTTR) is how quickly you recover when failures happen. According to the 2024 DORA report, elite teams maintain CFR below 15% while restoring service in under one hour. CodePulse tracks both metrics automatically by detecting revert patterns, hotfix branches, and rapid follow-up fixes in your GitHub data.

Most engineering leaders track deployment frequency and lead time because they are easy to measure. The stability half of DORA, CFR and MTTR, gets less attention because it requires connecting code changes to production outcomes. That connection is harder without incident management tooling. But GitHub data alone provides enough signal to start measuring both metrics today.

According to the 2024 DORA State of DevOps Report, the teams that ship the fastest also break the least. That finding challenges the common assumption that speed and stability are trade-offs. They are not. The same practices that enable frequent deployment (small batches, good testing, automation) also prevent failures.

"If your team is choosing between shipping fast and shipping safely, you have an engineering problem, not a prioritization problem."

What do change failure rate and MTTR actually measure?

Change failure rate captures the percentage of deployments that result in degraded service and require remediation. MTTR (which the 2024 DORA report renamed to "failed deployment recovery time") measures how quickly your team restores service after a failure.

Together they form your deployment risk profile. Low CFR means you rarely break things. Low MTTR means when you do, the impact is contained. A team with 5% CFR and 30-minute MTTR is in a fundamentally different position than a team with 5% CFR and 3-day MTTR, even though the failure rates are identical.

Performance Level | Change Failure Rate | Recovery Time
Elite             | 0-15%               | Less than 1 hour
High              | 16-30%              | Less than 1 day
Medium            | 16-30%              | 1 day to 1 week
Low               | 16-30%              | More than 6 months

Notice that medium and low performers have similar CFR but differ dramatically in recovery time. The 2024 DORA data suggests that what separates high performers from low performers is not how often things break, but how fast they recover.

For a broader view of all four DORA metrics, see our complete DORA metrics guide.

How do you define "failure" for your organization?

Before you can measure CFR, you need to decide what counts as a failure. This is where most teams get stuck. The answer depends on your risk tolerance and industry context.

Narrow definition: Only customer-facing outages where the product is completely unusable. An e-commerce platform might count only events where orders cannot be placed or payments fail.

Moderate definition: Any incident requiring intervention, including rollbacks, hotfixes, and manual remediation. A SaaS product might count any severity 1-2 incident or any revert.

Broad definition: Any deployment that did not work as expected, including reverts for non-critical issues and follow-up fixes. Internal tooling teams often use this definition because the cost of false positives is low.

"Pick a failure definition and stick with it for at least six months. The trend matters more than the absolute number, and changing definitions destroys your trend line."

The right definition depends on your situation, but consistency matters more than precision. If you track the same definition over time, you can measure improvement. If you keep changing definitions, the trend becomes meaningless.

Detect code hotspots and knowledge silos with CodePulse

How do you measure change failure rate from GitHub data?

Without integrated incident tracking, GitHub activity patterns provide three reliable failure signals.

Signal 1: Revert commits

The strongest failure signal. A revert is an explicit acknowledgment that a change needed to be undone. Detection patterns include PR titles containing "revert" or "rollback", commit messages starting with "Revert ", and GitHub auto-generated revert PRs.
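A minimal sketch of revert detection, assuming PR titles and commit messages pulled from the GitHub API. The `is_revert` helper and its regex patterns are illustrative; tune them to your team's conventions.

```python
import re

# Illustrative patterns for revert/rollback detection in PR titles.
REVERT_PATTERNS = [
    re.compile(r"\brevert\b", re.IGNORECASE),
    re.compile(r"\brollback\b", re.IGNORECASE),
]

def is_revert(pr_title: str, commit_message: str = "") -> bool:
    """Return True if a PR looks like a revert or rollback.

    GitHub's auto-generated revert PRs are titled 'Revert "..."' and
    their commit messages start with 'Revert ', so both are checked.
    """
    if commit_message.startswith("Revert "):
        return True
    return any(p.search(pr_title) for p in REVERT_PATTERNS)
```

Because the patterns use word boundaries, a title like "Add revertible migration" is not flagged, while 'Revert "Add caching layer"' is.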

Signal 2: Hotfix branches

If your team uses hotfix/* branches for emergency fixes, these indicate something broke in production. Track merges from hotfix branches, count PRs with "hotfix" labels, and identify PRs merged outside normal review process.

Signal 3: Rapid follow-up fixes

When a PR is merged and then another PR touching the same files is merged within 24-48 hours, it may indicate the original change needed fixing. This signal is noisier than reverts or hotfixes, so weight it at roughly 50% compared to a confirmed revert.
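One way to sketch rapid follow-up detection, assuming PR records as plain dicts with merge timestamps and touched file sets (in practice these would come from the GitHub pulls and files endpoints):

```python
from datetime import datetime, timedelta

# Window within which a second PR touching the same files counts as a
# follow-up fix; 48 hours here, matching the guideline above.
FOLLOW_UP_WINDOW = timedelta(hours=48)

def rapid_follow_ups(merged_prs):
    """Return the numbers of PRs that appear to have needed a quick fix.

    merged_prs: list of dicts with 'number', 'merged_at' (datetime),
    and 'files' (set of touched paths), sorted by merge time.
    """
    flagged = set()
    for i, pr in enumerate(merged_prs):
        for later in merged_prs[i + 1:]:
            if later["merged_at"] - pr["merged_at"] > FOLLOW_UP_WINDOW:
                break  # list is sorted; no later PR can be in the window
            if pr["files"] & later["files"]:
                flagged.add(pr["number"])
                break
    return flagged
```

File overlap is a crude proxy; two unrelated PRs can touch the same hotspot file, which is exactly why this signal should be weighted below confirmed reverts.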

CFR Calculation:

  Basic:    CFR = (Revert PRs) / (Total PRs merged to main)
  Weighted: CFR = (Reverts*1.0 + Hotfixes*0.9 + Rapid-fixes*0.5) / Total deploys

  Example (monthly):
    100 PRs merged
    3 reverts + 2 hotfixes + 4 rapid follow-ups
    Basic CFR  = 3/100 = 3%
    Weighted CFR = (3 + 1.8 + 2) / 100 = 6.8%
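The weighted formula above can be expressed as a small function. The default weights (1.0 / 0.9 / 0.5) mirror the example; adjust them to your own confidence in each signal.

```python
def weighted_cfr(total_prs, reverts, hotfixes, rapid_fixes,
                 w_revert=1.0, w_hotfix=0.9, w_rapid=0.5):
    """Return change failure rate as a percentage of merged PRs."""
    weighted_failures = (reverts * w_revert
                         + hotfixes * w_hotfix
                         + rapid_fixes * w_rapid)
    return 100 * weighted_failures / total_prs

# Monthly example: 100 PRs, 3 reverts, 2 hotfixes, 4 rapid follow-ups.
# weighted_cfr(100, 3, 2, 4) -> 6.8 (percent)
```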

📊 How to Track CFR in CodePulse

CodePulse automatically detects all three failure signals from your GitHub data:

  • Navigate to Dashboard to view CFR trends alongside deployment frequency
  • Revert detection from PR patterns and commit messages
  • Hotfix identification from branch naming conventions
  • Set up alert rules to get notified when CFR exceeds your threshold

How do you track MTTR without full observability tooling?

Mean time to restore captures how quickly you recover from failures. In a full observability setup, you measure from incident detection to resolution. With GitHub data alone, you measure engineering response time: how quickly your team produces and ships a fix.

Three calculation methods

Method 1, revert time: Start at the original PR merge time. End at the revert PR merge time. This captures how quickly the team identified the problem and rolled back.

Method 2, hotfix cycle time: Start at hotfix branch creation. End at hotfix PR merge. This captures how quickly the team wrote and shipped a targeted fix.

Method 3, issue-to-fix: Start at the bug or incident issue creation. End at the fixing PR merge. This requires linked issues but provides the most complete picture.
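Method 1 reduces to a timestamp subtraction. A sketch, with illustrative timestamps standing in for the `merged_at` fields of the original PR and its revert:

```python
from datetime import datetime

def revert_recovery_hours(original_merged_at, revert_merged_at):
    """Hours between merging the original PR and merging its revert."""
    return (revert_merged_at - original_merged_at).total_seconds() / 3600

# e.g. original merged at 14:00, revert merged at 14:45 -> 0.75 hours
```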

MTTR Aggregation:

  Use median, not mean. Outliers (3-day recovery from one bad deploy)
  will skew the average and mask your typical performance.

  Example (quarterly):
    Incident 1: 45 minutes (revert)
    Incident 2: 2 hours (hotfix)
    Incident 3: 30 minutes (revert)
    Incident 4: 18 hours (complex fix)
    Incident 5: 1 hour (hotfix)

    Mean MTTR:   4.45 hours (skewed by incident 4)
    Median MTTR: 1 hour (better representation)
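The quarterly example above, reproduced with Python's standard library, shows how one long recovery skews the mean while the median stays representative:

```python
from statistics import mean, median

# Recovery times from the quarterly example, in hours:
# 45 min, 2 h, 30 min, 18 h, 1 h
recovery_hours = [0.75, 2, 0.5, 18, 1]

mean_mttr = mean(recovery_hours)      # 4.45, skewed by the 18-hour incident
median_mttr = median(recovery_hours)  # 1.0, the typical recovery time
```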

What GitHub-based MTTR misses

This approach has blind spots. You cannot measure detection time, because GitHub tells you when the fix was merged, not when the failure was discovered. You miss deployment lag between merge and actual release. And you miss incidents resolved without code changes, like configuration rollbacks through deployment tools. Despite these gaps, engineering response time is still valuable: it tracks the part you can directly improve through better processes and tooling.

Identify bottlenecks slowing your team with CodePulse

How do you improve CFR without slowing down deployment frequency?

High CFR tempts teams to add approval gates and manual QA steps. This is the wrong response. The DORA research consistently shows that elite teams achieve low CFR while maintaining high deployment frequency. They do it through prevention, detection, and recovery, not gates.

🔥 Our Take

DORA metrics were designed to study thousands of organizations, not manage your specific team. Using CFR as a KPI that engineers are evaluated against will lead to gaming: teams will stop reverting (hiding failures) rather than actually preventing them.

Track CFR as a signal, not a target. If your CFR goes up, investigate why. If it goes down, understand what changed. The goal is learning, not leaderboard positions. When a measure becomes a target, it ceases to be a good measure.

Prevention: catch issues before production

  • Integration tests over unit tests: Focus testing effort on tests that exercise real integration points. A 95% unit test coverage number means nothing if your API contract tests are missing.
  • Review for correctness, not style: Code review should focus on logic errors and edge cases. Style enforcement belongs in linters. See our code review culture guide for review quality strategies.
  • Smaller changes: According to a LinearB analysis of 2 million PRs, PRs over 400 lines are 3x more likely to be rejected. Small PRs are easier to review thoroughly and easier to roll back when something goes wrong.

Detection: find issues quickly

  • Staged rollouts: Canary deployments that limit blast radius to a small percentage of traffic
  • Feature flags: Ability to disable problematic features without rolling back the entire deployment
  • Monitoring alerts: Automated anomaly detection on error rates, latency, and key business metrics

Recovery: fix issues fast

  • One-click rollbacks: Automated ability to revert to the last known good state in under 5 minutes
  • Hotfix fast-path: Expedited review process for critical production fixes
  • Runbooks: Documented response procedures so recovery does not depend on specific individuals being available

"You do not get sub-one-hour MTTR by accident. You get it because your engineers already built automated rollback procedures and designed systems for instant diagnostics."

What hurts velocity without helping CFR | What actually works
Adding more approval gates              | Invest in automated testing
Requiring more manual QA                | Improve integration test coverage
Batching changes into big releases      | Ship smaller changes more frequently
Blaming developers for failures         | Build fast recovery capabilities

What does the 2024 DORA report change about stability metrics?

The 2024 DORA report made several significant changes to how stability metrics work.

MTTR renamed: "Mean time to restore" is now "failed deployment recovery time." More importantly, it moved from the stability cluster to throughput, reflecting the finding that fast recovery enables faster deployment rather than just preventing damage.

Performance tiers replaced: The familiar low/medium/high/elite classification has been replaced with seven team archetypes. This acknowledges that engineering teams have diverse profiles and a single linear ranking oversimplifies reality.

Fifth metric added: Rework rate joins the framework, measuring the percentage of changes that are later modified or reverted within a short window. According to the CD Foundation, this metric captures quality issues that CFR alone misses, things that do not cause incidents but still require rework.

These changes matter because teams that built dashboards around the four classic DORA metrics with elite/high/medium/low tiers need to update their mental models. The benchmarks in the table above still hold as general guidance, but the official framework is now more nuanced. See our DORA implementation guide for how to adapt your tracking.

For hands-on benchmarking, try our DORA metrics calculator to see where your team stands against industry data.

🔥 Our Take

The move from four tiers to seven archetypes is the most important change in the 2024 report. It validates what we have seen across hundreds of teams: there is no single "right" profile for high performance.

A platform team deploying infrastructure changes has fundamentally different stability requirements than a product team shipping user-facing features. Comparing their CFR numbers is meaningless. Focus on your own trend, not someone else's benchmark.

For related reading on the testing and quality side, check out our test failure rate guide and risky deployments detection guide.

Frequently Asked Questions

What is a good change failure rate?

According to the DORA research, elite teams maintain a change failure rate between 0% and 15%. High performers fall in the 16-30% range. The key insight is that elite teams achieve low failure rates while also deploying more frequently, not by slowing down. Track your CFR consistently over time rather than chasing a specific number.

See these insights for your team

CodePulse connects to your GitHub and shows you actionable engineering metrics in minutes. No complex setup required.

Free tier available. No credit card required.