DORA Metrics Are Being Weaponized. Here's the Fix

DORA metrics are everywhere. Every engineering analytics tool promises to make you "elite." Every conference talk treats them as gospel. But here's what nobody tells you: DORA metrics were designed to study thousands of organizations, not manage your specific team. This guide cuts through the hype to show you how to use DORA metrics correctly—as signals for improvement, not targets to game.

"If your team is optimizing for 'elite' status rather than shipping better software, you've already lost."

What DORA Actually Measured (And Why It Matters)

The DORA research program—created by Nicole Forsgren, Jez Humble, and Gene Kim—studied over 30,000 professionals across seven years to understand what separates high-performing software teams from everyone else. Their findings, published in the book Accelerate (2018) and annual State of DevOps reports, identified four metrics that consistently distinguished elite performers.

These metrics matter because they're outcome-based, not activity-based. They track how quickly and reliably you deliver value to users, rather than vanity numbers like lines of code or commit counts.

🔥 Our Take

DORA metrics were designed for research, not management. Using them as KPIs misses the point entirely.

The original research identified what elite teams do, not how they got there. Goodhart's Law applies: when a measure becomes a target, it ceases to be a good measure. A team that games their way to "elite" deployment frequency while shipping broken code hasn't improved—they've just learned to look good on a dashboard.

What DORA Research Proves (And What It Doesn't)

The research proves:

Speed and stability aren't trade-offs. Elite teams have both high deployment frequency AND low change failure rates. You don't sacrifice quality for velocity.
Continuous delivery practices correlate with better outcomes. Teams that deploy frequently, with good automation and testing, perform better.
These patterns hold across industries. It's not just for startups or tech companies.

The research does NOT prove:

That targeting "elite" benchmarks will make you better. Correlation isn't causation. Elite teams have good metrics because they have good practices—not the other way around.
That these four metrics are sufficient. DORA metrics tell you about delivery performance. They say nothing about code quality, developer experience, or whether you're building the right thing.
That your specific team should aim for specific numbers. The benchmarks are population-level findings. Your context matters. See our guide on how to use engineering benchmarks without gaming them.

[Figure: CodePulse SPACE Metrics dashboard showing DORA metrics integration with velocity, quality, and collaboration scores]

[Figure: The four DORA metrics (Deployment Frequency, Lead Time, Change Failure Rate, and MTTR) with 2024 performance benchmarks]

The Four Metrics (Without the Hype)

1. Deployment Frequency

What it measures: How often your organization deploys code to production.

What it actually indicates: Deployment frequency is a proxy for batch size. Teams that deploy frequently ship smaller changes. Smaller changes are easier to review, easier to test, and easier to roll back when something breaks.

DORA Classification	Deployment Frequency
Elite	Multiple deploys per day
High	Between once per day and once per week
Medium	Between once per week and once per month
Low	Less than once per month

The trap: Teams game this by splitting PRs into tiny fragments or deploying empty changes. A team deploying 10 times a day while shipping nothing meaningful isn't "elite"—they're just busy.

2. Lead Time for Changes

What it measures: The time from code commit to production deployment.

What it actually indicates: Lead time reveals the friction in your delivery pipeline. Long lead times point to bottlenecks—slow code review, manual testing gates, complex deployment processes, or approval bureaucracy.

DORA Classification	Lead Time
Elite	Less than one hour
High	Between one day and one week
Medium	Between one week and one month
Low	More than one month

The trap: Teams optimize for lead time by skipping reviews or reducing test coverage. Your lead time dropped from 3 days to 3 hours—but now you're shipping bugs straight to production.

"A 3-day lead time with thorough review beats a 3-hour lead time with no review. The metric doesn't capture what you skipped."

For detailed benchmarks on PR cycle time (a key component of lead time), see our PR Cycle Time Benchmarks by Team Size.

3. Change Failure Rate

What it measures: The percentage of deployments that cause a failure requiring remediation—rollback, hotfix, or patch.

What it actually indicates: Change failure rate is the quality counterbalance to velocity metrics. High deployment frequency only matters if those deployments work. This metric catches teams that game velocity by shipping broken code.

DORA Classification	Change Failure Rate
Elite	0-15%
High	16-30%
Medium	16-30%
Low	More than 30%

The trap: Teams game this by under-reporting failures or defining "failure" narrowly. If hotfixes don't count and rollbacks are "planned," your 5% failure rate is a lie.
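
To see how much the definition of "failure" moves the number, here is a minimal sketch. The deployment log and the remediation labels are hypothetical and only illustrate the arithmetic; your own records will look different.

```python
def change_failure_rate(deployments: list[dict], failure_kinds: set[str]) -> float:
    """Share of deployments whose remediation counts as a failure."""
    if not deployments:
        return 0.0
    failed = sum(1 for d in deployments if d["remediation"] in failure_kinds)
    return failed / len(deployments)


# Hypothetical log of 20 deployments; remediation is None when nothing went wrong.
deployments = (
    [{"remediation": None}] * 15
    + [{"remediation": "rollback"}] * 1
    + [{"remediation": "hotfix"}] * 3
    + [{"remediation": "patch"}] * 1
)

# Narrow definition: only rollbacks count as failures.
narrow = change_failure_rate(deployments, {"rollback"})
# Full definition: rollback, hotfix, or patch all count.
full = change_failure_rate(deployments, {"rollback", "hotfix", "patch"})

print(f"narrow: {narrow:.0%}, full: {full:.0%}")
```

The same 20 deployments produce very different rates depending on which remediations you count, which is why the definition should be written down before the metric is reported.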

4. Mean Time to Recovery (MTTR)

What it measures: How long it takes to restore service when an incident occurs.

What it actually indicates: MTTR measures your incident response capability. Low MTTR requires good monitoring (you detect problems fast), good runbooks (you know what to do), and practiced response (you've done this before).

DORA Classification	MTTR
Elite	Less than one hour
High	Less than one day
Medium	Between one day and one week
Low	More than one week

The trap: Teams game this by not declaring incidents, or by declaring resolution before customers are actually unaffected.

Measuring DORA Metrics from GitHub (Without Full CI/CD Access)

Most teams don't have unified CI/CD pipelines with comprehensive deployment tracking. Here's how to approximate each metric using GitHub data alone—and what caveats apply.

Deployment Frequency: Use Merge Frequency

If you practice trunk-based development and merging to main triggers deployment, merge frequency approximates deployment frequency.

Deployment Frequency Proxies (GitHub-only):

1. MERGES TO MAIN BRANCH
   - Works if: merge → deploy is automated
   - Caveat: Manual gates create gaps

2. RELEASE TAGS
   - Works if: you tag every release
   - Caveat: Often underreported

3. PR MERGE RATE
   - Count: PRs merged per day/week
   - Caveat: Not all PRs trigger deployments

CodePulse approach:
  - Tracks PRs merged (deployment proxy)
  - Filters to specific branches if needed
  - Shows trend over time for trajectory analysis
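
As a rough illustration of the merge-frequency proxy, the sketch below counts merges to main per ISO week. It assumes you have already exported the merge timestamps of your PRs as ISO 8601 strings; the function name and sample data are hypothetical, and this is not how CodePulse computes the number.

```python
from collections import Counter
from datetime import datetime


def merges_per_week(merged_at: list[str]) -> dict[str, int]:
    """Count merged PRs per ISO week, keyed like '2024-W10'."""
    counts = Counter()
    for stamp in merged_at:
        iso = datetime.fromisoformat(stamp).isocalendar()
        counts[f"{iso[0]}-W{iso[1]:02d}"] += 1
    return dict(sorted(counts.items()))


# Hypothetical sample data, not real measurements.
sample = [
    "2024-03-04T10:15:00",
    "2024-03-05T16:40:00",
    "2024-03-07T09:05:00",
    "2024-03-12T11:30:00",
]
print(merges_per_week(sample))
```

Grouping by week rather than by day smooths out the noise from holidays and quiet Fridays, which makes the trend easier to read for small teams.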

Lead Time: Use PR Cycle Time

PR cycle time—from first commit to merge—captures most of lead time. Add your typical deploy delay for a more complete picture.

⏱️ Measuring Lead Time in CodePulse

CodePulse breaks down your cycle time into its components:

Go to Dashboard to see the Cycle Time Breakdown
Coding Time — First commit to PR open
Pickup Time — PR open to first review
Review Time — First review to approval
Merge Time — Approval to merge
Total cycle time ≈ lead time minus deployment delay
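
If you want to reproduce this breakdown by hand, a minimal sketch follows. The field names and the deploy delay are assumptions made for illustration; they are not CodePulse's or GitHub's schema.

```python
from datetime import datetime


def cycle_time_breakdown(pr: dict[str, str]) -> dict[str, float]:
    """Split one PR's cycle time into the four stages, in hours."""
    t = {key: datetime.fromisoformat(value) for key, value in pr.items()}

    def hours(start: str, end: str) -> float:
        return (t[end] - t[start]).total_seconds() / 3600

    return {
        "coding": hours("first_commit", "opened"),
        "pickup": hours("opened", "first_review"),
        "review": hours("first_review", "approved"),
        "merge": hours("approved", "merged"),
        "total": hours("first_commit", "merged"),
    }


# Hypothetical PR timeline.
pr = {
    "first_commit": "2024-03-04T09:00:00",
    "opened": "2024-03-04T15:00:00",
    "first_review": "2024-03-05T11:00:00",
    "approved": "2024-03-05T14:00:00",
    "merged": "2024-03-05T14:30:00",
}
breakdown = cycle_time_breakdown(pr)

# Assumed typical delay between merge and production deploy, in hours.
typical_deploy_delay_hours = 2.0
approx_lead_time = breakdown["total"] + typical_deploy_delay_hours
print(breakdown, approx_lead_time)
```

In this example most of the 29.5 hours is pickup time, which points at reviewer availability rather than at the review itself.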

Change Failure Rate: Track Reverts and Hotfixes

Without incident management data, use these proxies:

Revert commit ratio: Commits that revert previous changes often point to a change that failed in production.
Hotfix branch frequency: If you use hotfix branches, count them.
PR failure patterns: High code churn (recently merged code being rewritten soon afterwards) correlates with quality issues.
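
A minimal sketch of the revert ratio is below. It assumes you have a list of commit subjects from your main branch (for example from `git log --format=%s`) and that reverts were created with `git revert`, whose default subject starts with `Revert "`. Reverts with hand-edited messages will be missed, so treat the result as a lower bound.

```python
def revert_ratio(commit_subjects: list[str]) -> float:
    """Fraction of commits whose subject looks like a default `git revert`."""
    if not commit_subjects:
        return 0.0
    reverts = sum(1 for s in commit_subjects if s.startswith('Revert "'))
    return reverts / len(commit_subjects)


# Hypothetical commit subjects.
subjects = [
    "Add retry to payment client",
    'Revert "Add retry to payment client"',
    "Fix typo in README",
    "Bump dependency versions",
]
ratio = revert_ratio(subjects)
print(f"{ratio:.0%} of commits are reverts")
```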

MTTR: Measure Hotfix Cycle Time

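From GitHub alone, you can measure the cycle time of hotfix branches or PRs tagged as fixes. The time from hotfix branch creation to merge indicates recovery speed. Caveat: this leaves out the time it took to detect the problem and the time to deploy the fix, so it understates true recovery time.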
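
The hotfix proxy can be sketched as follows, assuming you have collected (branch created, merged) timestamp pairs for each hotfix. The function name and the sample data are hypothetical.

```python
from datetime import datetime
from statistics import median


def hotfix_recovery_hours(hotfixes: list[tuple[str, str]]) -> dict[str, float]:
    """Mean and median hours from hotfix branch creation to merge."""
    if not hotfixes:
        raise ValueError("need at least one hotfix to summarize")
    durations = [
        (datetime.fromisoformat(merged) - datetime.fromisoformat(created)).total_seconds() / 3600
        for created, merged in hotfixes
    ]
    return {"mean": sum(durations) / len(durations), "median": median(durations)}


# Hypothetical hotfixes as (branch created, merged) pairs.
hotfix_sample = [
    ("2024-03-04T09:00:00", "2024-03-04T10:00:00"),
    ("2024-03-11T14:00:00", "2024-03-11T16:00:00"),
    ("2024-03-18T08:00:00", "2024-03-18T17:00:00"),
]
summary = hotfix_recovery_hours(hotfix_sample)
print(summary)
```

Reporting the median next to the mean is a deliberate choice here: with only a handful of hotfixes, one slow recovery can dominate the mean.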

The Problem with Industry Benchmarks

"The original DORA research surveyed 30,000 people. Your team has 8. The statistics don't transfer."

The DORA benchmark tables are useful for understanding what's possible—but dangerous when used as targets. Here's why:

Context Matters More Than Classification

Factor	Impact on Reasonable Metrics
Regulated industry	Compliance gates slow deployments legitimately
Legacy monolith	Architecture limits deployment frequency
Hardware dependencies	Can't deploy faster than physical shipping
B2B enterprise	Customers may prefer stability over frequent changes
Small team	Review bottlenecks are structural, not fixable by process

A fintech team with a 2-week deployment cycle isn't "low performing"—they're compliant. A medical device company that deploys quarterly isn't broken—they're following FDA requirements.

What to Do Instead

Establish your own baselines. Measure where you are today. That's your starting point.
Focus on trajectory. Are your metrics improving? That matters more than the absolute number.
Investigate outliers. When a metric spikes or drops, find out why. The story matters more than the number.
Use metrics to find problems, not judge people. Long lead time is a signal to investigate, not a failure to punish.
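
One simple way to look at trajectory instead of absolute values is to compare the median of one period with the median of the period before it. This is a sketch under that assumption, with made-up cycle times; other trend measures would work just as well.

```python
from statistics import median


def trajectory(previous: list[float], current: list[float]) -> float:
    """Relative change in the median between two periods.

    Negative means the metric dropped, positive means it rose.
    """
    base = median(previous)
    return (median(current) - base) / base


# Hypothetical PR cycle times in hours for two consecutive months.
last_month = [30.0, 50.0, 40.0, 60.0, 20.0]
this_month = [25.0, 30.0, 35.0, 28.0, 40.0]

change = trajectory(last_month, this_month)
print(f"Median cycle time changed by {change:+.0%}")
```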

Improving DORA Metrics (Without Gaming Them)

The right way to improve DORA metrics is to improve the capabilities they reflect—not to optimize for the numbers directly.

To Improve Deployment Frequency

Reduce batch size. Smaller PRs are easier to review, test, and deploy. See our PR Size Optimization Guide.
Automate deployments. Manual deployment processes are the biggest barrier.
Use feature flags. Decouple deployment from release—ship code that's not yet visible to users.
Build confidence. Good monitoring lets you deploy with less fear.

To Reduce Lead Time

Speed up code review. This is usually the biggest bottleneck. See our guide to reducing PR cycle time.
Parallelize CI. Slow test suites add hours to every PR.
Eliminate manual gates. Each approval step adds delay. Question whether each gate adds value.
Practice trunk-based development. Long-lived branches accumulate merge conflicts that slow everything down.

To Reduce Change Failure Rate

Improve test coverage. Automated tests catch issues before production.
Invest in code review. Human review catches what tests miss.
Use progressive rollouts. Canary deployments limit blast radius when something breaks.
Build better staging environments. Production-like environments catch environment-specific issues.

To Reduce MTTR

Invest in observability. You can't fix what you can't see.
Create runbooks. Documented procedures speed recovery.
Practice incident response. Game days build muscle memory.
Enable easy rollbacks. An instant rollback path cuts MTTR dramatically.

🔔 Setting Up DORA-Aligned Alerts in CodePulse

Use alerts to catch problems before they become crises:

Go to Alert Rules
Set a threshold for cycle time (e.g., alert when average exceeds 72 hours)
Alert on deployment frequency drops (e.g., fewer than 5 merges/week)
Monitor churn rate spikes (a proxy for change failure risk)
See the alerts guide for more patterns
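
The threshold logic behind rules like these is simple enough to sketch. The function below is a hypothetical illustration using the example thresholds from the list above; it is not the CodePulse alerting API.

```python
def check_alerts(avg_cycle_hours: float, merges_this_week: int,
                 max_cycle_hours: float = 72.0, min_merges: int = 5) -> list[str]:
    """Return one message for each threshold that is breached."""
    alerts = []
    if avg_cycle_hours > max_cycle_hours:
        alerts.append(
            f"Average cycle time {avg_cycle_hours:.0f}h exceeds {max_cycle_hours:.0f}h"
        )
    if merges_this_week < min_merges:
        alerts.append(
            f"Only {merges_this_week} merges this week (expected at least {min_merges})"
        )
    return alerts


quiet = check_alerts(avg_cycle_hours=40.0, merges_this_week=12)
noisy = check_alerts(avg_cycle_hours=90.0, merges_this_week=3)
for message in noisy:
    print(message)
```

Treat a firing alert as a prompt to investigate, in line with the rest of this guide, rather than as a verdict on the team.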

A Sensible Implementation Approach

Here's how to adopt DORA metrics without falling into the gaming trap:

Phase 1: Establish Baselines (Week 1-2)

Identify your data sources. What can you actually measure? Don't let perfect be the enemy of good.
Calculate current values. Look at the past 90 days. Get a feel for your current state.
Document your context. What constraints affect your metrics? Regulatory requirements? Architecture decisions? Team size?
Share findings without judgment. These are diagnostic numbers, not performance grades.
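
For the "past 90 days" step, a baseline can be as simple as the median over a trailing window. The sketch below assumes you have (merge timestamp, cycle time in hours) pairs; the names and data are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


def baseline_cycle_hours(prs: list[tuple[str, float]], as_of: str,
                         window_days: int = 90) -> Optional[float]:
    """Median cycle time (hours) of PRs merged in the trailing window."""
    end = datetime.fromisoformat(as_of)
    start = end - timedelta(days=window_days)
    recent = [
        hours for merged_at, hours in prs
        if start <= datetime.fromisoformat(merged_at) <= end
    ]
    return median(recent) if recent else None


# Hypothetical PRs as (merged at, cycle time in hours).
prs = [
    ("2023-11-20T12:00:00", 200.0),  # older than 90 days, ignored
    ("2024-01-15T12:00:00", 30.0),
    ("2024-02-10T12:00:00", 50.0),
    ("2024-03-20T12:00:00", 40.0),
]
baseline = baseline_cycle_hours(prs, as_of="2024-03-31T00:00:00")
print(baseline)
```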

Phase 2: Investigate, Don't Target (Week 3-4)

Find the bottlenecks. If lead time is long, which component is slowest? Review? Testing? Deployment?
Understand the root causes. Is slow review caused by PR size? Reviewer availability? Unclear ownership?
Choose one improvement area. Don't try to fix everything at once.
Define a practice change, not a metric target. "We'll reduce PR size to under 400 lines" is better than "We'll reduce lead time to 2 days."

Phase 3: Improve Practices (Ongoing)

Implement the practice change. Focus on the behavior, not the metric.
Monitor metrics as a side effect. The numbers should improve because the practices improved.
If metrics don't improve, question your theory. Maybe smaller PRs weren't the actual bottleneck.
Celebrate practice adoption, not metric gains. "We shipped 80% of PRs under 400 lines this month" is more meaningful than "Our lead time dropped 20%."
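
Tracking practice adoption takes very little code. This sketch computes the share of PRs under a size limit, assuming you have the changed-line count for each PR; the sample sizes are made up.

```python
def share_under_limit(pr_sizes: list[int], limit: int = 400) -> float:
    """Fraction of PRs whose changed-line count is under the limit."""
    if not pr_sizes:
        return 0.0
    return sum(1 for size in pr_sizes if size < limit) / len(pr_sizes)


# Hypothetical changed-line counts for this month's PRs.
sizes = [120, 380, 95, 640, 210, 399, 1500, 45, 300, 60]
share = share_under_limit(sizes)
print(f"{share:.0%} of PRs were under 400 lines")
```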

The Right Mindset for DORA Metrics

DORA metrics are useful when you treat them as signals—indicators that something might need attention. They're harmful when you treat them as goals—numbers to be achieved.

The DORA research identified what elite teams look like. It didn't provide a recipe for becoming elite. There's no shortcut. You improve by building better practices, not by targeting better numbers.

Use DORA metrics to start conversations: "Why did lead time spike last month?" Use them to detect problems early: "Deployment frequency is trending down—what's blocking us?" Use them to validate improvements: "After we switched to trunk-based development, cycle time dropped 40%."

Don't use them to judge teams. Don't use them to set performance targets. Don't compare yourself to "elite" benchmarks from organizations with completely different contexts.

For more on measuring team health without creating perverse incentives, see our guides on measuring team performance without micromanaging, understanding cycle time breakdown, and deployment frequency and lead time.